SymDOCK: Docking with Symmetry and Cooperativity

This guide is for using the symmetric extension to DOCK3.8 (SymDOCK) for Docking ligands against protein fibrils in a symmetric way and incorporating a crude ligand-ligand interaction energy. The same approach works for protein systems with other symmetry groups (e.g. cyclic groups \(Z_n\) for integer \(n\)), but I’ll keep the tutorial to the infinite fibril case (symmetry group of the integers \(\mathbb{Z}\) under addition).

Written by Matthew S. Smith. Please see the publication Matthew S. Smith, Ian S. Knight, Rian C. Kormos, Joseph G. Pepe, Peter Kunach, Marc I. Diamond, Sarah H. Shahmoradian, John J. Irwin, William F. DeGrado, and Brian K. Shoichet. “Docking for Molecules That Bind in a Symmetric Stack with SymDOCK.” Journal of Chemical Information and Modeling (2024)64(2):425-434. https://doi.org/10.1021/acs.jcim.3c01749.

1. What You Will Need

A cleaned up .pdb file for the protein (which I have called protein.pdb), a .pdb file for a single ligand in the region to dock (which I have called xtal-lig.pdb), a .pdb file for the ligand stack (which I have called lig_stack.pdb), an INDOCK file, a version of DOCK3.8 that supports symmetry, and a programming language you are comfortable using (I am using Python).

2. Background Math

You can safely ignore this section if you are not interested in the linear algebra.

We can represent the symmetry transformations of an atomic system (proteins, small molecules, etc.) by an affine transformation, which is composed of a linear transformation followed by a translation. For vectors in 3D (\(\mathbb{R}^3\)), we can still represent an affine transformation by a matrix by appending a “1” to the vector and using a \(4\times4\) matrix. As an example, consider the vector \((x,y,z)\). If we want to apply a linear transformation given by the \(3\times3\) matrix \(R\) to this vector and then follow with a translation by the vector \((a,b,c)\), we can do the multiplication

\[\begin{split}\begin{pmatrix} 1 & 0 & 0 & 0\\ a & & & \\ b & & R & \\ c & & & \end{pmatrix} \begin{pmatrix} 1\\ x\\ y\\ z \end{pmatrix} = \begin{pmatrix} 1\\ \begin{pmatrix} a\\ b\\ c \end{pmatrix} + R \begin{pmatrix} x\\ y\\ z \end{pmatrix} \end{pmatrix}.\end{split}\]

In our case, the matrix \(R\) is a rotation in 3 dimensions.

Note: The convention in algebra is to append the 1 at the end of the vector, putting the line of \((1,0,0,0)\) at the bottom of the matrix, not the top. I started this project using the convention in statistics of appending a 1 to the start of a data vector to allow for easy displacement by an intercept/bias in a linear model, and I have kept that convention here. The actual docking executable uses the rotation matrix and translation vector as separate inputs, so feel free to adjust the code in this tutorial to fit the algebra convention.

To find the affine transformation from one set of coordinates to another, select 4 vectors \((a_x,a_y,a_z)\), \((b_x,b_y,b_z)\), \((c_x,c_y,c_z)\), and \((d_x,d_y,d_z)\) in the first coordinate frame and their images \((a^\prime_x,a^\prime_y,a^\prime_z)\), \((b^\prime_x,b^\prime_y,b^\prime_z)\), \((c^\prime_x,c^\prime_y,c^\prime_z)\), and \((d^\prime_x,d^\prime_y,d^\prime_z)\) in the second frame, respectively. Then construct the matrices

\[\begin{split}A = \begin{pmatrix} 1 & 1 & 1 & 1\\ a_x & b_x & c_x & d_x\\ a_y & b_y & c_y & d_y\\ a_z & b_z & c_z & d_z \end{pmatrix}\end{split}\]

and

\[\begin{split}B = \begin{pmatrix} 1 & 1 & 1 & 1\\ a^\prime_x & b^\prime_x & c^\prime_x & d^\prime_x\\ a^\prime_y & b^\prime_y & c^\prime_y & d^\prime_y\\ a^\prime_z & b^\prime_z & c^\prime_z & d^\prime_z \end{pmatrix}.\end{split}\]

If \(T\) is the affine transformation matrix going from the first set of coordinates to the second, then by simple matrix multiplication,

\[TA=B.\]

The matrix \(A\) has the same determinant as the matrix

\[\begin{split}\begin{pmatrix} a_x - d_x & b_x - d_x & c_x - d_x\\ a_y - d_y & b_y - d_y & c_y - d_y\\ a_z - d_z & b_z - d_z & c_z - d_z \end{pmatrix}\end{split}\]

(and all cyclic permutations of \(a\), \(b\), \(c\), \(d\)), and the determinant of this matrix is zero if and only if the 4 vectors are coplanar. Put another way, if we have chosen the points \((a_x,a_y,a_z)\), \((b_x,b_y,b_z)\), \((c_x,c_y,c_z)\), and \((d_x,d_y,d_z)\) to be non-coplanar, then \(A\) will be invertible, and we can find \(T\) as

\[T=BA^{-1}\]

3. Getting the Affine Transformation Matrix

For most protein systems, the PDB entry and publication will not include the affine transformation matrix in a neat, nice form. If the structure is big enough, we can estimate the transformation by following the symmetry of a few example points.

Here is some example code I wrote for an early version of the structure of AD PHF tau bound to GTP-1 (PDB ID 8FUG; Merz GE, Chalkley MH, Tan SK, Tse E, Lee J, Prusiner SB, Paras NA, DeGrado WF, and Southworth DR. “Stacked binding of a PET ligand to Alzheimer’s tau paired helical filaments.” Nature Communications (2023)14:3048. https://doi.org/10.1038/s41467-023-38537-y). I used many more example points than these, but I’ll keep this short (ish):

import os #For reading files in directory
import numpy as np #For NumPy functions
import statistics #For mean and stdev
np.set_printoptions(suppress=True) #To not print out in scientific notation

dir = {PATH} #Equivalent on your computer
file_name = "protein.pdb"
f = os.path.join(dir, file_name)



# A helper fumction to extract coordinates from a line
def extract_coord(x):
    # x is a line of a PDB file
    return(np.array([float(x[30:38]), float(x[38:46]), \
        float(x[46:54])]))

# Make a function to extract the relevant coordinates and calculate the
# symmetry operation
def get_T(res_1, res_2, atom_1, atom_2, atom_3, atom_4):
    # Each residue parameter should be a string of the form "RES A ###", for
    # the residue type, chain name, and residue number (9 characters total)
    # Each atom parameter should be a string of 4 characters properly aligned
    # to the atom type in a PDB file in columns 13-16
    # This function returns a 4x4 NumPy array (matrix)
    file = open(f, "r")
    for line in file:
        if line[0:5] == 'ATOM ' and line[17:26] == res_1:
            if line[12:16] == atom_1:
                a = extract_coord(line)
            if line[12:16] == atom_2:
                b = extract_coord(line)
            if line[12:16] == atom_3:
                c = extract_coord(line)
            if line[12:16] == atom_4:
                d = extract_coord(line)
        if line[0:5] == 'ATOM ' and line[17:26] == res_2:
            if line[12:16] == atom_1:
                a_prime = extract_coord(line)
            if line[12:16] == atom_2:
                b_prime = extract_coord(line)
            if line[12:16] == atom_3:
                c_prime = extract_coord(line)
            if line[12:16] == atom_4:
                d_prime = extract_coord(line)
    file.close()

    # Use Python list comprehension and NumPy to make the matrices
    A_mat = np.vstack([ np.array([1.0 for i in range(1,5)]), \
        np.array([ [v[i] for v in [a, b, c, d] ] for i in [0,1,2]]) ])
    B_mat = np.vstack([ np.array([1.0 for i in range(1,5)]), \
        np.array([ [v[i] for v in [a_prime, b_prime, c_prime, d_prime] ] for i in [0,1,2]]) ])

    T_mat = np.dot(B_mat, np.linalg.inv(A_mat))

    # Need to update the first row of T_mat
    T_mat[0] = [1.0, 0., 0., 0.]

    # Need to round to 3 decimal places, since that's the significant figures
    # of the PDB file input
    T_mat = np.around(T_mat, 3)

    # Return the symmetry matrix T
    return(T_mat)



# Start with atoms in Q351 in chains G and I:
# N, O, CB, and CD
Q351_GI = get_T('GLN G 351', 'GLN I 351', ' N  ', ' O  ', ' CB ', ' CD ')
# Next try with the same atoms but chains I to A
Q351_IA = get_T('GLN I 351', 'GLN A 351', ' N  ', ' O  ', ' CB ', ' CD ')
# Next try with the same atoms but chains A to E
Q351_AE = get_T('GLN A 351', 'GLN E 351', ' N  ', ' O  ', ' CB ', ' CD ')
# Next try with the same atoms but chains E to C
Q351_EC = get_T('GLN E 351', 'GLN C 351', ' N  ', ' O  ', ' CB ', ' CD ')

# Now try with atoms on ligand from chains P to O:
# F2, C21, C23, C32
lig_PO = get_T('UNK P   1', 'UNK O   1', ' F2 ', ' C21', ' C23', ' C32')
# Next try with the same atoms but ligands O to R
lig_OR = get_T('UNK O   1', 'UNK R   1', ' F2 ', ' C21', ' C23', ' C32')
# Next try with the same atoms but ligands R to Q
lig_RQ = get_T('UNK R   1', 'UNK Q   1', ' F2 ', ' C21', ' C23', ' C32')

# Example of printing out a full affine transformation matrix:
print(Q351_GI)


# Seems like rotation matrix is pretty consistent (not shown), using
# Small-Angle Approximation. But more variation in translation vector. Use
# Python list comprehension
print("Q351 Atoms N, O, CB, and CD from Chains G to I")
Q351_GI_trans = [ Q351_GI[i,0] for i in [1,2,3] ]
print(Q351_GI_trans)
print()

print("Q351 Atoms N, O, CB, and CD from Chains I to A")
Q351_IA_trans = [ Q351_IA[i,0] for i in [1,2,3] ]
print(Q351_IA_trans)
print()

print("Q351 Atoms N, O, CB, and CD from Chains A to E")
Q351_AE_trans = [ Q351_AE[i,0] for i in [1,2,3] ]
print(Q351_AE_trans)
print()

print("Q351 Atoms N, O, CB, and CD from Chains E to C")
Q351_EC_trans = [ Q351_EC[i,0] for i in [1,2,3] ]
print(Q351_EC_trans)
print()

print("Ligand Atoms F2, C21, C23, and C32 from Chains P to O")
lig_PO_trans = [ lig_PO[i,0] for i in [1,2,3] ]
print(lig_PO_trans)
print()

print("Ligand Atoms F2, C21, C23, and C32 from Chains O to R")
lig_OR_trans = [ lig_OR[i,0] for i in [1,2,3] ]
print(lig_OR_trans)
print()

print("Ligand Atoms F2, C21, C23, and C32 from Chains R to Q")
lig_RQ_trans = [ lig_RQ[i,0] for i in [1,2,3] ]
print(lig_RQ_trans)
print()



# Mean
print("Mean")
print("x")
print(statistics.mean( [v[0] for v in [Q351_GI_trans, Q351_IA_trans, \
    Q351_AE_trans, Q351_EC_trans, lig_PO_trans, lig_OR_trans, lig_RQ_trans] ] ))
print("y")
print(statistics.mean( [v[1] for v in [Q351_GI_trans, Q351_IA_trans, \
    Q351_AE_trans, Q351_EC_trans, lig_PO_trans, lig_OR_trans, lig_RQ_trans] ] ))
print("z")
print(statistics.mean( [v[2] for v in [Q351_GI_trans, Q351_IA_trans, \
    Q351_AE_trans, Q351_EC_trans, lig_PO_trans, lig_OR_trans, lig_RQ_trans] ] ))
print()

# Sample standard deviation, with N-1 degrees of freedom
print("Standard Deviation")
print("x")
print(statistics.stdev( [v[0] for v in [Q351_GI_trans, Q351_IA_trans, \
    Q351_AE_trans, Q351_EC_trans, lig_PO_trans, lig_OR_trans, lig_RQ_trans] ] ))
print("y")
print(statistics.stdev( [v[1] for v in [Q351_GI_trans, Q351_IA_trans, \
    Q351_AE_trans, Q351_EC_trans, lig_PO_trans, lig_OR_trans, lig_RQ_trans] ] ))
print("z")
print(statistics.stdev( [v[2] for v in [Q351_GI_trans, Q351_IA_trans, \
    Q351_AE_trans, Q351_EC_trans, lig_PO_trans, lig_OR_trans, lig_RQ_trans] ] ))

4. Extending the Protein Fibril and Ligand Stack

Now that we have the affine transformation matrix, we need to use it to extend the protein fibril and ligand stack. We need to extend the protein fibril so that the DOCK socre grids do not have edge effects. We will also need a stack of ligands that fills the full protein fibril to model the change in dielectric upon binding of a stack.

My example was a bit tricky, because the structure had 5 copies of the protein monomer but only 4 copies of the ligand. Your system might have weird exceptions like this, so I will include what I did.

Let’s start with just the protein fibril. The structure has 5 copies of the protein monomer, so to easily make a system larger than the DOCK grid box, we will just copy and plop the whole system on either side of the existing fibril section. The symmetry operation we calculated earlier is just for going from one monomer to the next. We need to apply it 5 times in either direction (as is and inverted) in order to move the whole fibril section. T_mat is the symmetry matrix I settled on after the analysis above.

import os #For reading files in directory
import numpy as np #For NumPy functions

# Symmetry operator as an affine transformation (in SE(R^3))
T_mat = np.array([ [1., 0., 0., 0.], [-2.238, 1., 0.019, 0.], \
    [2.266, -0.019, 1., 0], [4.747, 0., 0., 1.] ])
# Need to apply 5 times to get the movement of the entire 5-strand bundle
# down the fibril
T_full = np.linalg.matrix_power(T_mat, 5)

# Inverse symmetry operator
T_inv = np.array([ [1., 0., 0., 0.], [2.280, 1., -0.019, 0.], \
    [-2.223, 0.019, 1., 0], [-4.747, 0., 0., 1.] ])
T_inv_full = np.linalg.matrix_power(T_inv, 5)

dir = {PATH} #Equivalent on your computer
file_name = "protein.pdb"
f = os.path.join(dir, file_name)



# A helper fumction to extract coordinates from a line
def extract_coord(x):
    # x is a line of a PDB file
    return(np.array([float(x[30:38]), float(x[38:46]), \
        float(x[46:54])]))

# A function that applies the symmetry operation in the form of an affine
# transformation
def apply_T(T, v):
    # The parameter v is a numeric array. We need to first add an initial
    # entry of 1 to the vector
    vec = np.array( [1., v[0], v[1], v[2]] )
    out_raw = np.dot(T, vec)
    out = [out_raw[i] for i in [1,2,3]] #Do not need the leading 1
    return(out)



# Loop over all the lines in the input file and update only the atomic
# coordinates, making sure to keep the line spacing the same. First do with
# symmetry moving forward down axis
file = open(f, "r")
output = open("protein_sym.pdb", "w")
for line in file:
    if line[0:5] == 'ATOM ':
        coord = extract_coord(line)
        new_coord = apply_T(T_full, coord)
        # Need to do correct string formatting: 8 characters total, with enough
        # leading whitespace and trailing zeroes to keep 3 decimal places
        # This allows for possible negative signs
        x_new = '{:.3f}'.format(round(new_coord[0], 3)).rjust(8)
        y_new = '{:.3f}'.format(round(new_coord[1], 3)).rjust(8)
        z_new = '{:.3f}'.format(round(new_coord[2], 3)).rjust(8)
        new_line = line[0:30] + x_new + y_new + z_new + line[54:78] + "\n"
        output.write(new_line)
    elif line[0:3] == 'TER' or line[0:3] == 'END':
        output.write(line) #Keep these linese as is
file.close()
output.close

# Repeat with symmetry moving backward down axis (inverse). Read in file again
# to start at top
file_2 = open(f, "r")
output_2 = open("protein_inv.pdb", "w")
for line in file_2:
    if line[0:5] == 'ATOM ':
        coord = extract_coord(line)
        new_coord = apply_T(T_inv_full, coord)
        # Need to do correct string formatting: 8 characters total, with enough
        # leading whitespace and trailing zeroes to keep 3 decimal places
        # This allows for possible negative signs
        x_new = '{:.3f}'.format(round(new_coord[0], 3)).rjust(8)
        y_new = '{:.3f}'.format(round(new_coord[1], 3)).rjust(8)
        z_new = '{:.3f}'.format(round(new_coord[2], 3)).rjust(8)
        new_line = line[0:30] + x_new + y_new + z_new + line[54:78] + "\n"
        output_2.write(new_line)
    elif line[0:3] == 'TER' or line[0:3] == 'END':
        output_2.write(line) #Keep these lines as is
file_2.close()
output_2.close

In my example, there were only 4 copies of the ligand in the cryo-EM structure. I first saved a .pdb file for one ligand on the end of the stack next to the empty site on the fibril (lig_1.pdb). Then I applied the symmetry operation once on that single ligand to make a new end of the stack (lig_0.pdb). Then I saved the new stack of 5 as a single file (lig_stack_5.pdb) to make a stack of 5 ligands. Then I applied the symmetry operation 5 times on the stack of 5 in either direction to make lig_stack_sym.pdb and lig_stack_inv.pdb.

5. Preparing for Blastermaster

Next we need to make one .pdb file for the protein and one for the ligand stack with all these atoms, the proper line numbering and labeling, and no nonpolar hydrogens. To do this, open protein.pdb, protein_sym.pdb, protein_inv.pdb, lig_stack_5.pdb, lig_stack_sym.pdb, and lig_stack_inv.pdb in Chimera. Delete all the nonpolar hydrogens with the commands sel HC followed by del sel. Select the 3 protein models in the Model Panel and click “copy/combine molecular models.” Use these options:

  • Make sure all 3 are selected (protein.pdb, protein_sym.pdb, and protein_inv.pdb)

  • Name the new model whatever you want

  • Give the new model a new ID

  • Use the coordinate system of the original protein.pdb

  • Select the option to “rename them uniquely” if any molecules have duplicate single-letter chain IDs

Then hit OK. Back in the Model Panel, highlight just the combined model, click “write PDB,” select just the combined model, save relative to model protein.pdb, and name the file rec_noHC.pdb. Check that the model coordinates and labeleing numbering are correct in Chimera and a text editor. Keep only the ATOM lines with the Bash command

grep ATOM rec_noHC.pdb > rec_prepped_noHC.pdb

Repeat the step of combining the models into one .pdb file for lig_stack_5.pdb, lig_stack_sym.pdb, and lig_stack_inv.pdb in Chimera. Call the output lig_15_copies.pdb.

We need to format the protein fibril and ligand stack for Blastermaster. I am using Blastermaster here, and not Pydock3 or DockOpt, because the script I use for modeling the low dielectric boundary of the ligand stack was made for the output of Blastermaster. Feel free to update for using Pydock3/DockOpt.

The following steps are how I do the formatting for Blastermaster; some people just plop the prepared structure into Blastermaster or DOCKOpt as rec.pdb, with no rec.crg.pdb. There may be slight differences in the resulting grids. For the way that I learned, we need to create the rec.pdb and rec.crg.pdb files.

Just to be safe, if running scripts on our cluster, make sure to source the right path and environment. Don’t worry about DOCK3.7 vs. DOCK3.8 here; we are only using these paths for Blastermaster, which 3.8 did not change.

export DOCKBASE=/nfs/soft/dock/versions/dock37/DOCK-3.7-trunk
source /nfs/soft/dock/versions/dock37/DOCK-3.7-trunk/env.sh

From rec_prepped_noHC.pdb, make a copy called rec.crg.pdb that has each histidine renamed to HID, HIE, or HIP, depending on its protonation state. On our cluster, you can run the command

python /mnt/nfs/home/ttummino/zzz.scripts/protein_prep/replace_his_with_hie_hid_hip.py rec_prepped_noHC.pdb rec.crg.pdb

To make the rec.pdb, remove all remaining (polar) hydrogens and rename any “CYM” residues back to “CYS”. On our cluster, you can run the commands

python /mnt/nfs/home/ttummino/zzz.scripts/protein_prep/0000_remove_hydrogens_from_pdb.py rec_prepped_noHC.pdb
# Output is rec_noH.pdb
sed "s/CYM/CYS/" rec_noH.pdb > rec.pdb

We still need to format the ligand stack file so that Blastmaster knows to put low dielectric thinspheres at the locations of the heavy atoms. We can do this by just changing all the heavy atoms (assuming we removed all the hydrogens before) to carbons and their residue types to “LPD” with this Python code:

import os #For reading files in directory

dir = {PATH} #Equivalent on your computer
file_name = "lig_15_copies.pdb"
f = os.path.join(dir, file_name)

file = open(f, "r")
output = open("shell-LPD.pdb", "w")
for line in file:
    if line[0:5] == 'ATOM ':
        new_line = 'HETATM' + line[6:12] + ' C  ' + ' LPD' + line[20:77] + 'C\n'
        output.write(new_line)
file.close()
output.close()

6. Modifying the INDOCK and Creating the affine_transformation.txt

It’s easiest to make the INDOCK and affine_transformation.txt before running Blastermaster. There are multiple steps within Blastermaster that copy previous working directories, so if we put the right INDOCK and affine_transformation.txt into the working directory at the beginning, they should get carried to the end.

The affine_transformation.txt gives the affine transformation matrix. Use my example from docking EGCG for formatting. Here, the transformation is rotating in the XY-plane and translating down the Z-axis.

rot_mat_row_1     1.000    0.019    0.000
rot_mat_row_2    -0.019    1.000    0.000
rot_mat_row_3     0.000    0.000    1.000
transl_vec        0.000    0.000    4.800

Make sure the affine_transformation.txt is in the ultimate dockfiles directory.

These are the required changes to the INDOCK file, created along with the initial DOCK grids with the Blastermaster call:

  • Set minimize to no

  • Add a section before the end of INDOCK line:

#####################################################
# SYMMETRY
sym_affine_file               ../dockfiles/affine_transformation.txt
sym_dist_thresh               2.0
max_lig_lig_vdw               0.0
scale_lig_lig_vdw             1.0

The sym_affine_file refers to the formatted affine_transformation.txt file described above. As long as you keep that file in the dockfiles directory and the INDOCK in the larger directory containing dockfiles (the custom when working on our cluster, but NOT when submitting on Wynton with the super script), this relative path should be fine.

The sym_dist_thresh parameter refers to the interatomic distance in Ångstroms below which two atoms in molecular symmetry mates are “clashing.” The max_lig_lig_vdw parameter is the maximum value in DOCK “kcal/mol” energy units for the ligand-ligand van der Waals energy to reach before NOT being added to the total score. This energy is negative when there is no clash, so a maximum of 0.0 means to always include this contribution to the total energy. The scale_lig_lig_vdw parameter is the scalar value to multiply the ligand-ligand van der Waals energy by before adding to the total energy.

The example values here are the defaults.

7. Generating and Modifiying the DOCK Score Grids

With rec.pdb, rec.crg.pdb, xtal-lig.pdb, and shell-LPD.pdb, we are now ready to run Blastermaster. I am writing this assuming you will do some kind of optimization based on the thin sphere scan, so I will show you how to run Blastermaster with different thin sphere radii, add the ligand low dielectric region to each setup, and properly recombine the directories.

Begin by putting rec.pdb, rec.crg.pdb, xtal-lig.pdb, and shell-LPD.pdb into the directory you are using on the cluster. Run the export/source lines from the Section 5 (if not already done) and then these Bash commands:

mkdir working
cp rec.pdb working
cp rec.crg.pdb working
cp xtal-lig.pdb working
$DOCKBASE/proteins/blastermaster/blastermaster.py -v --addNOhydrogensflag

Next we run Step #1 of Reed’s thin sphere scan, which runs in parallel on our cluster. We won’t need to run Step #2.

mkdir ES-LD_Scan
ORIG_PATH=$PWD
cd ES-LD_Scan
python ~rstein/zzz.scripts/DOCK_prep_scripts/new_0001_generate_ES_LD_generation.py -p $ORIG_PATH
unset ORIG_PATH #Just in case another script uses this variable

When these jobs finish on the cluster, return to the larger directory holding working and make new grids in with_Ligand_Stack, with the ligand stack:

cd .. #Exit working
mkdir with_Ligand_Stack
cp shell-LPD.pdb with_Ligand_Stack
sh /mnt/nfs/exj/work/matt715/ex9_backup/matt715/Symmetry/Symmetry/New_Thinspheres/with_Ligand_Stack/blast-membrane-thinsph-scan.sh {PATH}/ES-LS_Scan {PATH_TO_LARGER_DIRECTORY_CONTAINING_working_AND_dockfiles}
# Script written by Stefan and Andrii

Make sure the correct INDOCK and affine_transformation.txt are in the new dockfiles directories in each new thin sphere directory. Then run your chosen optimization scheme (logAUC enrichment of known binders over decoys, RMSD in reproducing experimental poses, etc.), making sure to use the DOCK executable equipped for SymDOCK. Once you have your optimized setup, dock to that like you normally would (making sure to use the SymDOCK executable).

8. Analyzing Results with Symmetry

You can extract and filter the results from SymDOCK like normal. There will be a new column in the OUTDOCK for successfully docked molecules for the ligand-ligand van der Waals energy, called l-l_vdW and overwritten in column 13 (column 14 in the extract_all.sort.uniq.txt). In the poses.mol2, each molecule will have that information stored under the Lig-Lig vdW flag. You can use this as a column in the Chimera ViewDOCK feature.

If you want to make a stack of ligands for each ligand in your poses.mol2 file, keeping all the commented information for each one, you can use this Python script that Ian S. Knight wrote, called make_symmetry.py. By default, the script makes a stack with 7 new symmetry mates on each side of the original, but you can change it using the num_applications=7 and bidirectional=True parameters in the Pydock3 function write_mol2_file_with_molecules_cloned_and_transformed.

from pydock3 import files
import numpy as np
import fire


def main(affine_transformation_file_path, poses_mol2_file_path, trans_poses_mol2_file_path='trans_poses.mol2'):
    with open(affine_transformation_file_path, 'r') as f:
        rot_mat = []
        transl_vec = []
        for line in f.readlines():
            line.strip().split()
            row_label, x, y, z = line.split()
            row_data = [float(x), float(y), float(z)]
            if row_label.startswith("rot_mat"):
                rot_mat.append(row_data)
            elif row_label == "transl_vec":
                transl_vec = row_data
        rot_mat = np.array(rot_mat)
        transl_vec = np.array(transl_vec)
    mol2_file = files.Mol2File(poses_mol2_file_path)
    mol2_file.write_mol2_file_with_molecules_cloned_and_transformed(rot_mat, transl_vec, trans_poses_mol2_file_path, num_applications=7, bidirectional=True)

if __name__ == '__main__':
    fire.Fire(main)

On our cluster, you can find an example of this script at /nfs/home/matt715/EGCG_Rescue/EGCG_3. To run this script, with the appropriate sourcing on our cluster, make sure you are on gimel5, epyc, or another node that can handle the right Python version and Pydock3 (so NOT the gimel head node). Then run in Bash:

source /mnt/nfs/soft/ian/python3.8.5.sh #For our cluster
python make_symmetry.py affine_transformation.txt poses.mol2
# Optional parameter: --trans_poses_mol2_file_path='trans_poses.mol2')

The output will be the trans_poses.mol2 file, unless you decide to rename it.