SymDOCK: Docking with Symmetry and Cooperativity ================================================ This guide is for using the symmetric extension to DOCK3.8 (SymDOCK) for Docking ligands against protein fibrils in a symmetric way and incorporating a crude ligand-ligand interaction energy. The same approach works for protein systems with other symmetry groups (e.g. cyclic groups :math:`Z_n` for integer :math:`n`), but I'll keep the tutorial to the infinite fibril case (symmetry group of the integers :math:`\mathbb{Z}` under addition). Written by Matthew S. Smith. Please see the publication Matthew S. Smith, Ian S. Knight, Rian C. Kormos, Joseph G. Pepe, Peter Kunach, Marc I. Diamond, Sarah H. Shahmoradian, John J. Irwin, William F. DeGrado, and Brian K. Shoichet. "Docking for Molecules That Bind in a Symmetric Stack with SymDOCK." *Journal of Chemical Information and Modeling* (2024)64(2):425-434. https://doi.org/10.1021/acs.jcim.3c01749. 1. What You Will Need ------------------------------------------------- A cleaned up ``.pdb`` file for the protein (which I have called ``protein.pdb``), a ``.pdb`` file for a single ligand in the region to dock (which I have called ``xtal-lig.pdb``), a ``.pdb`` file for the ligand stack (which I have called ``lig_stack.pdb``), an ``INDOCK`` file, a version of DOCK3.8 that supports symmetry, and a programming language you are comfortable using (I am using Python). 2. Background Math ------------------------------------------------- You can safely ignore this section if you are not interested in the linear algebra. We can represent the symmetry transformations of an atomic system (proteins, small molecules, etc.) by an **affine transformation**, which is composed of a linear transformation followed by a translation. For vectors in 3D (:math:`\mathbb{R}^3`), we can still represent an affine transformation by a matrix by appending a "1" to the vector and using a :math:`4\times4` matrix. As an example, consider the vector :math:`(x,y,z)`. If we want to apply a linear transformation given by the :math:`3\times3` matrix :math:`R` to this vector and then follow with a translation by the vector :math:`(a,b,c)`, we can do the multiplication .. math:: \begin{pmatrix} 1 & 0 & 0 & 0\\ a & & & \\ b & & R & \\ c & & & \end{pmatrix} \begin{pmatrix} 1\\ x\\ y\\ z \end{pmatrix} = \begin{pmatrix} 1\\ \begin{pmatrix} a\\ b\\ c \end{pmatrix} + R \begin{pmatrix} x\\ y\\ z \end{pmatrix} \end{pmatrix}. In our case, the matrix :math:`R` is a rotation in 3 dimensions. **Note:** The convention in algebra is to append the 1 at the end of the vector, putting the line of :math:`(1,0,0,0)` at the bottom of the matrix, not the top. I started this project using the convention in statistics of appending a 1 to the start of a data vector to allow for easy displacement by an intercept/bias in a linear model, and I have kept that convention here. The actual docking executable uses the rotation matrix and translation vector as separate inputs, so feel free to adjust the code in this tutorial to fit the algebra convention. To find the affine transformation from one set of coordinates to another, select 4 vectors :math:`(a_x,a_y,a_z)`, :math:`(b_x,b_y,b_z)`, :math:`(c_x,c_y,c_z)`, and :math:`(d_x,d_y,d_z)` in the first coordinate frame and their images :math:`(a^\prime_x,a^\prime_y,a^\prime_z)`, :math:`(b^\prime_x,b^\prime_y,b^\prime_z)`, :math:`(c^\prime_x,c^\prime_y,c^\prime_z)`, and :math:`(d^\prime_x,d^\prime_y,d^\prime_z)` in the second frame, respectively. Then construct the matrices .. math:: A = \begin{pmatrix} 1 & 1 & 1 & 1\\ a_x & b_x & c_x & d_x\\ a_y & b_y & c_y & d_y\\ a_z & b_z & c_z & d_z \end{pmatrix} and .. math:: B = \begin{pmatrix} 1 & 1 & 1 & 1\\ a^\prime_x & b^\prime_x & c^\prime_x & d^\prime_x\\ a^\prime_y & b^\prime_y & c^\prime_y & d^\prime_y\\ a^\prime_z & b^\prime_z & c^\prime_z & d^\prime_z \end{pmatrix}. If :math:`T` is the affine transformation matrix going from the first set of coordinates to the second, then by simple matrix multiplication, .. math:: TA=B. The matrix :math:`A` has the same determinant as the matrix .. math:: \begin{pmatrix} a_x - d_x & b_x - d_x & c_x - d_x\\ a_y - d_y & b_y - d_y & c_y - d_y\\ a_z - d_z & b_z - d_z & c_z - d_z \end{pmatrix} (and all cyclic permutations of :math:`a`, :math:`b`, :math:`c`, :math:`d`), and the determinant of this matrix is zero if and only if the 4 vectors are coplanar. Put another way, if we have chosen the points :math:`(a_x,a_y,a_z)`, :math:`(b_x,b_y,b_z)`, :math:`(c_x,c_y,c_z)`, and :math:`(d_x,d_y,d_z)` to be **non-coplanar**, then :math:`A` will be invertible, and we can find :math:`T` as .. math:: T=BA^{-1} 3. Getting the Affine Transformation Matrix ------------------------------------------------- For most protein systems, the PDB entry and publication will not include the affine transformation matrix in a neat, nice form. If the structure is big enough, we can estimate the transformation by following the symmetry of a few example points. Here is some example code I wrote for an early version of the structure of AD PHF tau bound to GTP-1 (PDB ID 8FUG; Merz GE, Chalkley MH, Tan SK, Tse E, Lee J, Prusiner SB, Paras NA, DeGrado WF, and Southworth DR. "Stacked binding of a PET ligand to Alzheimer’s tau paired helical filaments." *Nature Communications* (2023)14:3048. https://doi.org/10.1038/s41467-023-38537-y). I used many more example points than these, but I'll keep this short (ish): .. code-block:: python import os #For reading files in directory import numpy as np #For NumPy functions import statistics #For mean and stdev np.set_printoptions(suppress=True) #To not print out in scientific notation dir = {PATH} #Equivalent on your computer file_name = "protein.pdb" f = os.path.join(dir, file_name) # A helper fumction to extract coordinates from a line def extract_coord(x): # x is a line of a PDB file return(np.array([float(x[30:38]), float(x[38:46]), \ float(x[46:54])])) # Make a function to extract the relevant coordinates and calculate the # symmetry operation def get_T(res_1, res_2, atom_1, atom_2, atom_3, atom_4): # Each residue parameter should be a string of the form "RES A ###", for # the residue type, chain name, and residue number (9 characters total) # Each atom parameter should be a string of 4 characters properly aligned # to the atom type in a PDB file in columns 13-16 # This function returns a 4x4 NumPy array (matrix) file = open(f, "r") for line in file: if line[0:5] == 'ATOM ' and line[17:26] == res_1: if line[12:16] == atom_1: a = extract_coord(line) if line[12:16] == atom_2: b = extract_coord(line) if line[12:16] == atom_3: c = extract_coord(line) if line[12:16] == atom_4: d = extract_coord(line) if line[0:5] == 'ATOM ' and line[17:26] == res_2: if line[12:16] == atom_1: a_prime = extract_coord(line) if line[12:16] == atom_2: b_prime = extract_coord(line) if line[12:16] == atom_3: c_prime = extract_coord(line) if line[12:16] == atom_4: d_prime = extract_coord(line) file.close() # Use Python list comprehension and NumPy to make the matrices A_mat = np.vstack([ np.array([1.0 for i in range(1,5)]), \ np.array([ [v[i] for v in [a, b, c, d] ] for i in [0,1,2]]) ]) B_mat = np.vstack([ np.array([1.0 for i in range(1,5)]), \ np.array([ [v[i] for v in [a_prime, b_prime, c_prime, d_prime] ] for i in [0,1,2]]) ]) T_mat = np.dot(B_mat, np.linalg.inv(A_mat)) # Need to update the first row of T_mat T_mat[0] = [1.0, 0., 0., 0.] # Need to round to 3 decimal places, since that's the significant figures # of the PDB file input T_mat = np.around(T_mat, 3) # Return the symmetry matrix T return(T_mat) # Start with atoms in Q351 in chains G and I: # N, O, CB, and CD Q351_GI = get_T('GLN G 351', 'GLN I 351', ' N ', ' O ', ' CB ', ' CD ') # Next try with the same atoms but chains I to A Q351_IA = get_T('GLN I 351', 'GLN A 351', ' N ', ' O ', ' CB ', ' CD ') # Next try with the same atoms but chains A to E Q351_AE = get_T('GLN A 351', 'GLN E 351', ' N ', ' O ', ' CB ', ' CD ') # Next try with the same atoms but chains E to C Q351_EC = get_T('GLN E 351', 'GLN C 351', ' N ', ' O ', ' CB ', ' CD ') # Now try with atoms on ligand from chains P to O: # F2, C21, C23, C32 lig_PO = get_T('UNK P 1', 'UNK O 1', ' F2 ', ' C21', ' C23', ' C32') # Next try with the same atoms but ligands O to R lig_OR = get_T('UNK O 1', 'UNK R 1', ' F2 ', ' C21', ' C23', ' C32') # Next try with the same atoms but ligands R to Q lig_RQ = get_T('UNK R 1', 'UNK Q 1', ' F2 ', ' C21', ' C23', ' C32') # Example of printing out a full affine transformation matrix: print(Q351_GI) # Seems like rotation matrix is pretty consistent (not shown), using # Small-Angle Approximation. But more variation in translation vector. Use # Python list comprehension print("Q351 Atoms N, O, CB, and CD from Chains G to I") Q351_GI_trans = [ Q351_GI[i,0] for i in [1,2,3] ] print(Q351_GI_trans) print() print("Q351 Atoms N, O, CB, and CD from Chains I to A") Q351_IA_trans = [ Q351_IA[i,0] for i in [1,2,3] ] print(Q351_IA_trans) print() print("Q351 Atoms N, O, CB, and CD from Chains A to E") Q351_AE_trans = [ Q351_AE[i,0] for i in [1,2,3] ] print(Q351_AE_trans) print() print("Q351 Atoms N, O, CB, and CD from Chains E to C") Q351_EC_trans = [ Q351_EC[i,0] for i in [1,2,3] ] print(Q351_EC_trans) print() print("Ligand Atoms F2, C21, C23, and C32 from Chains P to O") lig_PO_trans = [ lig_PO[i,0] for i in [1,2,3] ] print(lig_PO_trans) print() print("Ligand Atoms F2, C21, C23, and C32 from Chains O to R") lig_OR_trans = [ lig_OR[i,0] for i in [1,2,3] ] print(lig_OR_trans) print() print("Ligand Atoms F2, C21, C23, and C32 from Chains R to Q") lig_RQ_trans = [ lig_RQ[i,0] for i in [1,2,3] ] print(lig_RQ_trans) print() # Mean print("Mean") print("x") print(statistics.mean( [v[0] for v in [Q351_GI_trans, Q351_IA_trans, \ Q351_AE_trans, Q351_EC_trans, lig_PO_trans, lig_OR_trans, lig_RQ_trans] ] )) print("y") print(statistics.mean( [v[1] for v in [Q351_GI_trans, Q351_IA_trans, \ Q351_AE_trans, Q351_EC_trans, lig_PO_trans, lig_OR_trans, lig_RQ_trans] ] )) print("z") print(statistics.mean( [v[2] for v in [Q351_GI_trans, Q351_IA_trans, \ Q351_AE_trans, Q351_EC_trans, lig_PO_trans, lig_OR_trans, lig_RQ_trans] ] )) print() # Sample standard deviation, with N-1 degrees of freedom print("Standard Deviation") print("x") print(statistics.stdev( [v[0] for v in [Q351_GI_trans, Q351_IA_trans, \ Q351_AE_trans, Q351_EC_trans, lig_PO_trans, lig_OR_trans, lig_RQ_trans] ] )) print("y") print(statistics.stdev( [v[1] for v in [Q351_GI_trans, Q351_IA_trans, \ Q351_AE_trans, Q351_EC_trans, lig_PO_trans, lig_OR_trans, lig_RQ_trans] ] )) print("z") print(statistics.stdev( [v[2] for v in [Q351_GI_trans, Q351_IA_trans, \ Q351_AE_trans, Q351_EC_trans, lig_PO_trans, lig_OR_trans, lig_RQ_trans] ] )) 4. Extending the Protein Fibril and Ligand Stack ------------------------------------------------ Now that we have the affine transformation matrix, we need to use it to extend the protein fibril and ligand stack. We need to extend the protein fibril so that the DOCK socre grids do not have edge effects. We will also need a stack of ligands that fills the full protein fibril to model the change in dielectric upon binding of a stack. My example was a bit tricky, because the structure had 5 copies of the protein monomer but only 4 copies of the ligand. Your system might have weird exceptions like this, so I will include what I did. Let's start with just the protein fibril. The structure has 5 copies of the protein monomer, so to easily make a system larger than the DOCK grid box, we will just copy and plop the whole system on either side of the existing fibril section. The symmetry operation we calculated earlier is just for going from one monomer to the next. We need to apply it 5 times in either direction (as is and inverted) in order to move the whole fibril section. ``T_mat`` is the symmetry matrix I settled on after the analysis above. .. code-block:: python import os #For reading files in directory import numpy as np #For NumPy functions # Symmetry operator as an affine transformation (in SE(R^3)) T_mat = np.array([ [1., 0., 0., 0.], [-2.238, 1., 0.019, 0.], \ [2.266, -0.019, 1., 0], [4.747, 0., 0., 1.] ]) # Need to apply 5 times to get the movement of the entire 5-strand bundle # down the fibril T_full = np.linalg.matrix_power(T_mat, 5) # Inverse symmetry operator T_inv = np.array([ [1., 0., 0., 0.], [2.280, 1., -0.019, 0.], \ [-2.223, 0.019, 1., 0], [-4.747, 0., 0., 1.] ]) T_inv_full = np.linalg.matrix_power(T_inv, 5) dir = {PATH} #Equivalent on your computer file_name = "protein.pdb" f = os.path.join(dir, file_name) # A helper fumction to extract coordinates from a line def extract_coord(x): # x is a line of a PDB file return(np.array([float(x[30:38]), float(x[38:46]), \ float(x[46:54])])) # A function that applies the symmetry operation in the form of an affine # transformation def apply_T(T, v): # The parameter v is a numeric array. We need to first add an initial # entry of 1 to the vector vec = np.array( [1., v[0], v[1], v[2]] ) out_raw = np.dot(T, vec) out = [out_raw[i] for i in [1,2,3]] #Do not need the leading 1 return(out) # Loop over all the lines in the input file and update only the atomic # coordinates, making sure to keep the line spacing the same. First do with # symmetry moving forward down axis file = open(f, "r") output = open("protein_sym.pdb", "w") for line in file: if line[0:5] == 'ATOM ': coord = extract_coord(line) new_coord = apply_T(T_full, coord) # Need to do correct string formatting: 8 characters total, with enough # leading whitespace and trailing zeroes to keep 3 decimal places # This allows for possible negative signs x_new = '{:.3f}'.format(round(new_coord[0], 3)).rjust(8) y_new = '{:.3f}'.format(round(new_coord[1], 3)).rjust(8) z_new = '{:.3f}'.format(round(new_coord[2], 3)).rjust(8) new_line = line[0:30] + x_new + y_new + z_new + line[54:78] + "\n" output.write(new_line) elif line[0:3] == 'TER' or line[0:3] == 'END': output.write(line) #Keep these linese as is file.close() output.close # Repeat with symmetry moving backward down axis (inverse). Read in file again # to start at top file_2 = open(f, "r") output_2 = open("protein_inv.pdb", "w") for line in file_2: if line[0:5] == 'ATOM ': coord = extract_coord(line) new_coord = apply_T(T_inv_full, coord) # Need to do correct string formatting: 8 characters total, with enough # leading whitespace and trailing zeroes to keep 3 decimal places # This allows for possible negative signs x_new = '{:.3f}'.format(round(new_coord[0], 3)).rjust(8) y_new = '{:.3f}'.format(round(new_coord[1], 3)).rjust(8) z_new = '{:.3f}'.format(round(new_coord[2], 3)).rjust(8) new_line = line[0:30] + x_new + y_new + z_new + line[54:78] + "\n" output_2.write(new_line) elif line[0:3] == 'TER' or line[0:3] == 'END': output_2.write(line) #Keep these lines as is file_2.close() output_2.close In my example, there were only 4 copies of the ligand in the cryo-EM structure. I first saved a ``.pdb`` file for one ligand on the end of the stack next to the empty site on the fibril (``lig_1.pdb``). Then I applied the symmetry operation once on that single ligand to make a new end of the stack (``lig_0.pdb``). Then I saved the new stack of 5 as a single file (``lig_stack_5.pdb``) to make a stack of 5 ligands. Then I applied the symmetry operation 5 times on the stack of 5 in either direction to make ``lig_stack_sym.pdb`` and ``lig_stack_inv.pdb``. 5. Preparing for Blastermaster ------------------------------ Next we need to make one ``.pdb`` file for the protein and one for the ligand stack with all these atoms, the proper line numbering and labeling, and no nonpolar hydrogens. To do this, open ``protein.pdb``, ``protein_sym.pdb``, ``protein_inv.pdb``, ``lig_stack_5.pdb``, ``lig_stack_sym.pdb``, and ``lig_stack_inv.pdb`` in Chimera. Delete all the nonpolar hydrogens with the commands ``sel HC`` followed by ``del sel``. Select the 3 protein models in the Model Panel and click "copy/combine molecular models." Use these options: * Make sure all 3 are selected (``protein.pdb``, ``protein_sym.pdb``, and ``protein_inv.pdb``) * Name the new model whatever you want * Give the new model a new ID * Use the coordinate system of the original ``protein.pdb`` * Select the option to "rename them uniquely" if any molecules have duplicate single-letter chain IDs Then hit OK. Back in the Model Panel, highlight just the combined model, click "write PDB," select just the combined model, save relative to model ``protein.pdb``, and name the file ``rec_noHC.pdb``. Check that the model coordinates and labeleing numbering are correct in Chimera and a text editor. Keep only the ``ATOM`` lines with the Bash command .. code-block:: bash grep ATOM rec_noHC.pdb > rec_prepped_noHC.pdb Repeat the step of combining the models into one ``.pdb`` file for ``lig_stack_5.pdb``, ``lig_stack_sym.pdb``, and ``lig_stack_inv.pdb`` in Chimera. Call the output ``lig_15_copies.pdb``. We need to format the protein fibril and ligand stack for Blastermaster. I am using Blastermaster here, and not Pydock3 or DockOpt, because the script I use for modeling the low dielectric boundary of the ligand stack was made for the output of Blastermaster. Feel free to update for using Pydock3/DockOpt. The following steps are how I do the formatting for Blastermaster; some people just plop the prepared structure into Blastermaster or DOCKOpt as ``rec.pdb``, with no ``rec.crg.pdb``. There may be slight differences in the resulting grids. For the way that I learned, we need to create the ``rec.pdb`` and ``rec.crg.pdb`` files. Just to be safe, if running scripts on our cluster, make sure to source the right path and environment. Don't worry about DOCK3.7 vs. DOCK3.8 here; we are only using these paths for Blastermaster, which 3.8 did not change. .. code-block:: python export DOCKBASE=/nfs/soft/dock/versions/dock37/DOCK-3.7-trunk source /nfs/soft/dock/versions/dock37/DOCK-3.7-trunk/env.sh From ``rec_prepped_noHC.pdb``, make a copy called ``rec.crg.pdb`` that has each histidine renamed to ``HID``, ``HIE``, or ``HIP``, depending on its protonation state. On our cluster, you can run the command .. code-block:: python python /mnt/nfs/home/ttummino/zzz.scripts/protein_prep/replace_his_with_hie_hid_hip.py rec_prepped_noHC.pdb rec.crg.pdb To make the ``rec.pdb``, remove all remaining (polar) hydrogens and rename any "CYM" residues back to "CYS". On our cluster, you can run the commands .. code-block:: python python /mnt/nfs/home/ttummino/zzz.scripts/protein_prep/0000_remove_hydrogens_from_pdb.py rec_prepped_noHC.pdb # Output is rec_noH.pdb sed "s/CYM/CYS/" rec_noH.pdb > rec.pdb We still need to format the ligand stack file so that Blastmaster knows to put low dielectric thinspheres at the locations of the heavy atoms. We can do this by just changing all the heavy atoms (assuming we removed all the hydrogens before) to carbons and their residue types to "LPD" with this Python code: .. code-block:: python import os #For reading files in directory dir = {PATH} #Equivalent on your computer file_name = "lig_15_copies.pdb" f = os.path.join(dir, file_name) file = open(f, "r") output = open("shell-LPD.pdb", "w") for line in file: if line[0:5] == 'ATOM ': new_line = 'HETATM' + line[6:12] + ' C ' + ' LPD' + line[20:77] + 'C\n' output.write(new_line) file.close() output.close() 6. Modifying the ``INDOCK`` and Creating the ``affine_transformation.txt`` -------------------------------------------------------------------------- It's easiest to make the ``INDOCK`` and ``affine_transformation.txt`` before running Blastermaster. There are multiple steps within Blastermaster that copy previous ``working directories``, so if we put the right ``INDOCK`` and ``affine_transformation.txt`` into the ``working`` directory at the beginning, they should get carried to the end. The ``affine_transformation.txt`` gives the affine transformation matrix. Use my example from docking EGCG for formatting. Here, the transformation is rotating in the XY-plane and translating down the Z-axis. .. code:: rot_mat_row_1 1.000 0.019 0.000 rot_mat_row_2 -0.019 1.000 0.000 rot_mat_row_3 0.000 0.000 1.000 transl_vec 0.000 0.000 4.800 Make sure the ``affine_transformation.txt`` is in the ultimate ``dockfiles`` directory. These are the required changes to the ``INDOCK`` file, created along with the initial DOCK grids with the Blastermaster call: * Set ``minimize`` to ``no`` * Add a section before the ``end of INDOCK`` line: .. code:: ##################################################### # SYMMETRY sym_affine_file ../dockfiles/affine_transformation.txt sym_dist_thresh 2.0 max_lig_lig_vdw 0.0 scale_lig_lig_vdw 1.0 The ``sym_affine_file`` refers to the formatted ``affine_transformation.txt`` file described above. As long as you keep that file in the ``dockfiles`` directory and the ``INDOCK`` in the larger directory containing ``dockfiles`` (the custom when working on our cluster, but NOT when submitting on Wynton with the ``super`` script), this relative path should be fine. The ``sym_dist_thresh`` parameter refers to the interatomic distance in Ångstroms below which two atoms in molecular symmetry mates are "clashing." The ``max_lig_lig_vdw`` parameter is the maximum value in DOCK "kcal/mol" energy units for the ligand-ligand van der Waals energy to reach before NOT being added to the total score. This energy is negative when there is no clash, so a maximum of ``0.0`` means to always include this contribution to the total energy. The ``scale_lig_lig_vdw`` parameter is the scalar value to multiply the ligand-ligand van der Waals energy by before adding to the total energy. The example values here are the defaults. 7. Generating and Modifiying the DOCK Score Grids ------------------------------------------------- With ``rec.pdb``, ``rec.crg.pdb``, ``xtal-lig.pdb``, and ``shell-LPD.pdb``, we are now ready to run Blastermaster. I am writing this assuming you will do some kind of optimization based on the thin sphere scan, so I will show you how to run Blastermaster with different thin sphere radii, add the ligand low dielectric region to each setup, and properly recombine the directories. Begin by putting ``rec.pdb``, ``rec.crg.pdb``, ``xtal-lig.pdb``, and ``shell-LPD.pdb`` into the directory you are using on the cluster. Run the export/source lines from the Section 5 (if not already done) and then these Bash commands: .. code-block:: bash mkdir working cp rec.pdb working cp rec.crg.pdb working cp xtal-lig.pdb working $DOCKBASE/proteins/blastermaster/blastermaster.py -v --addNOhydrogensflag Next we run Step #1 of Reed's thin sphere scan, which runs in parallel on our cluster. We won't need to run Step #2. .. code-block:: bash mkdir ES-LD_Scan ORIG_PATH=$PWD cd ES-LD_Scan python ~rstein/zzz.scripts/DOCK_prep_scripts/new_0001_generate_ES_LD_generation.py -p $ORIG_PATH unset ORIG_PATH #Just in case another script uses this variable When these jobs finish on the cluster, return to the larger directory holding ``working`` and make new grids in ``with_Ligand_Stack``, with the ligand stack: .. code-block:: bash cd .. #Exit working mkdir with_Ligand_Stack cp shell-LPD.pdb with_Ligand_Stack sh /mnt/nfs/exj/work/matt715/ex9_backup/matt715/Symmetry/Symmetry/New_Thinspheres/with_Ligand_Stack/blast-membrane-thinsph-scan.sh {PATH}/ES-LS_Scan {PATH_TO_LARGER_DIRECTORY_CONTAINING_working_AND_dockfiles} # Script written by Stefan and Andrii Make sure the correct ``INDOCK`` and ``affine_transformation.txt`` are in the new ``dockfiles`` directories in each new thin sphere directory. Then run your chosen optimization scheme (logAUC enrichment of known binders over decoys, RMSD in reproducing experimental poses, etc.), making sure to use the DOCK executable equipped for SymDOCK. Once you have your optimized setup, dock to that like you normally would (making sure to use the SymDOCK executable). 8. Analyzing Results with Symmetry ------------------------------------------------- You can extract and filter the results from SymDOCK like normal. There will be a new column in the ``OUTDOCK`` for successfully docked molecules for the ligand-ligand van der Waals energy, called ``l-l_vdW`` and overwritten in column 13 (column 14 in the ``extract_all.sort.uniq.txt``). In the ``poses.mol2``, each molecule will have that information stored under the ``Lig-Lig vdW`` flag. You can use this as a column in the Chimera ViewDOCK feature. If you want to make a stack of ligands for each ligand in your ``poses.mol2`` file, keeping all the commented information for each one, you can use this Python script that Ian S. Knight wrote, called ``make_symmetry.py``. By default, the script makes a stack with 7 new symmetry mates on each side of the original, but you can change it using the ``num_applications=7`` and ``bidirectional=True`` parameters in the Pydock3 function ``write_mol2_file_with_molecules_cloned_and_transformed``. .. code-block:: python from pydock3 import files import numpy as np import fire def main(affine_transformation_file_path, poses_mol2_file_path, trans_poses_mol2_file_path='trans_poses.mol2'): with open(affine_transformation_file_path, 'r') as f: rot_mat = [] transl_vec = [] for line in f.readlines(): line.strip().split() row_label, x, y, z = line.split() row_data = [float(x), float(y), float(z)] if row_label.startswith("rot_mat"): rot_mat.append(row_data) elif row_label == "transl_vec": transl_vec = row_data rot_mat = np.array(rot_mat) transl_vec = np.array(transl_vec) mol2_file = files.Mol2File(poses_mol2_file_path) mol2_file.write_mol2_file_with_molecules_cloned_and_transformed(rot_mat, transl_vec, trans_poses_mol2_file_path, num_applications=7, bidirectional=True) if __name__ == '__main__': fire.Fire(main) On our cluster, you can find an example of this script at ``/nfs/home/matt715/EGCG_Rescue/EGCG_3``. To run this script, with the appropriate sourcing on our cluster, make sure you are on ``gimel5``, ``epyc``, or another node that can handle the right Python version and Pydock3 (so NOT the ``gimel`` head node). Then run in Bash: .. code-block:: bash source /mnt/nfs/soft/ian/python3.8.5.sh #For our cluster python make_symmetry.py affine_transformation.txt poses.mol2 # Optional parameter: --trans_poses_mol2_file_path='trans_poses.mol2') The output will be the ``trans_poses.mol2`` file, unless you decide to rename it.