SymDOCK: Docking with Symmetry and Cooperativity
================================================

This guide is for using the symmetric extension to DOCK3.8 (SymDOCK) for Docking
ligands against protein fibrils in a symmetric way and incorporating a crude
ligand-ligand interaction energy. The same approach works for protein systems
with other symmetry groups (e.g. cyclic groups :math:`Z_n` for integer
:math:`n`), but I'll keep the tutorial to the infinite fibril case (symmetry
group of the integers :math:`\mathbb{Z}` under addition).

Written by Matthew S. Smith. Please see the publication Matthew S. Smith, Ian S.
Knight, Rian C. Kormos, Joseph G. Pepe, Peter Kunach, Marc I. Diamond, Sarah H.
Shahmoradian, John J. Irwin, William F. DeGrado, and Brian K. Shoichet. "Docking
for Molecules That Bind in a Symmetric Stack with SymDOCK."
*Journal of Chemical Information and Modeling* (2024)64(2):425-434.
https://doi.org/10.1021/acs.jcim.3c01749.

1. What You Will Need
-------------------------------------------------

A cleaned up ``.pdb`` file for the protein (which I have called
``protein.pdb``), a ``.pdb`` file for a single ligand in the region to dock
(which I have called ``xtal-lig.pdb``), a ``.pdb`` file for the ligand stack
(which I have called ``lig_stack.pdb``), an ``INDOCK`` file, a version of
DOCK3.8 that supports symmetry, and a programming language you are comfortable
using (I am using Python).

2. Background Math
-------------------------------------------------

You can safely ignore this section if you are not interested in the linear
algebra.

We can represent the symmetry transformations of an atomic system (proteins,
small molecules, etc.) by an **affine transformation**, which is composed of a
linear transformation followed by a translation. For vectors in 3D
(:math:`\mathbb{R}^3`), we can still represent an affine transformation by a
matrix by appending a "1" to the vector and using a :math:`4\times4` matrix. As
an example, consider the vector :math:`(x,y,z)`. If we want to apply a linear
transformation given by the :math:`3\times3` matrix :math:`R` to this vector and
then follow with a translation by the vector :math:`(a,b,c)`, we can do the
multiplication

.. math::

    \begin{pmatrix}
    1 & 0 & 0 & 0\\
    a &   &   &  \\
    b &   & R &  \\
    c &   &   &
    \end{pmatrix}
    \begin{pmatrix}
    1\\
    x\\
    y\\
    z
    \end{pmatrix}
    =
    \begin{pmatrix}
    1\\
    \begin{pmatrix}
    a\\
    b\\
    c
    \end{pmatrix}
    + R
    \begin{pmatrix}
    x\\
    y\\
    z
    \end{pmatrix}
    \end{pmatrix}.

In our case, the matrix :math:`R` is a rotation in 3 dimensions.

**Note:** The convention in algebra is to append the 1 at the end of the vector,
putting the line of :math:`(1,0,0,0)` at the bottom of the matrix, not the top.
I started this project using the convention in statistics of appending a 1 to
the start of a data vector to allow for easy displacement by an intercept/bias
in a linear model, and I have kept that convention here. The actual docking
executable uses the rotation matrix and translation vector as separate inputs,
so feel free to adjust the code in this tutorial to fit the algebra convention.

To find the affine transformation from one set of coordinates to another, select
4 vectors :math:`(a_x,a_y,a_z)`, :math:`(b_x,b_y,b_z)`, :math:`(c_x,c_y,c_z)`,
and :math:`(d_x,d_y,d_z)` in the first coordinate frame and their images
:math:`(a^\prime_x,a^\prime_y,a^\prime_z)`,
:math:`(b^\prime_x,b^\prime_y,b^\prime_z)`,
:math:`(c^\prime_x,c^\prime_y,c^\prime_z)`,
and :math:`(d^\prime_x,d^\prime_y,d^\prime_z)` in the second frame,
respectively. Then construct the matrices

.. math::

    A =
    \begin{pmatrix}
    1 & 1 & 1 & 1\\
    a_x & b_x & c_x & d_x\\
    a_y & b_y & c_y & d_y\\
    a_z & b_z & c_z & d_z
    \end{pmatrix}

and

.. math::

    B =
    \begin{pmatrix}
    1 & 1 & 1 & 1\\
    a^\prime_x & b^\prime_x & c^\prime_x & d^\prime_x\\
    a^\prime_y & b^\prime_y & c^\prime_y & d^\prime_y\\
    a^\prime_z & b^\prime_z & c^\prime_z & d^\prime_z
    \end{pmatrix}.

If :math:`T` is the affine transformation matrix going from the first set of
coordinates to the second, then by simple matrix multiplication,

.. math::

    TA=B.

The matrix :math:`A` has the same determinant as the matrix

.. math::

    \begin{pmatrix}
    a_x - d_x & b_x - d_x & c_x - d_x\\
    a_y - d_y & b_y - d_y & c_y - d_y\\
    a_z - d_z & b_z - d_z & c_z - d_z
    \end{pmatrix}

(and all cyclic permutations of :math:`a`, :math:`b`, :math:`c`, :math:`d`), and
the determinant of this matrix is zero if and only if the 4 vectors are
coplanar. Put another way, if we have chosen the points :math:`(a_x,a_y,a_z)`,
:math:`(b_x,b_y,b_z)`, :math:`(c_x,c_y,c_z)`, and :math:`(d_x,d_y,d_z)` to be
**non-coplanar**, then :math:`A` will be invertible, and we can find :math:`T`
as

.. math::

    T=BA^{-1}

3. Getting the Affine Transformation Matrix
-------------------------------------------------

For most protein systems, the PDB entry and publication will not include the
affine transformation matrix in a neat, nice form. If the structure is big
enough, we can estimate the transformation by following the symmetry of a few
example points.

Here is some example code I wrote for an early version of the structure of AD
PHF tau bound to GTP-1 (PDB ID 8FUG; Merz GE, Chalkley MH, Tan SK, Tse E, Lee J,
Prusiner SB, Paras NA, DeGrado WF, and Southworth DR. "Stacked binding of a PET
ligand to Alzheimer’s tau paired helical filaments." *Nature Communications*
(2023)14:3048. https://doi.org/10.1038/s41467-023-38537-y). I used many more
example points than these, but I'll keep this short (ish):

.. code-block:: python

    import os #For reading files in directory
    import numpy as np #For NumPy functions
    import statistics #For mean and stdev
    np.set_printoptions(suppress=True) #To not print out in scientific notation

    dir = {PATH} #Equivalent on your computer
    file_name = "protein.pdb"
    f = os.path.join(dir, file_name)


    # A helper fumction to extract coordinates from a line
    def extract_coord(x):
        # x is a line of a PDB file
        return(np.array([float(x[30:38]), float(x[38:46]), \
            float(x[46:54])]))

    # Make a function to extract the relevant coordinates and calculate the
    # symmetry operation
    def get_T(res_1, res_2, atom_1, atom_2, atom_3, atom_4):
        # Each residue parameter should be a string of the form "RES A ###", for
        # the residue type, chain name, and residue number (9 characters total)
        # Each atom parameter should be a string of 4 characters properly aligned
        # to the atom type in a PDB file in columns 13-16
        # This function returns a 4x4 NumPy array (matrix)
        file = open(f, "r")
        for line in file:
            if line[0:5] == 'ATOM ' and line[17:26] == res_1:
                if line[12:16] == atom_1:
                    a = extract_coord(line)
                if line[12:16] == atom_2:
                    b = extract_coord(line)
                if line[12:16] == atom_3:
                    c = extract_coord(line)
                if line[12:16] == atom_4:
                    d = extract_coord(line)
            if line[0:5] == 'ATOM ' and line[17:26] == res_2:
                if line[12:16] == atom_1:
                    a_prime = extract_coord(line)
                if line[12:16] == atom_2:
                    b_prime = extract_coord(line)
                if line[12:16] == atom_3:
                    c_prime = extract_coord(line)
                if line[12:16] == atom_4:
                    d_prime = extract_coord(line)
        file.close()

        # Use Python list comprehension and NumPy to make the matrices
        A_mat = np.vstack([ np.array([1.0 for i in range(1,5)]), \
            np.array([ [v[i] for v in [a, b, c, d] ] for i in [0,1,2]]) ])
        B_mat = np.vstack([ np.array([1.0 for i in range(1,5)]), \
            np.array([ [v[i] for v in [a_prime, b_prime, c_prime, d_prime] ] for i in [0,1,2]]) ])

        T_mat = np.dot(B_mat, np.linalg.inv(A_mat))

        # Need to update the first row of T_mat
        T_mat[0] = [1.0, 0., 0., 0.]

        # Need to round to 3 decimal places, since that's the significant figures
        # of the PDB file input
        T_mat = np.around(T_mat, 3)

        # Return the symmetry matrix T
        return(T_mat)


    # Start with atoms in Q351 in chains G and I:
    # N, O, CB, and CD
    Q351_GI = get_T('GLN G 351', 'GLN I 351', ' N  ', ' O  ', ' CB ', ' CD ')
    # Next try with the same atoms but chains I to A
    Q351_IA = get_T('GLN I 351', 'GLN A 351', ' N  ', ' O  ', ' CB ', ' CD ')
    # Next try with the same atoms but chains A to E
    Q351_AE = get_T('GLN A 351', 'GLN E 351', ' N  ', ' O  ', ' CB ', ' CD ')
    # Next try with the same atoms but chains E to C
    Q351_EC = get_T('GLN E 351', 'GLN C 351', ' N  ', ' O  ', ' CB ', ' CD ')

    # Now try with atoms on ligand from chains P to O:
    # F2, C21, C23, C32
    lig_PO = get_T('UNK P   1', 'UNK O   1', ' F2 ', ' C21', ' C23', ' C32')
    # Next try with the same atoms but ligands O to R
    lig_OR = get_T('UNK O   1', 'UNK R   1', ' F2 ', ' C21', ' C23', ' C32')
    # Next try with the same atoms but ligands R to Q
    lig_RQ = get_T('UNK R   1', 'UNK Q   1', ' F2 ', ' C21', ' C23', ' C32')

    # Example of printing out a full affine transformation matrix:
    print(Q351_GI)


    # Seems like rotation matrix is pretty consistent (not shown), using
    # Small-Angle Approximation. But more variation in translation vector. Use
    # Python list comprehension
    print("Q351 Atoms N, O, CB, and CD from Chains G to I")
    Q351_GI_trans = [ Q351_GI[i,0] for i in [1,2,3] ]
    print(Q351_GI_trans)
    print()

    print("Q351 Atoms N, O, CB, and CD from Chains I to A")
    Q351_IA_trans = [ Q351_IA[i,0] for i in [1,2,3] ]
    print(Q351_IA_trans)
    print()

    print("Q351 Atoms N, O, CB, and CD from Chains A to E")
    Q351_AE_trans = [ Q351_AE[i,0] for i in [1,2,3] ]
    print(Q351_AE_trans)
    print()

    print("Q351 Atoms N, O, CB, and CD from Chains E to C")
    Q351_EC_trans = [ Q351_EC[i,0] for i in [1,2,3] ]
    print(Q351_EC_trans)
    print()

    print("Ligand Atoms F2, C21, C23, and C32 from Chains P to O")
    lig_PO_trans = [ lig_PO[i,0] for i in [1,2,3] ]
    print(lig_PO_trans)
    print()

    print("Ligand Atoms F2, C21, C23, and C32 from Chains O to R")
    lig_OR_trans = [ lig_OR[i,0] for i in [1,2,3] ]
    print(lig_OR_trans)
    print()

    print("Ligand Atoms F2, C21, C23, and C32 from Chains R to Q")
    lig_RQ_trans = [ lig_RQ[i,0] for i in [1,2,3] ]
    print(lig_RQ_trans)
    print()


    # Mean
    print("Mean")
    print("x")
    print(statistics.mean( [v[0] for v in [Q351_GI_trans, Q351_IA_trans, \
        Q351_AE_trans, Q351_EC_trans, lig_PO_trans, lig_OR_trans, lig_RQ_trans] ] ))
    print("y")
    print(statistics.mean( [v[1] for v in [Q351_GI_trans, Q351_IA_trans, \
        Q351_AE_trans, Q351_EC_trans, lig_PO_trans, lig_OR_trans, lig_RQ_trans] ] ))
    print("z")
    print(statistics.mean( [v[2] for v in [Q351_GI_trans, Q351_IA_trans, \
        Q351_AE_trans, Q351_EC_trans, lig_PO_trans, lig_OR_trans, lig_RQ_trans] ] ))
    print()

    # Sample standard deviation, with N-1 degrees of freedom
    print("Standard Deviation")
    print("x")
    print(statistics.stdev( [v[0] for v in [Q351_GI_trans, Q351_IA_trans, \
        Q351_AE_trans, Q351_EC_trans, lig_PO_trans, lig_OR_trans, lig_RQ_trans] ] ))
    print("y")
    print(statistics.stdev( [v[1] for v in [Q351_GI_trans, Q351_IA_trans, \
        Q351_AE_trans, Q351_EC_trans, lig_PO_trans, lig_OR_trans, lig_RQ_trans] ] ))
    print("z")
    print(statistics.stdev( [v[2] for v in [Q351_GI_trans, Q351_IA_trans, \
        Q351_AE_trans, Q351_EC_trans, lig_PO_trans, lig_OR_trans, lig_RQ_trans] ] ))

4. Extending the Protein Fibril and Ligand Stack
------------------------------------------------

Now that we have the affine transformation matrix, we need to use it to extend
the protein fibril and ligand stack. We need to extend the protein fibril so
that the DOCK socre grids do not have edge effects. We will also need a stack
of ligands that fills the full protein fibril to model the change in dielectric
upon binding of a stack.

My example was a bit tricky, because the structure had 5 copies of the protein
monomer but only 4 copies of the ligand. Your system might have weird exceptions
like this, so I will include what I did.

Let's start with just the protein fibril. The structure has 5 copies of the
protein monomer, so to easily make a system larger than the DOCK grid box, we
will just copy and plop the whole system on either side of the existing fibril
section. The symmetry operation we calculated earlier is just for going from
one monomer to the next. We need to apply it 5 times in either direction (as is
and inverted) in order to move the whole fibril section. ``T_mat`` is the
symmetry matrix I settled on after the analysis above.

.. code-block:: python

    import os #For reading files in directory
    import numpy as np #For NumPy functions

    # Symmetry operator as an affine transformation (in SE(R^3))
    T_mat = np.array([ [1., 0., 0., 0.], [-2.238, 1., 0.019, 0.], \
        [2.266, -0.019, 1., 0], [4.747, 0., 0., 1.] ])
    # Need to apply 5 times to get the movement of the entire 5-strand bundle
    # down the fibril
    T_full = np.linalg.matrix_power(T_mat, 5)

    # Inverse symmetry operator
    T_inv = np.array([ [1., 0., 0., 0.], [2.280, 1., -0.019, 0.], \
        [-2.223, 0.019, 1., 0], [-4.747, 0., 0., 1.] ])
    T_inv_full = np.linalg.matrix_power(T_inv, 5)

    dir = {PATH} #Equivalent on your computer
    file_name = "protein.pdb"
    f = os.path.join(dir, file_name)


    # A helper fumction to extract coordinates from a line
    def extract_coord(x):
        # x is a line of a PDB file
        return(np.array([float(x[30:38]), float(x[38:46]), \
            float(x[46:54])]))

    # A function that applies the symmetry operation in the form of an affine
    # transformation
    def apply_T(T, v):
        # The parameter v is a numeric array. We need to first add an initial
        # entry of 1 to the vector
        vec = np.array( [1., v[0], v[1], v[2]] )
        out_raw = np.dot(T, vec)
        out = [out_raw[i] for i in [1,2,3]] #Do not need the leading 1
        return(out)


    # Loop over all the lines in the input file and update only the atomic
    # coordinates, making sure to keep the line spacing the same. First do with
    # symmetry moving forward down axis
    file = open(f, "r")
    output = open("protein_sym.pdb", "w")
    for line in file:
        if line[0:5] == 'ATOM ':
            coord = extract_coord(line)
            new_coord = apply_T(T_full, coord)
            # Need to do correct string formatting: 8 characters total, with enough
            # leading whitespace and trailing zeroes to keep 3 decimal places
            # This allows for possible negative signs
            x_new = '{:.3f}'.format(round(new_coord[0], 3)).rjust(8)
            y_new = '{:.3f}'.format(round(new_coord[1], 3)).rjust(8)
            z_new = '{:.3f}'.format(round(new_coord[2], 3)).rjust(8)
            new_line = line[0:30] + x_new + y_new + z_new + line[54:78] + "\n"
            output.write(new_line)
        elif line[0:3] == 'TER' or line[0:3] == 'END':
            output.write(line) #Keep these linese as is
    file.close()
    output.close

    # Repeat with symmetry moving backward down axis (inverse). Read in file again
    # to start at top
    file_2 = open(f, "r")
    output_2 = open("protein_inv.pdb", "w")
    for line in file_2:
        if line[0:5] == 'ATOM ':
            coord = extract_coord(line)
            new_coord = apply_T(T_inv_full, coord)
            # Need to do correct string formatting: 8 characters total, with enough
            # leading whitespace and trailing zeroes to keep 3 decimal places
            # This allows for possible negative signs
            x_new = '{:.3f}'.format(round(new_coord[0], 3)).rjust(8)
            y_new = '{:.3f}'.format(round(new_coord[1], 3)).rjust(8)
            z_new = '{:.3f}'.format(round(new_coord[2], 3)).rjust(8)
            new_line = line[0:30] + x_new + y_new + z_new + line[54:78] + "\n"
            output_2.write(new_line)
        elif line[0:3] == 'TER' or line[0:3] == 'END':
            output_2.write(line) #Keep these lines as is
    file_2.close()
    output_2.close

In my example, there were only 4 copies of the ligand in the cryo-EM structure.
I first saved a ``.pdb`` file for one ligand on the end of the stack next to the
empty site on the fibril (``lig_1.pdb``). Then I applied the symmetry operation
once on that single ligand to make a new end of the stack (``lig_0.pdb``). Then
I saved the new stack of 5 as a single file (``lig_stack_5.pdb``) to make a
stack of 5 ligands. Then I applied the symmetry operation 5 times on the stack
of 5 in either direction to make ``lig_stack_sym.pdb`` and
``lig_stack_inv.pdb``.

5. Preparing for Blastermaster
------------------------------

Next we need to make one ``.pdb`` file for the protein and one for the ligand
stack with all these atoms, the proper line numbering and labeling, and no
nonpolar hydrogens. To do this, open ``protein.pdb``, ``protein_sym.pdb``,
``protein_inv.pdb``, ``lig_stack_5.pdb``, ``lig_stack_sym.pdb``, and
``lig_stack_inv.pdb`` in Chimera. Delete all the nonpolar hydrogens with the
commands ``sel HC`` followed by ``del sel``. Select the 3 protein models in the
Model Panel and click "copy/combine molecular models." Use these options:

* Make sure all 3 are selected (``protein.pdb``, ``protein_sym.pdb``, and ``protein_inv.pdb``)
* Name the new model whatever you want
* Give the new model a new ID
* Use the coordinate system of the original ``protein.pdb``
* Select the option to "rename them uniquely" if any molecules have duplicate single-letter chain IDs

Then hit OK. Back in the Model Panel, highlight just the combined model, click
"write PDB," select just the combined model, save relative to model ``protein.pdb``,
and name the file ``rec_noHC.pdb``. Check that the model coordinates and
labeleing numbering are correct in Chimera and a text editor. Keep only the
``ATOM`` lines with the Bash command

.. code-block:: bash

    grep ATOM rec_noHC.pdb > rec_prepped_noHC.pdb

Repeat the step of combining the models into one ``.pdb`` file for
``lig_stack_5.pdb``, ``lig_stack_sym.pdb``, and ``lig_stack_inv.pdb`` in
Chimera. Call the output ``lig_15_copies.pdb``.

We need to format the protein fibril and ligand stack for Blastermaster. I am
using Blastermaster here, and not Pydock3 or DockOpt, because the script I use
for modeling the low dielectric boundary of the ligand stack was made for the
output of Blastermaster. Feel free to update for using Pydock3/DockOpt.

The following steps are how I do the formatting for Blastermaster; some people
just plop the prepared structure into Blastermaster or DOCKOpt as ``rec.pdb``,
with no ``rec.crg.pdb``. There may be slight differences in the resulting grids.
For the way that I learned, we need to create the ``rec.pdb`` and
``rec.crg.pdb`` files.

Just to be safe, if running scripts on our cluster, make sure to source the
right path and environment. Don't worry about DOCK3.7 vs. DOCK3.8 here; we are
only using these paths for Blastermaster, which 3.8 did not change.

.. code-block:: python

    export DOCKBASE=/nfs/soft/dock/versions/dock37/DOCK-3.7-trunk
    source /nfs/soft/dock/versions/dock37/DOCK-3.7-trunk/env.sh

From ``rec_prepped_noHC.pdb``, make a copy called ``rec.crg.pdb`` that has each
histidine renamed to ``HID``, ``HIE``, or ``HIP``, depending on its protonation
state. On our cluster, you can run the command

.. code-block:: python

    python /mnt/nfs/home/ttummino/zzz.scripts/protein_prep/replace_his_with_hie_hid_hip.py rec_prepped_noHC.pdb rec.crg.pdb

To make the ``rec.pdb``, remove all remaining (polar) hydrogens and rename any
"CYM" residues back to "CYS". On our cluster, you can run the commands

.. code-block:: python

    python /mnt/nfs/home/ttummino/zzz.scripts/protein_prep/0000_remove_hydrogens_from_pdb.py rec_prepped_noHC.pdb
    # Output is rec_noH.pdb
    sed "s/CYM/CYS/" rec_noH.pdb > rec.pdb

We still need to format the ligand stack file so that Blastmaster knows to put
low dielectric thinspheres at the locations of the heavy atoms. We can do this
by just changing all the heavy atoms (assuming we removed all the hydrogens
before) to carbons and their residue types to "LPD" with this Python code:

.. code-block:: python

    import os #For reading files in directory

    dir = {PATH} #Equivalent on your computer
    file_name = "lig_15_copies.pdb"
    f = os.path.join(dir, file_name)

    file = open(f, "r")
    output = open("shell-LPD.pdb", "w")
    for line in file:
        if line[0:5] == 'ATOM ':
            new_line = 'HETATM' + line[6:12] + ' C  ' + ' LPD' + line[20:77] + 'C\n'
            output.write(new_line)
    file.close()
    output.close()

6. Modifying the ``INDOCK`` and Creating the ``affine_transformation.txt``
--------------------------------------------------------------------------

It's easiest to make the ``INDOCK`` and ``affine_transformation.txt`` before
running Blastermaster. There are multiple steps within Blastermaster that copy
previous ``working directories``, so if we put the right ``INDOCK`` and
``affine_transformation.txt`` into the ``working`` directory at the beginning,
they should get carried to the end.

The ``affine_transformation.txt`` gives the affine transformation matrix. Use
my example from docking EGCG for formatting. Here, the transformation is
rotating in the XY-plane and translating down the Z-axis.

.. code::

    rot_mat_row_1     1.000    0.019    0.000
    rot_mat_row_2    -0.019    1.000    0.000
    rot_mat_row_3     0.000    0.000    1.000
    transl_vec        0.000    0.000    4.800

Make sure the ``affine_transformation.txt`` is in the ultimate ``dockfiles``
directory.

These are the required changes to the ``INDOCK`` file, created along with the
initial DOCK grids with the Blastermaster call:

* Set ``minimize`` to ``no``
* Add a section before the ``end of INDOCK`` line:

.. code::

    #####################################################
    # SYMMETRY
    sym_affine_file               ../dockfiles/affine_transformation.txt
    sym_dist_thresh               2.0
    max_lig_lig_vdw               0.0
    scale_lig_lig_vdw             1.0

The ``sym_affine_file`` refers to the formatted ``affine_transformation.txt``
file described above. As long as you keep that file in the ``dockfiles``
directory and the ``INDOCK`` in the larger directory containing ``dockfiles``
(the custom when working on our cluster, but NOT when submitting on Wynton with
the ``super`` script), this relative path should be fine.

The ``sym_dist_thresh`` parameter refers to the interatomic distance in
Ångstroms below which two atoms in molecular symmetry mates are "clashing." The
``max_lig_lig_vdw`` parameter is the maximum value in DOCK "kcal/mol" energy
units for the ligand-ligand van der Waals energy to reach before NOT being added
to the total score. This energy is negative when there is no clash, so a maximum
of ``0.0`` means to always include this contribution to the total energy. The
``scale_lig_lig_vdw`` parameter is the scalar value to multiply the
ligand-ligand van der Waals energy by before adding to the total energy.

The example values here are the defaults.

7. Generating and Modifiying the DOCK Score Grids
-------------------------------------------------

With ``rec.pdb``, ``rec.crg.pdb``, ``xtal-lig.pdb``, and ``shell-LPD.pdb``, we
are now ready to run Blastermaster. I am writing this assuming you will do some
kind of optimization based on the thin sphere scan, so I will show you how to
run Blastermaster with different thin sphere radii, add the ligand low
dielectric region to each setup, and properly recombine the directories.

Begin by putting ``rec.pdb``, ``rec.crg.pdb``, ``xtal-lig.pdb``, and
``shell-LPD.pdb`` into the directory you are using on the cluster. Run the
export/source lines from the Section 5 (if not already done) and then these
Bash commands:

.. code-block:: bash

    mkdir working
    cp rec.pdb working
    cp rec.crg.pdb working
    cp xtal-lig.pdb working
    $DOCKBASE/proteins/blastermaster/blastermaster.py -v --addNOhydrogensflag

Next we run Step #1 of Reed's thin sphere scan, which runs in parallel on our
cluster. We won't need to run Step #2.

.. code-block:: bash

  mkdir ES-LD_Scan
  ORIG_PATH=$PWD
  cd ES-LD_Scan
  python ~rstein/zzz.scripts/DOCK_prep_scripts/new_0001_generate_ES_LD_generation.py -p $ORIG_PATH
  unset ORIG_PATH #Just in case another script uses this variable

When these jobs finish on the cluster, return to the larger directory holding
``working`` and make new grids in ``with_Ligand_Stack``, with the ligand stack:

.. code-block:: bash

    cd .. #Exit working
    mkdir with_Ligand_Stack
    cp shell-LPD.pdb with_Ligand_Stack
    sh /mnt/nfs/exj/work/matt715/ex9_backup/matt715/Symmetry/Symmetry/New_Thinspheres/with_Ligand_Stack/blast-membrane-thinsph-scan.sh {PATH}/ES-LS_Scan {PATH_TO_LARGER_DIRECTORY_CONTAINING_working_AND_dockfiles}
    # Script written by Stefan and Andrii

Make sure the correct ``INDOCK`` and ``affine_transformation.txt`` are in the
new ``dockfiles`` directories in each new thin sphere directory. Then run your
chosen optimization scheme (logAUC enrichment of known binders over decoys,
RMSD in reproducing experimental poses, etc.), making sure to use the DOCK
executable equipped for SymDOCK. Once you have your optimized setup, dock to
that like you normally would (making sure to use the SymDOCK executable).

8. Analyzing Results with Symmetry
-------------------------------------------------

You can extract and filter the results from SymDOCK like normal. There will be
a new column in the ``OUTDOCK`` for successfully docked molecules for the
ligand-ligand van der Waals energy, called ``l-l_vdW`` and overwritten in column
13 (column 14 in the ``extract_all.sort.uniq.txt``). In the ``poses.mol2``, each
molecule will have that information stored under the ``Lig-Lig vdW`` flag. You
can use this as a column in the Chimera ViewDOCK feature.

If you want to make a stack of ligands for each ligand in your ``poses.mol2``
file, keeping all the commented information for each one, you can use this
Python script that Ian S. Knight wrote, called ``make_symmetry.py``. By default,
the script makes a stack with 7 new symmetry mates on each side of the original,
but you can change it using the ``num_applications=7`` and
``bidirectional=True`` parameters in the Pydock3 function
``write_mol2_file_with_molecules_cloned_and_transformed``.

.. code-block:: python

    from pydock3 import files
    import numpy as np
    import fire


    def main(affine_transformation_file_path, poses_mol2_file_path, trans_poses_mol2_file_path='trans_poses.mol2'):
        with open(affine_transformation_file_path, 'r') as f:
            rot_mat = []
            transl_vec = []
            for line in f.readlines():
                line.strip().split()
                row_label, x, y, z = line.split()
                row_data = [float(x), float(y), float(z)]
                if row_label.startswith("rot_mat"):
                    rot_mat.append(row_data)
                elif row_label == "transl_vec":
                    transl_vec = row_data
            rot_mat = np.array(rot_mat)
            transl_vec = np.array(transl_vec)
        mol2_file = files.Mol2File(poses_mol2_file_path)
        mol2_file.write_mol2_file_with_molecules_cloned_and_transformed(rot_mat, transl_vec, trans_poses_mol2_file_path, num_applications=7, bidirectional=True)

    if __name__ == '__main__':
        fire.Fire(main)

On our cluster, you can find an example of this script at
``/nfs/home/matt715/EGCG_Rescue/EGCG_3``. To run this script, with the
appropriate sourcing on our cluster, make sure you are on ``gimel5``, ``epyc``,
or another node that can handle the right Python version and Pydock3 (so NOT the
``gimel`` head node). Then run in Bash:

.. code-block:: bash

    source /mnt/nfs/soft/ian/python3.8.5.sh #For our cluster
    python make_symmetry.py affine_transformation.txt poses.mol2
    # Optional parameter: --trans_poses_mol2_file_path='trans_poses.mol2')

The output will be the ``trans_poses.mol2`` file, unless you decide to rename
it.