SymDOCK: Docking with Symmetry and Cooperativity
This guide is for using the symmetric extension to DOCK3.8 (SymDOCK) for Docking ligands against protein fibrils in a symmetric way and incorporating a crude ligand-ligand interaction energy. The same approach works for protein systems with other symmetry groups (e.g. cyclic groups \(Z_n\) for integer \(n\)), but I’ll keep the tutorial to the infinite fibril case (symmetry group of the integers \(\mathbb{Z}\) under addition).
Written by Matthew S. Smith. Please see the publication Matthew S. Smith, Ian S. Knight, Rian C. Kormos, Joseph G. Pepe, Peter Kunach, Marc I. Diamond, Sarah H. Shahmoradian, John J. Irwin, William F. DeGrado, and Brian K. Shoichet. “Docking for Molecules That Bind in a Symmetric Stack with SymDOCK.” Journal of Chemical Information and Modeling (2024)64(2):425-434. https://doi.org/10.1021/acs.jcim.3c01749.
1. What You Will Need
A cleaned up .pdb
file for the protein (which I have called
protein.pdb
), a .pdb
file for a single ligand in the region to dock
(which I have called xtal-lig.pdb
), a .pdb
file for the ligand stack
(which I have called lig_stack.pdb
), an INDOCK
file, a version of
DOCK3.8 that supports symmetry, and a programming language you are comfortable
using (I am using Python).
2. Background Math
You can safely ignore this section if you are not interested in the linear algebra.
We can represent the symmetry transformations of an atomic system (proteins, small molecules, etc.) by an affine transformation, which is composed of a linear transformation followed by a translation. For vectors in 3D (\(\mathbb{R}^3\)), we can still represent an affine transformation by a matrix by appending a “1” to the vector and using a \(4\times4\) matrix. As an example, consider the vector \((x,y,z)\). If we want to apply a linear transformation given by the \(3\times3\) matrix \(R\) to this vector and then follow with a translation by the vector \((a,b,c)\), we can do the multiplication
In our case, the matrix \(R\) is a rotation in 3 dimensions.
Note: The convention in algebra is to append the 1 at the end of the vector, putting the line of \((1,0,0,0)\) at the bottom of the matrix, not the top. I started this project using the convention in statistics of appending a 1 to the start of a data vector to allow for easy displacement by an intercept/bias in a linear model, and I have kept that convention here. The actual docking executable uses the rotation matrix and translation vector as separate inputs, so feel free to adjust the code in this tutorial to fit the algebra convention.
To find the affine transformation from one set of coordinates to another, select 4 vectors \((a_x,a_y,a_z)\), \((b_x,b_y,b_z)\), \((c_x,c_y,c_z)\), and \((d_x,d_y,d_z)\) in the first coordinate frame and their images \((a^\prime_x,a^\prime_y,a^\prime_z)\), \((b^\prime_x,b^\prime_y,b^\prime_z)\), \((c^\prime_x,c^\prime_y,c^\prime_z)\), and \((d^\prime_x,d^\prime_y,d^\prime_z)\) in the second frame, respectively. Then construct the matrices
and
If \(T\) is the affine transformation matrix going from the first set of coordinates to the second, then by simple matrix multiplication,
The matrix \(A\) has the same determinant as the matrix
(and all cyclic permutations of \(a\), \(b\), \(c\), \(d\)), and the determinant of this matrix is zero if and only if the 4 vectors are coplanar. Put another way, if we have chosen the points \((a_x,a_y,a_z)\), \((b_x,b_y,b_z)\), \((c_x,c_y,c_z)\), and \((d_x,d_y,d_z)\) to be non-coplanar, then \(A\) will be invertible, and we can find \(T\) as
3. Getting the Affine Transformation Matrix
For most protein systems, the PDB entry and publication will not include the affine transformation matrix in a neat, nice form. If the structure is big enough, we can estimate the transformation by following the symmetry of a few example points.
Here is some example code I wrote for an early version of the structure of AD PHF tau bound to GTP-1 (PDB ID 8FUG; Merz GE, Chalkley MH, Tan SK, Tse E, Lee J, Prusiner SB, Paras NA, DeGrado WF, and Southworth DR. “Stacked binding of a PET ligand to Alzheimer’s tau paired helical filaments.” Nature Communications (2023)14:3048. https://doi.org/10.1038/s41467-023-38537-y). I used many more example points than these, but I’ll keep this short (ish):
import os #For reading files in directory
import numpy as np #For NumPy functions
import statistics #For mean and stdev
np.set_printoptions(suppress=True) #To not print out in scientific notation
dir = {PATH} #Equivalent on your computer
file_name = "protein.pdb"
f = os.path.join(dir, file_name)
# A helper fumction to extract coordinates from a line
def extract_coord(x):
# x is a line of a PDB file
return(np.array([float(x[30:38]), float(x[38:46]), \
float(x[46:54])]))
# Make a function to extract the relevant coordinates and calculate the
# symmetry operation
def get_T(res_1, res_2, atom_1, atom_2, atom_3, atom_4):
# Each residue parameter should be a string of the form "RES A ###", for
# the residue type, chain name, and residue number (9 characters total)
# Each atom parameter should be a string of 4 characters properly aligned
# to the atom type in a PDB file in columns 13-16
# This function returns a 4x4 NumPy array (matrix)
file = open(f, "r")
for line in file:
if line[0:5] == 'ATOM ' and line[17:26] == res_1:
if line[12:16] == atom_1:
a = extract_coord(line)
if line[12:16] == atom_2:
b = extract_coord(line)
if line[12:16] == atom_3:
c = extract_coord(line)
if line[12:16] == atom_4:
d = extract_coord(line)
if line[0:5] == 'ATOM ' and line[17:26] == res_2:
if line[12:16] == atom_1:
a_prime = extract_coord(line)
if line[12:16] == atom_2:
b_prime = extract_coord(line)
if line[12:16] == atom_3:
c_prime = extract_coord(line)
if line[12:16] == atom_4:
d_prime = extract_coord(line)
file.close()
# Use Python list comprehension and NumPy to make the matrices
A_mat = np.vstack([ np.array([1.0 for i in range(1,5)]), \
np.array([ [v[i] for v in [a, b, c, d] ] for i in [0,1,2]]) ])
B_mat = np.vstack([ np.array([1.0 for i in range(1,5)]), \
np.array([ [v[i] for v in [a_prime, b_prime, c_prime, d_prime] ] for i in [0,1,2]]) ])
T_mat = np.dot(B_mat, np.linalg.inv(A_mat))
# Need to update the first row of T_mat
T_mat[0] = [1.0, 0., 0., 0.]
# Need to round to 3 decimal places, since that's the significant figures
# of the PDB file input
T_mat = np.around(T_mat, 3)
# Return the symmetry matrix T
return(T_mat)
# Start with atoms in Q351 in chains G and I:
# N, O, CB, and CD
Q351_GI = get_T('GLN G 351', 'GLN I 351', ' N ', ' O ', ' CB ', ' CD ')
# Next try with the same atoms but chains I to A
Q351_IA = get_T('GLN I 351', 'GLN A 351', ' N ', ' O ', ' CB ', ' CD ')
# Next try with the same atoms but chains A to E
Q351_AE = get_T('GLN A 351', 'GLN E 351', ' N ', ' O ', ' CB ', ' CD ')
# Next try with the same atoms but chains E to C
Q351_EC = get_T('GLN E 351', 'GLN C 351', ' N ', ' O ', ' CB ', ' CD ')
# Now try with atoms on ligand from chains P to O:
# F2, C21, C23, C32
lig_PO = get_T('UNK P 1', 'UNK O 1', ' F2 ', ' C21', ' C23', ' C32')
# Next try with the same atoms but ligands O to R
lig_OR = get_T('UNK O 1', 'UNK R 1', ' F2 ', ' C21', ' C23', ' C32')
# Next try with the same atoms but ligands R to Q
lig_RQ = get_T('UNK R 1', 'UNK Q 1', ' F2 ', ' C21', ' C23', ' C32')
# Example of printing out a full affine transformation matrix:
print(Q351_GI)
# Seems like rotation matrix is pretty consistent (not shown), using
# Small-Angle Approximation. But more variation in translation vector. Use
# Python list comprehension
print("Q351 Atoms N, O, CB, and CD from Chains G to I")
Q351_GI_trans = [ Q351_GI[i,0] for i in [1,2,3] ]
print(Q351_GI_trans)
print()
print("Q351 Atoms N, O, CB, and CD from Chains I to A")
Q351_IA_trans = [ Q351_IA[i,0] for i in [1,2,3] ]
print(Q351_IA_trans)
print()
print("Q351 Atoms N, O, CB, and CD from Chains A to E")
Q351_AE_trans = [ Q351_AE[i,0] for i in [1,2,3] ]
print(Q351_AE_trans)
print()
print("Q351 Atoms N, O, CB, and CD from Chains E to C")
Q351_EC_trans = [ Q351_EC[i,0] for i in [1,2,3] ]
print(Q351_EC_trans)
print()
print("Ligand Atoms F2, C21, C23, and C32 from Chains P to O")
lig_PO_trans = [ lig_PO[i,0] for i in [1,2,3] ]
print(lig_PO_trans)
print()
print("Ligand Atoms F2, C21, C23, and C32 from Chains O to R")
lig_OR_trans = [ lig_OR[i,0] for i in [1,2,3] ]
print(lig_OR_trans)
print()
print("Ligand Atoms F2, C21, C23, and C32 from Chains R to Q")
lig_RQ_trans = [ lig_RQ[i,0] for i in [1,2,3] ]
print(lig_RQ_trans)
print()
# Mean
print("Mean")
print("x")
print(statistics.mean( [v[0] for v in [Q351_GI_trans, Q351_IA_trans, \
Q351_AE_trans, Q351_EC_trans, lig_PO_trans, lig_OR_trans, lig_RQ_trans] ] ))
print("y")
print(statistics.mean( [v[1] for v in [Q351_GI_trans, Q351_IA_trans, \
Q351_AE_trans, Q351_EC_trans, lig_PO_trans, lig_OR_trans, lig_RQ_trans] ] ))
print("z")
print(statistics.mean( [v[2] for v in [Q351_GI_trans, Q351_IA_trans, \
Q351_AE_trans, Q351_EC_trans, lig_PO_trans, lig_OR_trans, lig_RQ_trans] ] ))
print()
# Sample standard deviation, with N-1 degrees of freedom
print("Standard Deviation")
print("x")
print(statistics.stdev( [v[0] for v in [Q351_GI_trans, Q351_IA_trans, \
Q351_AE_trans, Q351_EC_trans, lig_PO_trans, lig_OR_trans, lig_RQ_trans] ] ))
print("y")
print(statistics.stdev( [v[1] for v in [Q351_GI_trans, Q351_IA_trans, \
Q351_AE_trans, Q351_EC_trans, lig_PO_trans, lig_OR_trans, lig_RQ_trans] ] ))
print("z")
print(statistics.stdev( [v[2] for v in [Q351_GI_trans, Q351_IA_trans, \
Q351_AE_trans, Q351_EC_trans, lig_PO_trans, lig_OR_trans, lig_RQ_trans] ] ))
4. Extending the Protein Fibril and Ligand Stack
Now that we have the affine transformation matrix, we need to use it to extend the protein fibril and ligand stack. We need to extend the protein fibril so that the DOCK socre grids do not have edge effects. We will also need a stack of ligands that fills the full protein fibril to model the change in dielectric upon binding of a stack.
My example was a bit tricky, because the structure had 5 copies of the protein monomer but only 4 copies of the ligand. Your system might have weird exceptions like this, so I will include what I did.
Let’s start with just the protein fibril. The structure has 5 copies of the
protein monomer, so to easily make a system larger than the DOCK grid box, we
will just copy and plop the whole system on either side of the existing fibril
section. The symmetry operation we calculated earlier is just for going from
one monomer to the next. We need to apply it 5 times in either direction (as is
and inverted) in order to move the whole fibril section. T_mat
is the
symmetry matrix I settled on after the analysis above.
import os #For reading files in directory
import numpy as np #For NumPy functions
# Symmetry operator as an affine transformation (in SE(R^3))
T_mat = np.array([ [1., 0., 0., 0.], [-2.238, 1., 0.019, 0.], \
[2.266, -0.019, 1., 0], [4.747, 0., 0., 1.] ])
# Need to apply 5 times to get the movement of the entire 5-strand bundle
# down the fibril
T_full = np.linalg.matrix_power(T_mat, 5)
# Inverse symmetry operator
T_inv = np.array([ [1., 0., 0., 0.], [2.280, 1., -0.019, 0.], \
[-2.223, 0.019, 1., 0], [-4.747, 0., 0., 1.] ])
T_inv_full = np.linalg.matrix_power(T_inv, 5)
dir = {PATH} #Equivalent on your computer
file_name = "protein.pdb"
f = os.path.join(dir, file_name)
# A helper fumction to extract coordinates from a line
def extract_coord(x):
# x is a line of a PDB file
return(np.array([float(x[30:38]), float(x[38:46]), \
float(x[46:54])]))
# A function that applies the symmetry operation in the form of an affine
# transformation
def apply_T(T, v):
# The parameter v is a numeric array. We need to first add an initial
# entry of 1 to the vector
vec = np.array( [1., v[0], v[1], v[2]] )
out_raw = np.dot(T, vec)
out = [out_raw[i] for i in [1,2,3]] #Do not need the leading 1
return(out)
# Loop over all the lines in the input file and update only the atomic
# coordinates, making sure to keep the line spacing the same. First do with
# symmetry moving forward down axis
file = open(f, "r")
output = open("protein_sym.pdb", "w")
for line in file:
if line[0:5] == 'ATOM ':
coord = extract_coord(line)
new_coord = apply_T(T_full, coord)
# Need to do correct string formatting: 8 characters total, with enough
# leading whitespace and trailing zeroes to keep 3 decimal places
# This allows for possible negative signs
x_new = '{:.3f}'.format(round(new_coord[0], 3)).rjust(8)
y_new = '{:.3f}'.format(round(new_coord[1], 3)).rjust(8)
z_new = '{:.3f}'.format(round(new_coord[2], 3)).rjust(8)
new_line = line[0:30] + x_new + y_new + z_new + line[54:78] + "\n"
output.write(new_line)
elif line[0:3] == 'TER' or line[0:3] == 'END':
output.write(line) #Keep these linese as is
file.close()
output.close
# Repeat with symmetry moving backward down axis (inverse). Read in file again
# to start at top
file_2 = open(f, "r")
output_2 = open("protein_inv.pdb", "w")
for line in file_2:
if line[0:5] == 'ATOM ':
coord = extract_coord(line)
new_coord = apply_T(T_inv_full, coord)
# Need to do correct string formatting: 8 characters total, with enough
# leading whitespace and trailing zeroes to keep 3 decimal places
# This allows for possible negative signs
x_new = '{:.3f}'.format(round(new_coord[0], 3)).rjust(8)
y_new = '{:.3f}'.format(round(new_coord[1], 3)).rjust(8)
z_new = '{:.3f}'.format(round(new_coord[2], 3)).rjust(8)
new_line = line[0:30] + x_new + y_new + z_new + line[54:78] + "\n"
output_2.write(new_line)
elif line[0:3] == 'TER' or line[0:3] == 'END':
output_2.write(line) #Keep these lines as is
file_2.close()
output_2.close
In my example, there were only 4 copies of the ligand in the cryo-EM structure.
I first saved a .pdb
file for one ligand on the end of the stack next to the
empty site on the fibril (lig_1.pdb
). Then I applied the symmetry operation
once on that single ligand to make a new end of the stack (lig_0.pdb
). Then
I saved the new stack of 5 as a single file (lig_stack_5.pdb
) to make a
stack of 5 ligands. Then I applied the symmetry operation 5 times on the stack
of 5 in either direction to make lig_stack_sym.pdb
and
lig_stack_inv.pdb
.
5. Preparing for Blastermaster
Next we need to make one .pdb
file for the protein and one for the ligand
stack with all these atoms, the proper line numbering and labeling, and no
nonpolar hydrogens. To do this, open protein.pdb
, protein_sym.pdb
,
protein_inv.pdb
, lig_stack_5.pdb
, lig_stack_sym.pdb
, and
lig_stack_inv.pdb
in Chimera. Delete all the nonpolar hydrogens with the
commands sel HC
followed by del sel
. Select the 3 protein models in the
Model Panel and click “copy/combine molecular models.” Use these options:
Make sure all 3 are selected (
protein.pdb
,protein_sym.pdb
, andprotein_inv.pdb
)Name the new model whatever you want
Give the new model a new ID
Use the coordinate system of the original
protein.pdb
Select the option to “rename them uniquely” if any molecules have duplicate single-letter chain IDs
Then hit OK. Back in the Model Panel, highlight just the combined model, click
“write PDB,” select just the combined model, save relative to model protein.pdb
,
and name the file rec_noHC.pdb
. Check that the model coordinates and
labeleing numbering are correct in Chimera and a text editor. Keep only the
ATOM
lines with the Bash command
grep ATOM rec_noHC.pdb > rec_prepped_noHC.pdb
Repeat the step of combining the models into one .pdb
file for
lig_stack_5.pdb
, lig_stack_sym.pdb
, and lig_stack_inv.pdb
in
Chimera. Call the output lig_15_copies.pdb
.
We need to format the protein fibril and ligand stack for Blastermaster. I am using Blastermaster here, and not Pydock3 or DockOpt, because the script I use for modeling the low dielectric boundary of the ligand stack was made for the output of Blastermaster. Feel free to update for using Pydock3/DockOpt.
The following steps are how I do the formatting for Blastermaster; some people
just plop the prepared structure into Blastermaster or DOCKOpt as rec.pdb
,
with no rec.crg.pdb
. There may be slight differences in the resulting grids.
For the way that I learned, we need to create the rec.pdb
and
rec.crg.pdb
files.
Just to be safe, if running scripts on our cluster, make sure to source the right path and environment. Don’t worry about DOCK3.7 vs. DOCK3.8 here; we are only using these paths for Blastermaster, which 3.8 did not change.
export DOCKBASE=/nfs/soft/dock/versions/dock37/DOCK-3.7-trunk
source /nfs/soft/dock/versions/dock37/DOCK-3.7-trunk/env.sh
From rec_prepped_noHC.pdb
, make a copy called rec.crg.pdb
that has each
histidine renamed to HID
, HIE
, or HIP
, depending on its protonation
state. On our cluster, you can run the command
python /mnt/nfs/home/ttummino/zzz.scripts/protein_prep/replace_his_with_hie_hid_hip.py rec_prepped_noHC.pdb rec.crg.pdb
To make the rec.pdb
, remove all remaining (polar) hydrogens and rename any
“CYM” residues back to “CYS”. On our cluster, you can run the commands
python /mnt/nfs/home/ttummino/zzz.scripts/protein_prep/0000_remove_hydrogens_from_pdb.py rec_prepped_noHC.pdb
# Output is rec_noH.pdb
sed "s/CYM/CYS/" rec_noH.pdb > rec.pdb
We still need to format the ligand stack file so that Blastmaster knows to put low dielectric thinspheres at the locations of the heavy atoms. We can do this by just changing all the heavy atoms (assuming we removed all the hydrogens before) to carbons and their residue types to “LPD” with this Python code:
import os #For reading files in directory
dir = {PATH} #Equivalent on your computer
file_name = "lig_15_copies.pdb"
f = os.path.join(dir, file_name)
file = open(f, "r")
output = open("shell-LPD.pdb", "w")
for line in file:
if line[0:5] == 'ATOM ':
new_line = 'HETATM' + line[6:12] + ' C ' + ' LPD' + line[20:77] + 'C\n'
output.write(new_line)
file.close()
output.close()
6. Modifying the INDOCK
and Creating the affine_transformation.txt
It’s easiest to make the INDOCK
and affine_transformation.txt
before
running Blastermaster. There are multiple steps within Blastermaster that copy
previous working directories
, so if we put the right INDOCK
and
affine_transformation.txt
into the working
directory at the beginning,
they should get carried to the end.
The affine_transformation.txt
gives the affine transformation matrix. Use
my example from docking EGCG for formatting. Here, the transformation is
rotating in the XY-plane and translating down the Z-axis.
rot_mat_row_1 1.000 0.019 0.000
rot_mat_row_2 -0.019 1.000 0.000
rot_mat_row_3 0.000 0.000 1.000
transl_vec 0.000 0.000 4.800
Make sure the affine_transformation.txt
is in the ultimate dockfiles
directory.
These are the required changes to the INDOCK
file, created along with the
initial DOCK grids with the Blastermaster call:
Set
minimize
tono
Add a section before the
end of INDOCK
line:
#####################################################
# SYMMETRY
sym_affine_file ../dockfiles/affine_transformation.txt
sym_dist_thresh 2.0
max_lig_lig_vdw 0.0
scale_lig_lig_vdw 1.0
The sym_affine_file
refers to the formatted affine_transformation.txt
file described above. As long as you keep that file in the dockfiles
directory and the INDOCK
in the larger directory containing dockfiles
(the custom when working on our cluster, but NOT when submitting on Wynton with
the super
script), this relative path should be fine.
The sym_dist_thresh
parameter refers to the interatomic distance in
Ångstroms below which two atoms in molecular symmetry mates are “clashing.” The
max_lig_lig_vdw
parameter is the maximum value in DOCK “kcal/mol” energy
units for the ligand-ligand van der Waals energy to reach before NOT being added
to the total score. This energy is negative when there is no clash, so a maximum
of 0.0
means to always include this contribution to the total energy. The
scale_lig_lig_vdw
parameter is the scalar value to multiply the
ligand-ligand van der Waals energy by before adding to the total energy.
The example values here are the defaults.
7. Generating and Modifiying the DOCK Score Grids
With rec.pdb
, rec.crg.pdb
, xtal-lig.pdb
, and shell-LPD.pdb
, we
are now ready to run Blastermaster. I am writing this assuming you will do some
kind of optimization based on the thin sphere scan, so I will show you how to
run Blastermaster with different thin sphere radii, add the ligand low
dielectric region to each setup, and properly recombine the directories.
Begin by putting rec.pdb
, rec.crg.pdb
, xtal-lig.pdb
, and
shell-LPD.pdb
into the directory you are using on the cluster. Run the
export/source lines from the Section 5 (if not already done) and then these
Bash commands:
mkdir working
cp rec.pdb working
cp rec.crg.pdb working
cp xtal-lig.pdb working
$DOCKBASE/proteins/blastermaster/blastermaster.py -v --addNOhydrogensflag
Next we run Step #1 of Reed’s thin sphere scan, which runs in parallel on our cluster. We won’t need to run Step #2.
mkdir ES-LD_Scan
ORIG_PATH=$PWD
cd ES-LD_Scan
python ~rstein/zzz.scripts/DOCK_prep_scripts/new_0001_generate_ES_LD_generation.py -p $ORIG_PATH
unset ORIG_PATH #Just in case another script uses this variable
When these jobs finish on the cluster, return to the larger directory holding
working
and make new grids in with_Ligand_Stack
, with the ligand stack:
cd .. #Exit working
mkdir with_Ligand_Stack
cp shell-LPD.pdb with_Ligand_Stack
sh /mnt/nfs/exj/work/matt715/ex9_backup/matt715/Symmetry/Symmetry/New_Thinspheres/with_Ligand_Stack/blast-membrane-thinsph-scan.sh {PATH}/ES-LS_Scan {PATH_TO_LARGER_DIRECTORY_CONTAINING_working_AND_dockfiles}
# Script written by Stefan and Andrii
Make sure the correct INDOCK
and affine_transformation.txt
are in the
new dockfiles
directories in each new thin sphere directory. Then run your
chosen optimization scheme (logAUC enrichment of known binders over decoys,
RMSD in reproducing experimental poses, etc.), making sure to use the DOCK
executable equipped for SymDOCK. Once you have your optimized setup, dock to
that like you normally would (making sure to use the SymDOCK executable).
8. Analyzing Results with Symmetry
You can extract and filter the results from SymDOCK like normal. There will be
a new column in the OUTDOCK
for successfully docked molecules for the
ligand-ligand van der Waals energy, called l-l_vdW
and overwritten in column
13 (column 14 in the extract_all.sort.uniq.txt
). In the poses.mol2
, each
molecule will have that information stored under the Lig-Lig vdW
flag. You
can use this as a column in the Chimera ViewDOCK feature.
If you want to make a stack of ligands for each ligand in your poses.mol2
file, keeping all the commented information for each one, you can use this
Python script that Ian S. Knight wrote, called make_symmetry.py
. By default,
the script makes a stack with 7 new symmetry mates on each side of the original,
but you can change it using the num_applications=7
and
bidirectional=True
parameters in the Pydock3 function
write_mol2_file_with_molecules_cloned_and_transformed
.
from pydock3 import files
import numpy as np
import fire
def main(affine_transformation_file_path, poses_mol2_file_path, trans_poses_mol2_file_path='trans_poses.mol2'):
with open(affine_transformation_file_path, 'r') as f:
rot_mat = []
transl_vec = []
for line in f.readlines():
line.strip().split()
row_label, x, y, z = line.split()
row_data = [float(x), float(y), float(z)]
if row_label.startswith("rot_mat"):
rot_mat.append(row_data)
elif row_label == "transl_vec":
transl_vec = row_data
rot_mat = np.array(rot_mat)
transl_vec = np.array(transl_vec)
mol2_file = files.Mol2File(poses_mol2_file_path)
mol2_file.write_mol2_file_with_molecules_cloned_and_transformed(rot_mat, transl_vec, trans_poses_mol2_file_path, num_applications=7, bidirectional=True)
if __name__ == '__main__':
fire.Fire(main)
On our cluster, you can find an example of this script at
/nfs/home/matt715/EGCG_Rescue/EGCG_3
. To run this script, with the
appropriate sourcing on our cluster, make sure you are on gimel5
, epyc
,
or another node that can handle the right Python version and Pydock3 (so NOT the
gimel
head node). Then run in Bash:
source /mnt/nfs/soft/ian/python3.8.5.sh #For our cluster
python make_symmetry.py affine_transformation.txt poses.mol2
# Optional parameter: --trans_poses_mol2_file_path='trans_poses.mol2')
The output will be the trans_poses.mol2
file, unless you decide to rename
it.