Protein Preparation 
=========================

We typically use `Maestro <https://www.schrodinger.com/platform/products/maestro/>`_ from Schrodinger to do our
protein receptor preparation.

The main idea is to fix any issues with the starting structure and to add hydrogens.


Typical Workflow
----------------

Maestro Installation
~~~~~~~~~~~~~~~~~~~~

TODO: Need a login to download - what is our login or how do we get Maestro? Would try and ask JJ or Khanh.


Minimization with Maestro
~~~~~~~~~~~~~~~~~~~~~~~~~

Note that this is the default pipeline. You should think about your specific protein and whether some of the options should be changed.

#. File > import structures > protein.pdb

#. Open Protein Prep Wizard located in the top left ("Protein Preparation")

#. Preprocess

   * Check "Cap termini"
  
   * Check "Fill in missing side chains"
  
   * Default "More Options" 

#. Check that the preprocessed structure looks OK, paying specific attention to any missing side chains and loops that were added, particularly if they are near the binding site.

#. Optimize H-bond Assignments

   * To use the default Maestro assignments, just click "Optimize"
   * If you want to specify HIS protonation states/flips, click "Assign with Constraints". Then choose your desired state for a residue by using the arrows. If you want to ensure that Maestro does not change this state, check the "Lock". When you are done setting states, click "Optimize".
   * Check the protonation states of important residues to make sure that everything looks good.
  
#. Minimize and Delete Waters 

   * Settings

     * Our default is to check "Optimize hydrogens only"
     * Our default is also to delete all of the waters, although you may want to keep some depending on your receptor.
   
   * After changing settings just click "Clean Up" 

#. Right click minimized structure > export > structure > ``rec_and_xtal_minimized.pdb``

Cleaning Up Minimized Structure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After arriving at a minimized structure, we need to do a couple more things to prepare it for our DOCK preparation software.

#. Open ``rec_and_xtal_minimized.pdb`` in Chimera.

#. In the command line run the following:

   * ``split #0:ligand`` to split the PDB model into separate protein and ligand models.

   * ``del HC`` to delete nonpolar hydrogens (hydrogens bonded to carbon) from the protein model.

#. Check that the termini are built right, and that you like the protonation states of charged residues and positions of ASNs and GLNs. This is mostly important around the binding site.

#. Save the receptor as ``rec_noHC.pdb`` and the ligand as ``xtal-lig.pdb``

#. Open ``xtal-lig.pdb`` in a text editor and delete the header lines at the top and the CONECT lines at the bottom.

#. Open ``rec_noHC.pdb`` in a text editor and do the following:

   * Delete the header lines at the top and the CONECT lines at the bottom.

   * Change capped termini (ACE/NMA) from HETATM to ATOM.
  
     * For NMA, change the CA atom name to CM.
  
   * For the residue immediately after the capped N-terminus, change the backbone amide hydrogen atom name from H1/H2 to H.
  
   * Delete any unwanted ions or waters.
  
     * For any waters you want to keep, change HETATM to ATOM and the hydrogen atom names from H1/H2 to H01/H02. Make sure the water residue is named HOH.
  
   * Make sure atom numbering and residue numbering are correct.

     * Make sure that the ACE/NMA cap residue numbering doesn't include any letters. Ex. Change residue 521A to just 521 (or 522 if the previous residue is 521).

#. Generate ``rec.crg.pdb`` with this command: ``python2 /mnt/nfs/home/ttummino/zzz.scripts/protein_prep/replace_his_with_hie_hid_hip.py rec_noHC.pdb rec.crg.pdb``

   * If a disulfide bridge exists, change the bridged CYS residues to CYX (it is unclear whether this should be done before or after running the script).
   * Check that the carbon atom name for NMA is still CM after running the script.

#. Generate ``rec.pdb`` with this command: ``python2 /mnt/nfs/home/ttummino/zzz.scripts/protein_prep/0000_remove_hydrogens_from_pdb.py rec_noHC.pdb``

   * The output is ``rec_noH.pdb``. Just rename this to ``rec.pdb``.
   * Again, check that the carbon atom name for NMA is still CM.
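
The text-editor steps above can be partly scripted. The following is a minimal Python 3 sketch, not one of the lab scripts: the function names ``clean_pdb_lines`` and ``clean_pdb_file`` are hypothetical, and it assumes that "header" means every record other than ATOM, HETATM, TER, and END. It drops header and CONECT lines, converts ACE/NMA caps from HETATM to ATOM, and renames the NMA CA atom to CM. Deciding which ions or waters to keep, renaming hydrogens, and renumbering are left as manual steps.

.. code-block:: python

   CAP_RESIDUES = {"ACE", "NMA"}  # capping groups added by the prep wizard
   KEPT_RECORDS = ("ATOM", "HETATM", "TER", "END")  # assumption: all else is header/CONECT


   def clean_pdb_lines(lines):
       """Return PDB lines with header/CONECT removed and cap records fixed."""
       cleaned = []
       for line in lines:
           record = line[:6].strip()
           if record not in KEPT_RECORDS:
               continue  # drops HEADER, REMARK, CONECT, and similar records
           res_name = line[17:20].strip()   # PDB columns 18-20
           atom_name = line[12:16].strip()  # PDB columns 13-16
           if record == "HETATM" and res_name in CAP_RESIDUES:
               # Capped termini must be written as ATOM records.
               line = "ATOM  " + line[6:]
               record = "ATOM"
           if record == "ATOM" and res_name == "NMA" and atom_name == "CA":
               # Rename the NMA methyl carbon from CA to CM.
               line = line[:12] + " CM " + line[16:]
           cleaned.append(line)
       return cleaned


   def clean_pdb_file(in_path, out_path):
       """Read in_path, clean it, and write the result to out_path."""
       with open(in_path) as handle:
           lines = handle.readlines()
       with open(out_path, "w") as handle:
           handle.writelines(clean_pdb_lines(lines))

For example, ``clean_pdb_file("rec_noHC.pdb", "rec_noHC_clean.pdb")`` would write a cleaned copy (the output name is only an illustration). Inspect the result by hand before running the downstream scripts.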

Additional Things To Consider
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Are there any mutations in your structure? Consider mutating these back to WT. 

* Are there multiple conformations for any residues? Make sure to reduce the structure to a single conformation.

* There should be no HETATM records in ``rec.pdb`` or ``rec.crg.pdb``.

* If you have two chains (say B and C) with independent residue numbering, blastermaster will gladly rename the second chain to the next letter in the alphabet and then fail with ``file working/rec.ms is empty, check input files``. To avoid this, give both chains the same chain ID (say B) in ``rec.pdb`` and keep the original chain IDs (B and C) in ``rec.crg.pdb``. The culprit is the function ``fixChainIds()`` in ``pdb.py``.
   
   * Brendan doesn't know what this means and will do some experimenting
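
Some of the checks in this section and in the cleanup steps can be automated. The following is a minimal Python 3 sketch, assuming standard fixed-column PDB formatting; the function name ``check_receptor_lines`` is hypothetical and this is not part of the DOCK tooling. It flags remaining HETATM records, alternate-location indicators (a sign of multiple conformations), and residue insertion codes such as the ``A`` in ``521A``.

.. code-block:: python

   def check_receptor_lines(lines):
       """Return a list of warnings for the lines of a receptor PDB file."""
       warnings = []
       for number, line in enumerate(lines, start=1):
           record = line[:6].strip()
           if record not in ("ATOM", "HETATM"):
               continue  # only coordinate records are checked
           if record == "HETATM":
               warnings.append(f"line {number}: HETATM record")
           if line[16:17].strip():  # PDB column 17: alternate location
               warnings.append(f"line {number}: alternate location '{line[16]}'")
           if line[26:27].strip():  # PDB column 27: insertion code
               warnings.append(f"line {number}: insertion code '{line[26]}'")
       return warnings


   def check_receptor_file(path):
       """Print any warnings found in the PDB file at path."""
       with open(path) as handle:
           for warning in check_receptor_lines(handle.readlines()):
               print(warning)

Running ``check_receptor_file("rec.pdb")`` and ``check_receptor_file("rec.crg.pdb")`` should print nothing for a fully cleaned receptor. An empty result does not replace a visual check of the binding site.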


Advanced Uses
-------------

Protein with lipid membrane
~~~~~~~~~~~~~~~~~~~~~~~~~~~

See this `wiki article <https://wiki.docking.org/index.php/Membrane_Modeling#Membrane_modelling_in_Schrodinger>`_


.. Induced Fit Docking in Maestro
.. ------------------------------

.. * Prepare receptor with “Protein preparation wizard” 

.. * Prepare ligand structure with “LigPrep” and save it as a .mae file 

.. * Open “Induced Fit Docking” window from “Tasks” 

.. * Ligands to be docked – File – Mae file of ligand 
.. * Receptor tab – Box center: – select as needed 
.. * Write the project, copy to Gimel, 
.. * ssh gimel5 
.. * ``export SCHRODINGER="/nfs/soft2/schrodinger/2021-2/"``
.. * Edit InducedFit_X.sh to something like this: ``"${SCHRODINGER}/ifd" -NGLIDECPU 16 -NPRIMECPU 16 InducedFit_5.inp -NOLOCAL -HOST gimel5.gpu -SUBHOST gimel5.gpu -TMPLAUNCHDIR``. It actually runs on CPU, but it is faster to run on the GPU queue.