Large-Scale Docking (LSD)

Large-Scale Docking (LSD) is molecular docking applied to a large database of molecules, typically over one billion.

Here we outline the steps to prepare and execute an LSD campaign. This is written for Wynton. A separate document covers AWS. The Wynton version can be generalized to work on any cluster.

Selecting Compounds for Docking from ZINC

ZINC-22 is organized into files of at most 5000 molecules each. This keeps the number of individual files on the file system manageable, and each file corresponds to approximately one hour of compute time on a single core (thread).

DOCK 3.8 reads SDI (split_database_index) files, which contain the full paths to the files to be docked. We have prepared SDI files in advance, organized in ways that we hope you find intuitive.

The four ways are:

  • sets (/nfs/exd/sets/ or /wynton/group/bks/sets)

  • dirs (/nfs/exd/sets/dirs/ or /wynton/group/bks/sets/dirs)

  • 3D tranche browser from Cartblanche22.docking.org

  • use unix find or otherwise to custom build your own from scratch.

We begin with our top recommendation: sets. Please refer to https://wiki.docking.org/index.php/Selecting_tranches_in_ZINC22

Sets are of the form <charge>-<Hbin>.<set-name>.<format> where
* <charge> is N=0, O=+1, P=+2, M=-1, L=-2, etc., from J to R.
* <Hbin> is H04 to H49, the heavy atom count
* <set-name> is one of lead-like, frag-like, greasy-leads, big-greasy, big.
* <format> is one of s3 (AWS), wyn (Wynton) or txt (our cluster)

Thus, if you want lead-like (H17-25, logP < 3.5), +1 cations only, for Wynton, use:

cat /wynton/group/bks/sets/O-H??.lead-like.wyn  > sdi_files.sdi
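Before submitting, it can help to estimate how big the campaign is. The sketch below assumes that each line of the SDI file names one ZINC-22 file, so that one line means at most 5000 molecules and roughly one core-hour. The function name estimate_campaign is hypothetical and not part of DOCK.

```shell
# Rough campaign size from an SDI file.
# Assumption: one non-empty SDI line = one ZINC-22 file
#             = at most 5000 molecules = about one core-hour.
estimate_campaign() {
    # $1 = path to the SDI file
    files=$(grep -c . "$1")   # count non-empty lines
    echo "files=${files} max_molecules=$((files * 5000)) approx_core_hours=${files}"
}

# Example:
# estimate_campaign sdi_files.sdi
```

The molecule count is an upper bound, since files may hold fewer than 5000 molecules.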

If you want a little more granularity, for example more control over exactly which H?? bins you use, only the in-stock molecules, or only molecules that have been built in the last month, use dirs:

Each file has a name of the form <charge>-<logPbin>-<Hbin>.<layer>.<suffix> where:
* <charge> is N=0, O=+1, M=-1, etc., from J to R.
* <logPbin> is one of:
** M = logP < 0
** P012 = logP 0 to 2.99
** P304 = logP 3.0 to 3.49
** P359 = logP 3.5 to 3.99
** P4 = logP 4.0 to 4.99
** P56789 = logP 5.0 and above
* <Hbin> is H04 to H49, the heavy atom count.
* <layer> is a-z; see https://wiki.docking.org/index.php/ZINC22:Layers
** e.g. y is the last few months, and a and g are the in-stock layers.
* <suffix> is one of s3 (AWS), wyn (Wynton) or txt (our cluster).
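The naming scheme above can be sketched as two small shell helpers. This is only an illustration: the function names charge_letter and dirs_name are hypothetical, and the letters beyond the ones listed (J, K, Q, R) assume the mapping continues one letter per charge unit, so that J=-4 and R=+4.

```shell
# Map a formal charge to its tranche letter.
# Assumption: the letters run linearly from J=-4 through N=0 to R=+4.
charge_letter() {
    case "$1" in
        -4) echo J ;; -3) echo K ;; -2) echo L ;; -1) echo M ;;
        0) echo N ;;
        1|+1) echo O ;; 2|+2) echo P ;; 3|+3) echo Q ;; 4|+4) echo R ;;
        *) echo "charge must be an integer from -4 to +4" >&2; return 1 ;;
    esac
}

# Compose one file name of the form <charge>-<logPbin>-<Hbin>.<layer>.<suffix>.
# $1 = charge, $2 = logP bin, $3 = heavy atom count (plain integer, no leading zero),
# $4 = layer, $5 = suffix
dirs_name() {
    letter=$(charge_letter "$1") || return 1
    printf '%s-%s-H%02d.%s.%s\n' "$letter" "$2" "$3" "$4" "$5"
}

# Example:
# dirs_name -1 P012 20 y wyn
```

In practice shell globs such as [LM]-P012-H2[0-5] are usually more convenient than building names one at a time; the helpers are mainly useful for checking that a glob means what you think it means.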

So now, if you want medium to large leads (H20-25), anionic -1 or -2, but only from the last few months, for Wynton, use:

cat /wynton/group/bks/dirs/y/[LM]-M-H2[0-5].y.wyn /wynton/group/bks/dirs/y/[LM]-P012-H2[0-5].y.wyn /wynton/group/bks/dirs/y/[LM]-P304-H2[0-5].y.wyn > sdi_files.sdi
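A quick sanity check on the resulting SDI file can catch a mistyped glob before any jobs are submitted. The sketch below assumes that each SDI line is a path on the local file system (true for the wyn and txt formats as described above, not for s3), and the function name check_sdi is hypothetical.

```shell
# Report SDI entries that do not exist on this file system.
# Prints each missing path to stderr and the number of missing entries to stdout.
# Returns 0 only if every entry exists.
check_sdi() {
    # $1 = path to the SDI file
    missing=0
    while IFS= read -r path; do
        [ -z "$path" ] && continue          # skip blank lines
        if [ ! -e "$path" ]; then
            echo "missing: $path" >&2
            missing=$((missing + 1))
        fi
    done < "$1"
    echo "$missing"
    [ "$missing" -eq 0 ]
}

# Example:
# check_sdi sdi_files.sdi
```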

Method 3 is to use the GUI at Cartblanche22.docking.org and download the SDI for exactly what you need. It should be obvious how to do this, and we will create a video. If you have trouble with this, please put a note here AND ask me, and we will write something, make the software easier to use, or both.
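For method 4, building an SDI from scratch with find, a minimal sketch follows. The function name build_sdi is hypothetical, and the file name pattern in the example ('*.db2.tgz') is an assumption about how the molecule files are named; adjust it to whatever your files actually look like.

```shell
# Build an SDI file listing every matching file under a directory.
# $1 = root directory (give an absolute path so the SDI entries are absolute)
# $2 = file name pattern, quoted so the shell does not expand it
# $3 = output SDI file
build_sdi() {
    find "$1" -type f -name "$2" | sort > "$3"
}

# Example (the paths and the pattern are placeholders):
# build_sdi /path/to/molecules '*.db2.tgz' sdi_files.sdi
```

find prints paths relative to the root it was given, so passing an absolute root is what makes the entries full paths.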

Submitting Jobs with the “Super” Script

(Note: this super script currently uses a slower dock3r,gfortran binary. I will fix this once the new release has been tested and installed more widely.)

The “super” script is designed to automate the submission of docking jobs to Wynton.

You need to provide just a few things and then run a single script:

export MOLECULES_DIR_TO_BIND=[outermost folder containing the molecules to dock]
export DOCKFILES=[path to your dockfiles]
export INPUT_FOLDER=[the folder containing your .sdi file(s)]
export OUTPUT_FOLDER=[where you want the output]

/wynton/group/bks/work/bwhall61/needs_github/super_dock3r.sh

Note that the paths need to be absolute!
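Since relative paths are an easy mistake to make, a small pre-flight check can be run before the super script. This is a sketch only: check_super_env is a hypothetical helper, not part of the super script, and it checks just that the four variables are set and begin with a slash.

```shell
# Verify that the four variables are set and hold absolute paths.
# Returns 0 if all four pass, 1 otherwise (problems are printed to stderr).
check_super_env() {
    status=0
    for name in MOLECULES_DIR_TO_BIND DOCKFILES INPUT_FOLDER OUTPUT_FOLDER; do
        eval "value=\${$name:-}"            # read the variable called $name
        case "$value" in
            /*) ;;                          # absolute path: fine
            "") echo "$name is not set" >&2; status=1 ;;
            *)  echo "$name is not an absolute path: $value" >&2; status=1 ;;
        esac
    done
    return "$status"
}

# Example:
# check_super_env && /wynton/group/bks/work/bwhall61/needs_github/super_dock3r.sh
```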

Checking for and Resubmitting Failed/Error Jobs

Still needs to be written. See “Wynton ZINC22 Elissa” for now.

If I had to guess, not many jobs will fail.