Ligand Preparation (Building)

Building ligands is the process of preparing them in the db2 file format for use in the DOCK software. It consists of generating protonation states, tautomers, and conformations, and calculating partial charges, atomic desolvation, and strain.

The commands for creating building jobs on Wynton and Gimel are below. All submission options are listed at the bottom.

Job Setup on Wynton

Source the environment

source /wynton/group/bks/soft/DOCK-3.8.5/env.sh

Create the building jobs

python $DOCK_INSTALL_PATH/zinc22-3d/submit/submit_building_docker.py --output_folder [output_folder_name] --bundle_size [bundle_size] --skip_name_check --scheduler sge --container_software apptainer --container_path_or_name $DOCK_INSTALL_PATH/building_pipeline.sif [smi file name]

Submit the building jobs

qsub building_array_job.sh

Job Setup on Gimel

If this is your first time building on Gimel, ask John and his team to add you to the “docker” permissions group.

Run the commands below on a SLURM node (epyc, epyc2, gimel2, etc.).

Source the environment

source /mnt/nfs/soft/dock/versions/dock385/env.sh

Create the building jobs

python $DOCK_INSTALL_PATH/zinc22-3d/submit/submit_building_docker.py --output_folder [output_folder_name] --bundle_size [bundle_size] --skip_name_check [smi file]

Submit the building jobs

sbatch --exclude=$(paste -sd, /mnt/nfs/exk/work/bwhall61/deploy_building_pipeline_docker/broken_nodes.txt) building_array_job.sh

Understanding What’s Happening

When you run the Python script, your SMILES file is split into smaller units called “bundles”. Each bundle of ligands is built by an individual job submitted to the scheduler.

The Python script creates an output folder with the name you provide. Inside the output folder are subfolders named 1, 2, …, N, one per bundle. Your built molecules are written inside these subfolders.
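As a quick way to inspect the layout described above, the following is a minimal sketch that lists the numbered bundle subfolders and reports which ones are still empty. The function names `bundle_dirs` and `empty_bundles` are hypothetical helpers, not part of the DOCK tooling, and the sketch makes no assumption about the names of the files written inside each bundle.

```python
from pathlib import Path


def bundle_dirs(output_folder):
    """Return the numbered bundle subfolders (1, 2, ..., N), sorted numerically."""
    root = Path(output_folder)
    # Keep only directories whose name is a plain number; ignore anything else.
    dirs = [p for p in root.iterdir() if p.is_dir() and p.name.isdecimal()]
    return sorted(dirs, key=lambda p: int(p.name))


def empty_bundles(output_folder):
    """Return the names of bundle subfolders that contain no entries."""
    return [p.name for p in bundle_dirs(output_folder) if not any(p.iterdir())]
```

Calling `empty_bundles("my_output")` on an output folder returns the bundle numbers with nothing in them yet. An empty folder may mean the job is still queued or running, or that it failed; this sketch does not distinguish between those cases.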

Options

--output_folder: The name of the output folder to store results.

--bundle_size: The number of molecules to include in a single bundle. You will end up with roughly [total_num_molecules]/[bundle_size] output bundles. Note: If you are building a large number of molecules, aim for no more than a few thousand bundles to limit the number of subfolders in the output folder. You may need to split your input SMILES file into smaller chunks and run building jobs for each one.

--minutes_per_mol: The number of minutes to allow for building each molecule. An individual bundle’s job will run for at most [minutes_per_mol]*[bundle_size] minutes.

--skip_name_check: If you know that the molecule names in your input SMILES file are unique, you should use this option.

--scheduler: sge or slurm. Defaults to slurm.

--container_software: docker or apptainer. Defaults to docker.

--container_path_or_name: If docker is used, this is the name of the docker image. If apptainer is used, this is the path to the apptainer image. Defaults to building_pipeline (the docker image name on Gimel).
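The arithmetic behind --bundle_size and --minutes_per_mol, and the advice to split a large input, can be sketched in Python as follows. This is an illustration only: `plan_build` and `split_smiles` are hypothetical helpers, not part of the submission script. The sketch assumes that a final partial bundle counts as its own bundle (so the division is rounded up) and that the SMILES file holds one molecule per line. The chunk file names are made up for the example.

```python
import math
from pathlib import Path


def plan_build(n_molecules, bundle_size, minutes_per_mol):
    """Return (number of bundles, maximum minutes per bundle job)."""
    # Assumption: a leftover partial bundle still becomes its own bundle.
    n_bundles = math.ceil(n_molecules / bundle_size)
    max_minutes = minutes_per_mol * bundle_size
    return n_bundles, max_minutes


def split_smiles(smi_path, max_molecules, out_dir):
    """Split a SMILES file (one molecule per line) into smaller chunk files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(smi_path) as handle:
        # Skip blank lines so they are not counted as molecules.
        lines = [line for line in handle if line.strip()]
    chunks = []
    for start in range(0, len(lines), max_molecules):
        # Hypothetical naming scheme: chunk_1.smi, chunk_2.smi, ...
        chunk = out / f"chunk_{start // max_molecules + 1}.smi"
        chunk.write_text("".join(lines[start:start + max_molecules]))
        chunks.append(chunk)
    return chunks
```

For example, `plan_build(10500, 1000, 2)` gives 11 bundles, each with a job limit of 2000 minutes. Each file returned by `split_smiles` can then be passed to the submission script as a separate input with its own output folder.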