Running ChemSTEP (Auto DOCK and Build)

Last updated: May 07 2026 — current version: v03 (integrated IFP)

ChemSTEP is configured to run on Wynton with libraries of 13B, 22B (HAC 17-26, cLogP <= 3.5), and 72B (HAC 4-49, cLogP <= 4.0) molecules. The instructions below cover running ChemSTEP with automatic submission of building and docking jobs.

  1. Source Environment

    source /wynton/group/bks/work/shared/kholland/chemstep_auto_v03_ifp/v03_ifp/bin/activate
    
  2. Dock the Seed Set

    Copy the .sdi file for the library you want to use:

    Library  Path
    13B      /wynton/group/bks/work/shared/kholland/chemstep_auto_v03_ifp/v03_ifp/scripts/libraries/13B/13M_seeds.wynton.sdi
    22B      /wynton/group/bks/work/shared/kholland/chemstep_auto_v03_ifp/v03_ifp/scripts/libraries/22B/22M_seeds.sdi
    72B      /wynton/group/bks/work/shared/kholland/chemstep_auto_v03_ifp/v03_ifp/scripts/libraries/72B/72M_seeds.sdi

    Then, DOCK the seed set. See Large-Scale Docking (LSD) directions.

  3. Gather Scores for the Seed Set

    Once docking is complete, run the following from the directory one level above your docking output (MOLECULES_DIR_TO_BIND).

    For the 22B or 72B library:

    python /wynton/group/bks/work/shared/kholland/chemstep_auto_v03_ifp/v03_ifp/scripts/get_scores.py 0
    

    For the 13B library:

    python /wynton/group/bks/work/shared/kholland/chemstep_auto_v03_ifp/v03_ifp/scripts/get_scores.py 0 MOL
    

    Note: You must specify the molecule ID prefix for the 13B library (MOL).

    Verify that scores_round_0.txt was correctly written:

    wc -l scores_round_0.txt
    
  4. Convert Scores to .npy Files

    Convert scores to ChemSTEP-readable .npy files:

    python /wynton/group/bks/work/shared/kholland/chemstep_auto_v03_ifp/v03_ifp/scripts/convert_scores_to_npy.py 0 <mol_id_prefix>
    

    The mol_id_prefix should match the library:

    Library    Prefix
    22B / 72B  CSLB
    13B        MOL
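
Before moving on, it can help to confirm that the converted arrays are consistent with each other. The sketch below assumes the conversion writes indices_round_0.npy and scores_round_0.npy (the file names used in params.txt in Step 6) as one-dimensional numeric arrays of equal length. The helper name check_seed_arrays is hypothetical and not part of ChemSTEP.

```python
from pathlib import Path

import numpy as np


def check_seed_arrays(indices_path, scores_path):
    """Load the seed index and score arrays and run basic consistency checks.

    Assumption: both files hold 1-D numeric arrays, one entry per molecule.
    """
    indices = np.load(indices_path)
    scores = np.load(scores_path).astype(float)
    if indices.ndim != 1 or scores.ndim != 1:
        raise ValueError("expected 1-D arrays for both indices and scores")
    if indices.shape[0] != scores.shape[0]:
        raise ValueError(
            f"length mismatch: {indices.shape[0]} indices vs {scores.shape[0]} scores"
        )
    finite = scores[~np.isnan(scores)]
    return {
        "n_molecules": int(scores.shape[0]),
        "n_nan_scores": int(scores.shape[0] - finite.shape[0]),
        "min_score": float(finite.min()) if finite.size else None,
    }


if __name__ == "__main__":
    idx, sc = Path("indices_round_0.npy"), Path("scores_round_0.npy")
    if idx.exists() and sc.exists():
        print(check_seed_arrays(idx, sc))
```

Run it from the directory that holds the two .npy files. A length mismatch or an unexpectedly small molecule count suggests the conversion should be repeated before editing params.txt.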

  5. Set Up the ChemSTEP Run Directory

    Create a directory for the ChemSTEP run, cd into it, and populate it with the required files:

    mkdir chemstep_run
    cd chemstep_run
    chemstep-run-new
    

    This will populate the directory with params.txt, run_chemstep.py, and launch_chemstep_as_job.sh.

    If running with integrated IFP for beacon selection, also run:

    chemstep-run-ifp
    

    This copies in the additional files ifp_acceptance_criteria.txt and interactions.txt.

  6. Edit params.txt

    Add the absolute paths to the ChemSTEP-readable score and index NumPy arrays generated in Step 4. The remaining values are left to the user’s discretion; considerations for each are given below.

    seed_indices_file:  /path/to/your/indices_round_0.npy
    seed_scores_file:   /path/to/your/scores_round_0.npy
    hit_pprop:          5.5
    n_docked_per_round: 2000000
    bundle_size:        1000
    max_beacons:        100
    max_n_rounds:       250
    

    Parameter descriptions:

    hit_pprop: Defines a “virtual hit.” pProp = -log10 of a molecule's fractional rank within the total library score distribution. For example, pProp 4 in 13B space ≈ top 0.01% (~1.3M molecules); pProp 5 ≈ top 0.001% (~132K). The seed set should contain at least 10^(pProp+2) molecules.

    n_docked_per_round: Number of molecules prioritized per round. All of them must be built and docked before the next round starts. Too large a value slows throughput and may reduce diversity; too small a value slows virtual hit recovery. Recommended: 1-2 million.

    max_beacons: Maximum number of beacons, the diverse, well-scoring molecules used to guide prioritization. All molecules above the pProp threshold are candidates. Too many beacons reduces inter-beacon diversity; too few hinders exploration of chemical space. Fewer beacons than specified may be assigned if too few molecules clear the threshold. Recommended: 100.

    bundle_size: In auto docking mode, the number of molecules submitted as a single build job.

    max_n_rounds: No adjustment needed when running ChemSTEP prospectively as described here.
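
The hit_pprop arithmetic can be worked through in a few lines. This sketch treats pProp as -log10 of the fraction of the library at or above the threshold, which matches the examples given for the 13B library, and reads the seed-set guideline as 10 raised to the power (pProp + 2). Both readings are interpretations of the text above, the library size is a nominal 13 billion, and the function names are illustrative only.

```python
import math


def pprop_to_fraction(pprop):
    """Fraction of the library that ranks at or above a pProp threshold."""
    return 10.0 ** (-pprop)


def expected_virtual_hits(library_size, pprop):
    """Approximate number of library molecules at or above the threshold."""
    return library_size * pprop_to_fraction(pprop)


def min_seed_set_size(pprop):
    """Seed-set guideline, read here as 10 ** (pProp + 2) (an interpretation)."""
    return math.ceil(10.0 ** (pprop + 2))


if __name__ == "__main__":
    library_size = 13_000_000_000  # assumption: nominal size of the 13B library
    for pprop in (4.0, 5.0, 5.5):
        print(
            f"pProp {pprop}: top {pprop_to_fraction(pprop):.5%} of the library, "
            f"about {expected_virtual_hits(library_size, pprop):,.0f} molecules, "
            f"seed set of at least {min_seed_set_size(pprop):,}"
        )
```

Substituting the size of the 22B or 72B library for library_size gives the corresponding virtual-hit counts for those spaces.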

  7. Edit run_chemstep.py

    Note: All paths must be absolute paths.

    Set lib_path to the library pickle for your library:

    Library  Path
    13B      /wynton/group/bks/work/shared/kholland/chemstep_auto_v03_ifp/v03_ifp/scripts/libraries/13B/boltz_fplib.pickle
    22B      /wynton/group/bks/work/shared/kholland/chemstep_auto_v03_ifp/v03_ifp/scripts/libraries/22B/22B_fplib.pickle
    72B      /wynton/group/bks/work/shared/kholland/chemstep_auto_v03_ifp/v03_ifp/scripts/libraries/72B/72B_fplib.pickle

    lib_path = '/full/path/to/library.pickle'
    

    Set dockfiles_path:

    dockfiles_path="/full/path/to/dockfiles"
    

    Optional: minTD exclusion zone. Molecules within the specified Tanimoto distance of a beacon will not be prioritized. Uncomment the relevant lines and update the value. Consider also setting enforce_n_docked_per_round = True when using this option:

    min_td_search=0.5,
    enforce_n_docked_per_round=True,
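
To illustrate what a Tanimoto-distance exclusion zone means, the sketch below computes the distance between fingerprints represented as sets of on-bit indices and tests whether a candidate lies outside the zone around every beacon. It is a conceptual illustration only, not ChemSTEP's implementation: the fingerprint representation, the function names, and the choice of >= at the boundary are all assumptions.

```python
def tanimoto_distance(fp_a, fp_b):
    """Return 1 - |A & B| / |A | B| for fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / len(union)


def outside_exclusion_zone(candidate, beacons, min_td):
    """True if the candidate is at least min_td away from every beacon.

    Assumption: the boundary itself (distance == min_td) counts as outside.
    """
    return all(tanimoto_distance(candidate, beacon) >= min_td for beacon in beacons)


if __name__ == "__main__":
    beacon = {1, 2, 3, 4}
    near = {1, 2, 3, 5}  # shares 3 of 5 distinct bits with the beacon
    far = {1, 2, 7, 8}   # shares 2 of 6 distinct bits with the beacon
    for name, fp in (("near", near), ("far", far)):
        print(name, round(tanimoto_distance(fp, beacon), 3),
              outside_exclusion_zone(fp, [beacon], min_td=0.5))
```

A larger min_td_search value excludes more of the neighborhood around each beacon, which is why enforcing n_docked_per_round is suggested alongside it.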
    

    Optional: Integrated IFP. Only beacons that satisfy user-defined interaction criteria are selected. Uncomment the following lines and update the paths to the necessary files (copied in Step 5 if you ran chemstep-run-ifp):

    use_IFP=True,
    ifp_pdb_path='/full/path/to/rec.crg.pdb',
    interactions_file='/full/path/to/interactions.txt',
    ifp_acceptance_criteria_file='/full/path/to/ifp_acceptance_criteria.txt',
    

    interactions.txt — one interaction per line, comma-separated. Format: interaction_type, residue_name_and_number. Example:

    Hydrogen bond, GLY19
    Ionic, ASP149
    

    Supported interaction types include: Proximal, Hydrogen bond, Ionic, Cation-pi, Hydrophobic, Halogen bond, and others. See LUNA and IFP documentation for the full list.
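
A quick format check on interactions.txt before launching can catch malformed lines early. The parser below is a minimal sketch, not ChemSTEP's own reader: it assumes one "interaction_type, residue" pair per line and a residue written as a three-letter name followed by a number, as in the example above. It does not validate the interaction type against the supported list.

```python
import re


def parse_interactions(text):
    """Parse 'interaction_type, residue' lines into (type, name, number) tuples."""
    parsed = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 2 or not all(parts):
            raise ValueError(
                f"line {line_no}: expected 'interaction_type, residue', got {raw!r}"
            )
        interaction, residue = parts
        # Assumption: residues look like GLY19 (three letters, then a number).
        match = re.fullmatch(r"([A-Za-z]{3})(\d+)", residue)
        if match is None:
            raise ValueError(f"line {line_no}: unrecognized residue {residue!r}")
        parsed.append((interaction, match.group(1).upper(), int(match.group(2))))
    return parsed


if __name__ == "__main__":
    example = "Hydrogen bond, GLY19\nIonic, ASP149\n"
    for entry in parse_interactions(example):
        print(entry)
```

To check a real file, pass its contents, for instance parse_interactions(open("interactions.txt").read()).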

    ifp_acceptance_criteria.txt — defines the number of unsatisfied donors/acceptors/specific interactions required for a molecule to pass IFP and be considered for beacon selection. Example:

    #_donors
    #_acceptors
    #_unstatisfied_donors == 0
    #_unstatisfied_acceptors <= 4
    Ionic/ASP-149 > 0
    

    Example: AmpC on 22B with minTD=0.50, No IFP

    lib_path = '/wynton/group/bks/work/shared/kholland/chemstep_auto_v02/scripts/libraries/22B/22B_fplib.pickle'
    lib = load_library_from_pickle(lib_path)
    algo = CSAlgo(lib, 'params.txt', 'output', 16, verbose=True,
        scheduler='sge', smi_id_prefix='CSLB',
        python_exec="/wynton/group/bks/work/shared/kholland/chemstep_auto_v02/bin/python",
        dockfiles_path="/wynton/group/bks/work/kholland/chemstep_ampc_22B/seed_docking/dockfiles",
        min_td_search=0.5,
        enforce_n_docked_per_round=True,
        #use_IFP=True,
        #ifp_pdb_path='/path/to/your/reference/rec.crg.pdb',
        #interactions_file='/path/to/your/interactions.txt',
        #ifp_acceptance_criteria_file='/path/to/your/ifp_acceptance_criteria.txt',
        docking_method="auto", track_beacon_orig=True)
    
  8. Launch the Job

    Submit the main ChemSTEP job:

    qsub launch_chemstep_as_job.sh
    
  9. Monitor Job Status

    Check job status with qstat. Barring errors, the main job runs for up to 2 weeks. ChemSTEP launches search, building, and docking jobs in successive rounds.

    Note: If any building or docking subjobs hang, the main job will not proceed until they finish or are canceled. Check job statuses regularly, and occasionally confirm that the docking output files (scores_round_*.txt) are being populated.
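
One lightweight way to see whether score files are being populated is to count their lines per round. The sketch below assumes the scores_round_*.txt files sit directly in the directory you pass in. That location is an assumption, and count_score_lines is a hypothetical helper rather than a ChemSTEP utility.

```python
import re
from pathlib import Path


def count_score_lines(run_dir="."):
    """Return {round_number: line_count} for scores_round_*.txt files in run_dir."""
    counts = {}
    for path in Path(run_dir).glob("scores_round_*.txt"):
        match = re.fullmatch(r"scores_round_(\d+)\.txt", path.name)
        if match is None:
            continue  # skip files that do not follow the round naming scheme
        with path.open("rb") as handle:
            counts[int(match.group(1))] = sum(1 for _ in handle)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    for round_number, n_lines in count_score_lines(".").items():
        print(f"round {round_number}: {n_lines} lines")
```

Running it periodically and comparing the counts shows whether the latest round is still growing or has stalled.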

  10. View Beacon SMILES and IDs

    From the ChemSTEP running directory, run the following in a screen session on a dev node:

    python /wynton/group/bks/work/shared/kholland/chemstep_auto_v03_ifp/v03_ifp/scripts/get_beacon_smiles.py /path/to/library/pickle chemstep_algo.log
    

    Use the library pickle path from Step 7.

  11. Plot Hit Recovery

    From your ChemSTEP directory containing chemstep_algo.log and score files:

    bash /wynton/group/bks/work/shared/kholland/chemstep_auto_v03_ifp/v03_ifp/scripts/submit_hit_recovery.sh
    
  12. Get Poses After Docking

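    Make a list of the test.mol2.gz.0 files produced by docking. As written, the search starts at /round_*_docking, so adjust the leading path if your round_*_docking directories live elsewhere (for example, in the ChemSTEP run directory):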

    find /round_*_docking/bundle_paths -maxdepth 2 -name "test.mol2.gz.0" > docked_poses.txt
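
An equivalent listing can be sketched in Python when the round directories live under a run directory rather than at the filesystem root. This assumes the layout implied by the find command (round_*_docking/bundle_paths, with pose files at most one directory below bundle_paths). The function name and the run_dir argument are illustrative.

```python
from pathlib import Path


def list_pose_files(run_dir, out_file="docked_poses.txt", name="test.mol2.gz.0"):
    """Write absolute paths of pose files under run_dir to out_file, one per line."""
    run_dir = Path(run_dir).resolve()
    found = []
    for bundle_root in sorted(run_dir.glob("round_*_docking/bundle_paths")):
        # Mirror `find -maxdepth 2`: the file directly inside bundle_paths,
        # or inside one subdirectory of it, but no deeper.
        for pattern in (name, f"*/{name}"):
            found.extend(sorted(bundle_root.glob(pattern)))
    Path(out_file).write_text("".join(f"{path}\n" for path in found))
    return found
```

Calling list_pose_files("/path/to/chemstep_run") writes docked_poses.txt in the current directory, in the same one-path-per-line form that the find command produces.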
    

    Then extract top poses:

    python /wynton/group/bks/work/bwhall61/for_beau/top_poses.py \
        -t <pProp_threshold> \
        -s <num_poses_per_file> \
        -dock_results_path docked_poses.txt