AlphaFold Quickstart: Earlier Versions

Note

This page contains a quick start guide for earlier versions of AlphaFold that are still available but no longer directly supported. For direct support, please refer to the guide for the latest version.

AlphaFold 2.0.0

Software Included

Getting Started

  • Connect to the compute client.

ssh wustlkey@compute1-client-1.ris.wustl.edu
  • Prepare the computing environment before submitting an AlphaFold job.

# Set the AlphaFold base directory
export ALPHAFOLD_BASE_DIR=/app/alphafold

# Use the scratch file system for temp space
export SCRATCH1=/scratch1/fs1/${COMPUTE_ALLOCATION}

# Use your Active storage for input and output data
export STORAGE1=/storage1/fs1/${STORAGE_ALLOCATION}/Active

# Mount scratch, Active storage, the AlphaFold reference databases, and your home directory
export LSF_DOCKER_VOLUMES="/scratch1/fs1/ris/references/AlphaFold:/scratch1/fs1/ris/references/AlphaFold $SCRATCH1:$SCRATCH1 $STORAGE1:$STORAGE1 $HOME:$HOME"

# Update $PATH with folders containing AlphaFold, CUDA, and conda executables
export PATH="/usr/local/cuda/bin/:/opt/conda/bin:/app/alphafold:$PATH"

# Use the debug flag when trying to figure out why your job failed to launch on the cluster
#export LSF_DOCKER_RUN_LOGLEVEL=DEBUG
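The exports above expand COMPUTE_ALLOCATION and STORAGE_ALLOCATION, so both need to be set beforehand. The following is a minimal sketch of a guard that could run first; the helper name check_allocations is hypothetical and is not part of the cluster tooling.

```shell
# Hypothetical helper: report any allocation variable that is unset or empty.
# Returns 0 when both are set and 1 otherwise.
check_allocations() {
    missing=0
    for name in COMPUTE_ALLOCATION STORAGE_ALLOCATION; do
        # Indirect lookup of the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "error: $name is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

One way to use it is to run `check_allocations || return 1` at the top of a setup script that is sourced before the exports.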
  • Submit an AlphaFold job that requests a node with 8 vCPUs, 8 GB of memory, and one GPU.

    • These are the minimum system requirements suggested for running AlphaFold with the reduced_dbs setting.

bsub -q general -n 8 -M 8GB -R "gpuhost rusage[mem=8GB] span[hosts=1]" -gpu 'num=1' -a "docker(gcr.io/ris-registry-shared/alphafold:2.0.0)" run_alphafold.sh -o /path/to/output/folder -m model_1,model_2,model_3,model_4,model_5,model_2_ptm -f /path/to/input/protein_sequence.fa -t 2021-08-18 -n 8 -p reduced_dbs
  • AlphaFold can run on both the V100 and A100 GPU architectures. If you would like to specify the GPU architecture, please modify the -gpu argument in the job submission command.

-gpu 'num=1:gmodel=<gpu_model>'
  • A list of GPU models can be found here.

  • Jobs can be managed using job groups, which are a way to submit and track a large number of jobs at once.

  • Jobs can be submitted to a condo, if available, by specifying the correct condo queue. Information on this can be found here.
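As a sketch of how a job group might be used to fold several sequences, the helper below prints one bsub command per FASTA file, each tagged with the same group through the LSF -g option. The group name, the .fa folder layout, and the helper name print_alphafold_jobs are assumptions for illustration. The commands are only echoed (a dry run), so nothing is submitted.

```shell
# Hypothetical dry-run helper: print one bsub command per .fa file in a folder.
# JOB_GROUP is an illustrative name, not a cluster default.
JOB_GROUP="/${USER:-user}/alphafold"

print_alphafold_jobs() {
    input_dir=$1
    output_dir=$2
    for fasta in "$input_dir"/*.fa; do
        # Skip the unexpanded pattern when the folder has no .fa files
        [ -e "$fasta" ] || continue
        echo bsub -g "$JOB_GROUP" -q general -n 8 -M 8GB \
            -R "'gpuhost rusage[mem=8GB] span[hosts=1]'" -gpu "'num=1'" \
            -a "'docker(gcr.io/ris-registry-shared/alphafold:2.0.0)'" \
            run_alphafold.sh -o "$output_dir" \
            -m model_1,model_2,model_3,model_4,model_5,model_2_ptm \
            -f "$fasta" -t 2021-08-18 -n 8 -p reduced_dbs
    done
}
```

Calling `print_alphafold_jobs /path/to/input /path/to/output/folder` lists the commands for review. Once they look right, they can be pasted into the shell or piped to `sh` to submit them.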

Setting Different Model Presets

AlphaFold provides several preset configurations, which are described under Preset Models below. To change the preset used, please modify the -p option in the job submission command.

For example, to use the full_dbs preset, your job submission would include -p full_dbs.

Settings

Please see below for a description of the different settings for AlphaFold.

Warning

Settings marked with a * are required.

  • -o <output_dir> Path to a directory that will store the results. *

  • -m <model_names> Names of models to use (a comma separated list). *

  • -f <fasta_path> Path to a FASTA file containing one sequence. *

  • -t <max_template_date> Maximum template release date to consider (ISO-8601 format - i.e. YYYY-MM-DD). Important if folding historical test sets. *

  • -b <benchmark> Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time required for inference on many proteins (default: False).

  • -g <use_gpu> Enable NVIDIA runtime to run with GPUs (default: True).

  • -p <preset> Choose the preset model configuration: reduced_dbs (no ensembling, smaller genetic database config), full_dbs (no ensembling, full genetic database config), or casp14 (full genetic database config with 8 model ensemblings) (default: full_dbs).

  • -d <data_dir> Path to a directory containing the reference databases. Use this option if you want to use your own reference databases.
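The required settings above can be checked before a job is submitted. The following is a minimal sketch under stated assumptions: the helper names are hypothetical and do not ship with AlphaFold, the date check only tests the YYYY-MM-DD shape, and "one sequence" is approximated by counting FASTA header lines that start with ">".

```shell
# Hypothetical pre-submission checks; none of these helpers ship with AlphaFold.

# True when the argument has the shape YYYY-MM-DD (calendar validity is not checked).
is_iso_date() {
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# True when the file exists and contains exactly one FASTA header line.
has_one_sequence() {
    [ -f "$1" ] && [ "$(grep -c '^>' "$1")" -eq 1 ]
}

# Check the four required settings in the order -o, -m, -f, -t.
check_required() {
    [ -n "${1:-}" ] || { echo "error: -o <output_dir> is required" >&2; return 1; }
    [ -n "${2:-}" ] || { echo "error: -m <model_names> is required" >&2; return 1; }
    has_one_sequence "${3:-}" || { echo "error: -f must be a FASTA file with one sequence" >&2; return 1; }
    is_iso_date "${4:-}" || { echo "error: -t must be in YYYY-MM-DD format" >&2; return 1; }
}
```

For example, `check_required /path/to/output/folder model_1 /path/to/input/protein_sequence.fa 2021-08-18` returns 0 only when all four values pass.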

Preset Models

Please see below for a description of the different preset model configurations available. These presets control the speed and quality of AlphaFold.

  • reduced_dbs: This preset is optimized for speed and lower hardware requirements.

  • full_dbs: This preset runs with all genetic databases and with no ensembling.

  • casp14: This preset uses the same settings that were used in CASP14. It runs with all genetic databases and with ensembling.
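Since this page describes only the three presets listed above, a submission script can reject any other value before the job reaches the queue. This is a minimal sketch; is_valid_preset and the PRESET variable are hypothetical names, and the echoed command is abbreviated for illustration.

```shell
# Hypothetical helper: accept only the presets described on this page.
is_valid_preset() {
    case "$1" in
        reduced_dbs|full_dbs|casp14) return 0 ;;
        *) return 1 ;;
    esac
}

PRESET="full_dbs"
if is_valid_preset "$PRESET"; then
    # Print the option that would be appended to the job submission command
    echo "preset option: -p $PRESET"
else
    echo "error: unknown preset '$PRESET'" >&2
fi
```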

Output

  • The AlphaFold output will be in a subfolder of output_dir set with the -o option.

  • Output includes:

    • Computed MSAs

    • Unrelaxed structures

    • Relaxed structures

    • Ranked structures

    • Raw model outputs

    • Prediction metadata

    • Section timings

  • The output_dir directory will have the following structure:

<target_name>/
    features.pkl
    ranked_{0,1,2,3,4}.pdb
    ranking_debug.json
    relaxed_model_{1,2,3,4,5}.pdb
    result_model_{1,2,3,4,5}.pkl
    timings.json
    unrelaxed_model_{1,2,3,4,5}.pdb
    msas/
        bfd_uniclust_hits.a3m
        mgnify_hits.sto
        uniref90_hits.sto
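After a job finishes, it can be useful to confirm that the expected files were written. The sketch below lists the top-level files from the layout above that are missing from a given target folder. It assumes a five-model run, and the helper name list_missing_outputs is hypothetical.

```shell
# Hypothetical helper: print the expected output files that are missing
# from a <target_name> folder. Prints nothing when the run looks complete.
list_missing_outputs() {
    target_dir=$1
    for name in features.pkl ranking_debug.json timings.json; do
        [ -e "$target_dir/$name" ] || echo "$name"
    done
    for i in 0 1 2 3 4; do
        [ -e "$target_dir/ranked_$i.pdb" ] || echo "ranked_$i.pdb"
    done
    for i in 1 2 3 4 5; do
        for prefix in relaxed unrelaxed; do
            [ -e "$target_dir/${prefix}_model_$i.pdb" ] || echo "${prefix}_model_$i.pdb"
        done
        [ -e "$target_dir/result_model_$i.pkl" ] || echo "result_model_$i.pkl"
    done
}
```

For example, `list_missing_outputs /path/to/output/folder/<target_name>` prints one missing file name per line.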