AlphaFold Quickstart: Earlier Versions

Note

This page contains a quick start guide for earlier version(s) of AlphaFold that are still available but no longer directly supported. Please refer to the latest version for direct support.

storageN

  • Throughout these documents, storageN is a placeholder for any available storage platform.

  • Currently available storage platforms:
    • storage1

    • storage2

AlphaFold 2.0.0

Software Included

Getting Started

  • Connect to the compute client.

ssh wustlkey@compute1-client-1.ris.wustl.edu
  • Prepare the computing environment before submitting an AlphaFold job.

# Set the AlphaFold base directory
export ALPHAFOLD_BASE_DIR=/app/alphafold

# Use the scratch file system for temp space
export SCRATCH1=/scratch1/fs1/${COMPUTE_ALLOCATION}

# Use your Active storage for input and output data
export STORAGEN=/storageN/fs1/${STORAGE_ALLOCATION}/Active

# Mount the AlphaFold reference databases, scratch, Active storage, and your home directory
export LSF_DOCKER_VOLUMES="/scratch1/fs1/ris/references/AlphaFold:/scratch1/fs1/ris/references/AlphaFold $SCRATCH1:$SCRATCH1 $STORAGEN:$STORAGEN $HOME:$HOME"

# Update $PATH with folders containing AlphaFold, CUDA, and conda executables
export PATH="/usr/local/cuda/bin/:/opt/conda/bin:/app/alphafold:$PATH"

# Use the debug flag when trying to figure out why your job failed to launch on the cluster
#export LSF_DOCKER_RUN_LOGLEVEL=DEBUG
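
The exports above depend on COMPUTE_ALLOCATION and STORAGE_ALLOCATION already being set in your shell. The following is a minimal sketch of a pre-submission check, not part of AlphaFold or LSF: check_alphafold_env is a hypothetical helper name, and the list of variables is simply the set referenced in the setup step.

```shell
# Sketch: report any setup variable that is empty or unset.
# check_alphafold_env is a hypothetical helper written for this example.
check_alphafold_env() {
    rc=0
    for name in COMPUTE_ALLOCATION STORAGE_ALLOCATION ALPHAFOLD_BASE_DIR \
                SCRATCH1 STORAGEN LSF_DOCKER_VOLUMES; do
        # Read the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            rc=1
        fi
    done
    return $rc
}
```

Calling check_alphafold_env before bsub prints one line per missing variable and returns a non-zero status if any are missing.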
  • Submit an AlphaFold job that requests a node with 8 vCPUs, 8 GB of memory, and one GPU.

    • These are the minimum system requirements suggested for running AlphaFold with the reduced_dbs setting.

bsub -q general -n 8 -M 8GB -R "gpuhost rusage[mem=8GB] span[hosts=1]" -gpu 'num=1' -a "docker(gcr.io/ris-registry-shared/alphafold:2.0.0)" run_alphafold.sh -o /path/to/output/folder -m model_1,model_2,model_3,model_4,model_5,model_2_ptm -f /path/to/input/protein_sequence.fa -t 2021-08-18 -n 8 -p reduced_dbs
  • AlphaFold can run on both the V100 and A100 GPU architectures. If you would like to specify the GPU architecture, please modify the -gpu argument in the job submission command.

-gpu 'num=1:gmodel=<gpu_model>'
  • A list of GPU models can be found here.

  • Jobs can be managed using job groups. Job groups are a way to submit a large number of jobs at once.

  • Jobs can be submitted to a condo, if available, by specifying the correct condo queue. Information on this can be found here.
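
As a rough illustration of combining the -gpu argument with a job group, the sketch below prints one submission command per FASTA file without running anything. The function name print_alphafold_jobs and its argument order are hypothetical. The resource request, image, and run_alphafold.sh options are copied from the example above, and -g is the bsub option for attaching a job to a job group. It assumes paths without spaces and a job group path of your choosing (LSF's bgadd can create one explicitly).

```shell
# Sketch: print (do not run) one bsub command per FASTA file, all in one job group.
# print_alphafold_jobs is a hypothetical helper written for this example.
print_alphafold_jobs() {
    job_group=$1   # job group path, for example /<username>/alphafold
    gpu_model=$2   # empty string means no specific GPU model is requested
    out_root=$3    # value passed to run_alphafold.sh -o
    shift 3

    gpu_spec="num=1"
    if [ -n "$gpu_model" ]; then
        gpu_spec="num=1:gmodel=$gpu_model"
    fi

    resources='-q general -n 8 -M 8GB -R "gpuhost rusage[mem=8GB] span[hosts=1]"'
    image='-a "docker(gcr.io/ris-registry-shared/alphafold:2.0.0)"'
    models=model_1,model_2,model_3,model_4,model_5,model_2_ptm

    for fasta in "$@"; do
        printf '%s\n' "bsub -g $job_group $resources -gpu '$gpu_spec' $image run_alphafold.sh -o $out_root -m $models -f $fasta -t 2021-08-18 -n 8 -p reduced_dbs"
    done
}

# Example (prints two commands; review them before pasting or piping to a shell):
# print_alphafold_jobs /<username>/alphafold "" /path/to/output/folder a.fa b.fa
```

Printing the commands first makes it easy to review them before submitting a large batch.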

Setting Different Model Presets

AlphaFold provides several preset model configurations, described under Preset Models below. To change the preset, modify the -p option in the job submission command.

For example, to use the full_dbs preset, your job submission would include -p full_dbs.

Settings

Please see below for a description of the different settings for AlphaFold.

Warning

Settings marked with * are required.

  • -o <output_dir> Path to a directory that will store the results. *

  • -m <model_names> Names of models to use (a comma separated list). *

  • -f <fasta_path> Path to a FASTA file containing one sequence. *

  • -t <max_template_date> Maximum template release date to consider (ISO 8601 format, i.e. YYYY-MM-DD). Important if folding historical test sets. *

  • -b <benchmark> Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time required to run inference on many proteins (default: False).

  • -g <use_gpu> Enable NVIDIA runtime to run with GPUs (default: True).

  • -p <preset> Preset model configuration: reduced_dbs (no ensembling, smaller genetic database configuration), full_dbs (no ensembling, full genetic database configuration), or casp14 (full genetic database configuration with 8 model ensemblings) (default: full_dbs).

  • -d <data_dir> Path to a directory containing the reference databases. Use this option if you want to use your own reference databases.
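
The settings listed above can be sanity-checked before a job is queued. The following is a sketch only: check_alphafold_args is a hypothetical helper that inspects its arguments and never runs AlphaFold. It accepts the option letters listed above plus the -n option used in the example submission, requires -o, -m, -f, and -t, checks only the shape of the date (not whether it is a real calendar date), and treats a missing -p as full_dbs.

```shell
# Sketch: verify required options, the -t date shape, and the -p preset name.
# check_alphafold_args is a hypothetical helper written for this example.
check_alphafold_args() {
    out_dir= models= fasta= max_date= preset=
    OPTIND=1
    while getopts "o:m:f:t:b:g:p:d:n:" opt; do
        case $opt in
            o) out_dir=$OPTARG ;;
            m) models=$OPTARG ;;
            f) fasta=$OPTARG ;;
            t) max_date=$OPTARG ;;
            p) preset=$OPTARG ;;
            b|g|d|n) ;;            # optional settings, accepted without checks
            *) return 2 ;;         # unknown option or missing option value
        esac
    done
    if [ -z "$out_dir" ] || [ -z "$models" ] || [ -z "$fasta" ] || [ -z "$max_date" ]; then
        echo "missing one of the required options: -o -m -f -t"
        return 1
    fi
    case $max_date in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *) echo "-t must look like YYYY-MM-DD, got: $max_date"; return 1 ;;
    esac
    case ${preset:-full_dbs} in
        reduced_dbs|full_dbs|casp14) ;;
        *) echo "unknown preset: $preset"; return 1 ;;
    esac
    return 0
}
```

For example, check_alphafold_args -o /path/to/output/folder -m model_1 -f /path/to/input/protein_sequence.fa -t 2021-08-18 -p reduced_dbs returns success, while omitting -t or passing an unlisted preset prints a message and returns 1.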

Preset Models

Please see below for a description of the different preset model configurations available. These presets control the speed and quality of AlphaFold.

  • reduced_dbs: This preset is optimized for speed and lower hardware requirements.

  • full_dbs: This preset runs with all genetic databases and with no ensembling.

  • casp14: This preset uses the same settings as were used in CASP14. It runs with all genetic databases and with 8 model ensemblings.

Output

  • AlphaFold writes its output to a subfolder of the output_dir set with the -o option.

  • Output includes:

    • Computed MSAs

    • Unrelaxed structures

    • Relaxed structures

    • Ranked structures

    • Raw model outputs

    • Prediction metadata

    • Section timings

  • The output_dir directory will have the following structure:

<target_name>/
    features.pkl
    ranked_{0,1,2,3,4}.pdb
    ranking_debug.json
    relaxed_model_{1,2,3,4,5}.pdb
    result_model_{1,2,3,4,5}.pkl
    timings.json
    unrelaxed_model_{1,2,3,4,5}.pdb
    msas/
        bfd_uniclust_hits.a3m
        mgnify_hits.sto
        uniref90_hits.sto
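
After a job finishes, the layout above can be checked mechanically. This is a sketch under stated assumptions: list_missing_outputs is a hypothetical helper, it assumes the five models model_1 through model_5 as in the listing, and it skips the msas folder because the MSA file names may depend on the databases used.

```shell
# Sketch: print the expected top-level files that are absent from a target folder.
# list_missing_outputs is a hypothetical helper written for this example.
list_missing_outputs() {
    target_dir=$1   # path to <output_dir>/<target_name>
    for f in features.pkl ranking_debug.json timings.json; do
        [ -e "$target_dir/$f" ] || echo "$f"
    done
    for i in 0 1 2 3 4; do
        [ -e "$target_dir/ranked_$i.pdb" ] || echo "ranked_$i.pdb"
    done
    for i in 1 2 3 4 5; do
        for f in "relaxed_model_$i.pdb" "result_model_$i.pkl" "unrelaxed_model_$i.pdb"; do
            [ -e "$target_dir/$f" ] || echo "$f"
        done
    done
}
```

An empty result means every expected file was found. Adjust the model numbers if your -m list differs from the listing.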