AlphaFold Quickstart: Earlier Versions
Note
This page contains a quick start guide for earlier version(s) of AlphaFold that are still available but no longer directly supported. Please refer to the latest version for direct support.
storageN
The use of storageN within these documents indicates that any storage platform can be used. Currently available storage platforms:
storage1
storage2
AlphaFold 2.0.0
Software Included
AlphaFold (https://github.com/deepmind/alphafold)
Getting Started
Connect to the compute client.
ssh wustlkey@compute1-client-1.ris.wustl.edu
Prepare the computing environment before submitting an AlphaFold job.
# Set the AlphaFold base directory
export ALPHAFOLD_BASE_DIR=/app/alphafold
# Use the scratch file system for temp space
export SCRATCH1=/scratch1/fs1/${COMPUTE_ALLOCATION}
# Use your Active storage for input and output data
export STORAGEN=/storageN/fs1/${STORAGE_ALLOCATION}/Active
# Mount scratch, Active storage, the AlphaFold reference databases, and your home directory
export LSF_DOCKER_VOLUMES="/scratch1/fs1/ris/references/AlphaFold:/scratch1/fs1/ris/references/AlphaFold $SCRATCH1:$SCRATCH1 $STORAGEN:$STORAGEN $HOME:$HOME"
# Update $PATH with folders containing AlphaFold, CUDA, and conda executables
export PATH="/usr/local/cuda/bin/:/opt/conda/bin:/app/alphafold:$PATH"
# Use the debug flag when trying to figure out why your job failed to launch on the cluster
#export LSF_DOCKER_RUN_LOGLEVEL=DEBUG
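The exports above expand COMPUTE_ALLOCATION and STORAGE_ALLOCATION, so both need to be set in your shell first. A minimal sketch of a check you could run before submitting is shown here; the function name check_alphafold_env is hypothetical and not part of the RIS tooling.

```shell
# Hypothetical helper: report which of the variables used by the exports are unset or empty.
# Returns 0 when both are set, 1 otherwise.
check_alphafold_env() {
    missing=0
    for name in COMPUTE_ALLOCATION STORAGE_ALLOCATION; do
        # Look up the value of the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running check_alphafold_env prints one line per missing variable, which is usually quicker to diagnose than a failed mount at job launch.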
Submit an AlphaFold job that requests a node with 8 vCPUs, 8 GB of memory, and one GPU.
These are the minimum system requirements suggested for running AlphaFold with the reduced_dbs setting.
bsub -q general -n 8 -M 8GB -R "gpuhost rusage[mem=8GB] span[hosts=1]" -gpu 'num=1' -a "docker(gcr.io/ris-registry-shared/alphafold:2.0.0)" run_alphafold.sh -o /path/to/output/folder -m model_1,model_2,model_3,model_4,model_5,model_2_ptm -f /path/to/input/protein_sequence.fa -t 2021-08-18 -n 8 -p reduced_dbs
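Because the submission line is long, it can help to assemble it from a few arguments and print it for review before running it. The following is only a sketch under that assumption: the function name alphafold_bsub_cmd is hypothetical, it submits nothing, and it does not quote paths that contain spaces.

```shell
# Hypothetical helper: print the bsub command shown above for a given
# FASTA file, output folder, and preset. Nothing is submitted.
alphafold_bsub_cmd() {
    fasta=$1
    outdir=$2
    # Defaults to reduced_dbs here because the resource request matches that preset
    preset=${3:-reduced_dbs}
    printf '%s ' \
        bsub -q general -n 8 -M 8GB \
        -R '"gpuhost rusage[mem=8GB] span[hosts=1]"' \
        -gpu "'num=1'" \
        -a '"docker(gcr.io/ris-registry-shared/alphafold:2.0.0)"' \
        run_alphafold.sh -o "$outdir" \
        -m model_1,model_2,model_3,model_4,model_5,model_2_ptm \
        -f "$fasta" -t 2021-08-18 -n 8 -p "$preset"
    echo
}
```

Calling alphafold_bsub_cmd /path/to/input/protein_sequence.fa /path/to/output/folder prints the full command so it can be checked and then pasted into the shell.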
AlphaFold can run on both the V100 and A100 GPU architectures. If you would like to specify the GPU architecture, please modify the -gpu argument in the job submission command.
-gpu 'num=1:gmodel=<gpu_model>'
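As a small illustration of the two forms of the -gpu value, the sketch below builds the string with or without a gmodel entry. The function name gpu_request is hypothetical, and the model name you pass must be copied from the RIS list of GPU models; none is assumed here.

```shell
# Hypothetical helper: build the value for the -gpu option.
# With no argument it requests any single GPU; with one argument it adds gmodel.
gpu_request() {
    if [ -n "${1:-}" ]; then
        printf 'num=1:gmodel=%s\n' "$1"
    else
        printf 'num=1\n'
    fi
}
```

In a submission this could be used as -gpu "$(gpu_request "$GPU_MODEL")", where GPU_MODEL is a variable you set yourself and may leave empty.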
A list of GPU models can be found here.
Jobs can be managed using job groups. Job groups are a way to submit a large number of jobs at once.
Jobs can be submitted to a condo, if available, by specifying the correct condo queue. Information on this can be found here.
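To make the job group idea concrete, the sketch below prints one submission line per FASTA file, each attached to the same group with the LSF -g option. It assumes the group was created beforehand (for example with bgadd) and that each protein sits in its own .fa file; the group path, folder layout, and function name are hypothetical, and nothing is submitted.

```shell
# Hypothetical helper: print one bsub line per *.fa file in a folder,
# all attached to the same LSF job group via -g. Nothing is submitted.
print_group_submissions() {
    group=$1      # job group path, assumed to exist already
    fasta_dir=$2  # folder holding one FASTA file per protein
    out_root=$3   # AlphaFold writes each target into its own subfolder here
    for fasta in "$fasta_dir"/*.fa; do
        # Skip the unexpanded pattern when the folder has no .fa files
        [ -e "$fasta" ] || continue
        echo "bsub -g $group -q general -n 8 -M 8GB" \
             "-R \"gpuhost rusage[mem=8GB] span[hosts=1]\" -gpu 'num=1'" \
             "-a \"docker(gcr.io/ris-registry-shared/alphafold:2.0.0)\"" \
             "run_alphafold.sh -o $out_root" \
             "-m model_1,model_2,model_3,model_4,model_5,model_2_ptm" \
             "-f $fasta -t 2021-08-18 -n 8 -p reduced_dbs"
    done
}
```

Reviewing the printed lines first makes it easier to catch a wrong path before many jobs are queued.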
Setting Different Model Presets
Different AlphaFold models have different preset configurations. A description of the different presets can be found below. To change the preset used, please modify the -p option in the job submission command. For example, to use the full_dbs preset, your job submission would include -p full_dbs.
Settings
Please see below for a description of the different settings for AlphaFold.
Warning
Settings with a * are required to be set.
-o <output_dir>
Path to a directory that will store the results. *
-m <model_names>
Names of models to use (a comma-separated list). *
-f <fasta_path>
Path to a FASTA file containing one sequence. *
-t <max_template_date>
Maximum template release date to consider (ISO-8601 format, i.e. YYYY-MM-DD). Important if folding historical test sets. *
-b <benchmark>
Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time required for inferencing many proteins (default: False).
-g <use_gpu>
Enable NVIDIA runtime to run with GPUs (default: True).
-p <preset>
Choose preset model configuration: no ensembling and smaller genetic database config (reduced_dbs), no ensembling and full genetic database config (full_dbs), or full genetic database config and 8 model ensemblings (casp14) (default: full_dbs).
-d <data_dir>
Path to a directory containing the reference databases. Use this option if you want to use your own reference databases.
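A quick sanity check of the -t and -p values before submission can save a failed job. The sketch below is one hedged way to do that; the function name validate_alphafold_args is hypothetical, and it only checks that the date has the YYYY-MM-DD shape, not that it is a real calendar date.

```shell
# Hypothetical pre-flight check for two of the settings described above.
# Returns 0 when the template date looks like YYYY-MM-DD and the preset is a known name.
validate_alphafold_args() {
    max_template_date=$1
    preset=$2
    case $max_template_date in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *) echo "bad -t value: $max_template_date" >&2; return 1 ;;
    esac
    case $preset in
        reduced_dbs|full_dbs|casp14) ;;
        *) echo "bad -p value: $preset" >&2; return 1 ;;
    esac
}
```

For example, validate_alphafold_args 2021-08-18 reduced_dbs succeeds silently, while a date written as 08/18/2021 is rejected with a message on standard error.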
Preset Models
Please see below for a description of the different preset model configurations available. These presets control the speed and quality of AlphaFold.
reduced_dbs: This preset is optimized for speed and lower hardware requirements.
full_dbs: This preset runs with all genetic databases and with no ensembling.
casp14: This preset uses the same settings as were used in CASP14. It runs with all genetic databases and with ensembling.
Output
The AlphaFold output will be in a subfolder of the output_dir set with the -o option. Output includes:
Computed MSAs
Unrelaxed structures
Relaxed structures
Ranked structures
Raw model outputs
Prediction metadata
Section timings
The output_dir directory will have the following structure:
<target_name>/
    features.pkl
    ranked_{0,1,2,3,4}.pdb
    ranking_debug.json
    relaxed_model_{1,2,3,4,5}.pdb
    result_model_{1,2,3,4,5}.pkl
    timings.json
    unrelaxed_model_{1,2,3,4,5}.pdb
    msas/
        bfd_uniclust_hits.a3m
        mgnify_hits.sto
        uniref90_hits.sto
Please see AlphaFold output documentation for more information on AlphaFold output.
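After a job finishes, a quick look for a few of the files listed above can indicate whether the run completed. The following is a minimal sketch, not an official check: the function name check_alphafold_output is hypothetical, and it tests only files that do not depend on how many models or which databases were requested.

```shell
# Hypothetical helper: list expected files missing from <output_dir>/<target_name>.
# Prints nothing and returns 0 when all four files are present.
check_alphafold_output() {
    target_dir=$1
    status=0
    for f in features.pkl ranking_debug.json timings.json ranked_0.pdb; do
        if [ ! -f "$target_dir/$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    return "$status"
}
```

Pass the target subfolder, for example check_alphafold_output /path/to/output/folder/protein_sequence, where the last path component is assumed to match the name of your input FASTA file.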