.. _`alphafold-deprecated-quickstart`:

======================================
AlphaFold Quickstart: Earlier Versions
======================================

.. contents::
   :depth: 2
   :local:

.. note::

   This page contains a quick start guide for earlier version(s) of AlphaFold that are still available but no longer directly supported. Please refer to the :ref:`latest version ` for direct support.

AlphaFold 2.0.0
===============

Software Included
-----------------

- AlphaFold (https://github.com/deepmind/alphafold)

Getting Started
---------------

- Connect to the compute client.

.. code::

   ssh wustlkey@compute1-client-1.ris.wustl.edu

- Prepare the computing environment before submitting an AlphaFold job.

.. code::

   # Set the AlphaFold base directory
   export ALPHAFOLD_BASE_DIR=/app/alphafold

   # Use the scratch file system for temp space
   export SCRATCH1=/scratch1/fs1/${COMPUTE_ALLOCATION}

   # Use your Active storage for input and output data
   export STORAGE1=/storage1/fs1/${STORAGE_ALLOCATION}/Active

   # Mount scratch, Active storage, the AlphaFold reference databases, and your home directory
   export LSF_DOCKER_VOLUMES="/scratch1/fs1/ris/references/AlphaFold:/scratch1/fs1/ris/references/AlphaFold $SCRATCH1:$SCRATCH1 $STORAGE1:$STORAGE1 $HOME:$HOME"

   # Update $PATH with folders containing AlphaFold, CUDA, and conda executables
   export PATH="/usr/local/cuda/bin/:/opt/conda/bin:/app/alphafold:$PATH"

   # Use the debug flag when trying to figure out why your job failed to launch on the cluster
   #export LSF_DOCKER_RUN_LOGLEVEL=DEBUG

- Submit an AlphaFold job that requests a node with 8 vCPUs, 8 GB of memory, and one GPU.

  - These are the minimum system requirements suggested for running AlphaFold with the ``reduced_dbs`` setting.

.. code::

   bsub -q general -n 8 -M 8GB -R "gpuhost rusage[mem=8GB] span[hosts=1]" -gpu 'num=1' \
       -a "docker(gcr.io/ris-registry-shared/alphafold:2.0.0)" \
       run_alphafold.sh \
       -o /path/to/output/folder \
       -m model_1,model_2,model_3,model_4,model_5,model_2_ptm \
       -f /path/to/input/protein_sequence.fa \
       -t 2021-08-18 -n 8 -p reduced_dbs

- AlphaFold can run on both the V100 and A100 GPU architectures. To request a specific GPU architecture, modify the ``-gpu`` argument in the job submission command.

.. code::

   -gpu 'num=1:gmodel='

- A list of GPU models can be found :ref:`here `.
- Jobs can be managed using :ref:`job groups `. Job groups are a way to submit a large number of jobs at once.
- Jobs can be submitted to a condo, if available, by specifying the correct condo queue. Information on this can be found :ref:`here `.

.. admonition:: Setting Different Model Presets

   Different AlphaFold models have different preset configurations. A description of the different presets can be found below. To change the preset used, modify the ``-p`` option in the job submission command. For example, to use the ``full_dbs`` preset, your job submission would include ``-p full_dbs``.

Settings
--------

Please see below for a description of the different settings for AlphaFold.

.. warning::

   Settings marked with **\*** are required.

- ``-o`` Path to a directory that will store the results. **\***
- ``-m`` Names of models to use (a comma-separated list). **\***
- ``-f`` Path to a FASTA file containing one sequence. **\***
- ``-t`` Maximum template release date to consider (ISO 8601 format, i.e. YYYY-MM-DD). Important if folding historical test sets. **\***
- ``-b`` Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time required for running inference on many proteins (default: ``'False'``).
- ``-g`` Enable NVIDIA runtime to run with GPUs (default: ``True``).
- ``-p`` Preset model configuration: ``reduced_dbs`` (no ensembling, smaller genetic database configuration), ``full_dbs`` (no ensembling, full genetic database configuration), or ``casp14`` (full genetic database configuration with 8 model ensemblings) (default: ``full_dbs``).
- ``-d`` Path to a directory containing the reference databases. Use this option if you want to use your own reference databases.

Preset Models
-------------

Please see below for a description of the different preset model configurations available. These presets control the speed and quality of AlphaFold.

- ``reduced_dbs``: This preset is optimized for speed and lower hardware requirements.
- ``full_dbs``: This preset runs with all genetic databases and with no ensembling.
- ``casp14``: This preset uses the same settings as were used in CASP14. It runs with all genetic databases and with ensembling.

Output
------

- The AlphaFold output will be in a subfolder of ``output_dir``, which is set with the ``-o`` option.
- Output includes:

  - Computed MSAs
  - Unrelaxed structures
  - Relaxed structures
  - Ranked structures
  - Raw model outputs
  - Prediction metadata
  - Section timings

- The ``output_dir`` directory will have the following structure:

.. code::

   /
       features.pkl
       ranked_{0,1,2,3,4}.pdb
       ranking_debug.json
       relaxed_model_{1,2,3,4,5}.pdb
       result_model_{1,2,3,4,5}.pkl
       timings.json
       unrelaxed_model_{1,2,3,4,5}.pdb
       msas/
           bfd_uniclust_hits.a3m
           mgnify_hits.sto
           uniref90_hits.sto

- Please see `AlphaFold output documentation `__ for more information on AlphaFold output.
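
After a job finishes, it can be useful to confirm that the result subfolder contains the files listed in the directory structure above before copying or analyzing them. The following is a minimal sketch of such a check. The function name ``check_alphafold_output`` is hypothetical (it is not part of AlphaFold or of the RIS container image), and it assumes that a completed run always writes ``features.pkl``, ``ranking_debug.json``, ``timings.json``, at least ``ranked_0.pdb``, and an ``msas/`` directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of AlphaFold or the RIS image.
# Takes the path to one result subfolder inside the directory given with -o.
check_alphafold_output() {
    local result_dir="$1"
    local missing=0
    local name

    # Files that the directory listing shows at the top level of a result folder.
    # Assumption: at least one model ran, so ranked_0.pdb should exist.
    for name in features.pkl ranking_debug.json timings.json ranked_0.pdb; do
        if [ ! -f "${result_dir}/${name}" ]; then
            echo "missing file: ${name}"
            missing=$((missing + 1))
        fi
    done

    # The computed MSAs are stored in the msas/ subdirectory.
    if [ ! -d "${result_dir}/msas" ]; then
        echo "missing directory: msas/"
        missing=$((missing + 1))
    fi

    # Return success only when nothing is missing.
    [ "${missing}" -eq 0 ]
}
```

Once the function is defined in your shell, calling it with the path to a result subfolder prints one line per missing item and returns a non-zero status if anything is absent, so it can be combined with ``&&`` or ``if`` in a larger script.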