.. _`alphafold-deprecated-quickstart`:

======================================
AlphaFold Quickstart: Earlier Versions
======================================

.. contents::
   :depth: 2
   :local:

.. note::
    This page contains a quick start guide for earlier versions of AlphaFold that are still available but no longer
    directly supported. For direct support, please refer to the :ref:`latest version <alphafold-quickstart>`.


AlphaFold 2.0.0
===============

Software Included
-----------------

- AlphaFold (https://github.com/deepmind/alphafold)


Getting Started
---------------

- Connect to the compute client.

.. code::

    ssh wustlkey@compute1-client-1.ris.wustl.edu

- Prepare the computing environment before submitting an AlphaFold job.

.. code::

    # Set the AlphaFold base directory
    export ALPHAFOLD_BASE_DIR=/app/alphafold

    # Use the scratch file system for temp space
    export SCRATCH1=/scratch1/fs1/${COMPUTE_ALLOCATION}

    # Use your Active storage for input and output data
    export STORAGE1=/storage1/fs1/${STORAGE_ALLOCATION}/Active

    # Mount scratch, Active storage, the AlphaFold reference databases, and your home directory
    export LSF_DOCKER_VOLUMES="/scratch1/fs1/ris/references/AlphaFold:/scratch1/fs1/ris/references/AlphaFold $SCRATCH1:$SCRATCH1 $STORAGE1:$STORAGE1 $HOME:$HOME"

    # Update $PATH with folders containing AlphaFold, CUDA, and conda executables
    export PATH="/usr/local/cuda/bin/:/opt/conda/bin:/app/alphafold:$PATH"

    # Use the debug flag when trying to figure out why your job failed to launch on the cluster
    #export LSF_DOCKER_RUN_LOGLEVEL=DEBUG
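
The exports above rely on ``COMPUTE_ALLOCATION`` and ``STORAGE_ALLOCATION`` already being
set in your shell. As a minimal sketch, assuming a bash shell, a guard like the following
can catch a missing value before the paths are built. The helper name ``require_vars`` is
hypothetical and is not part of AlphaFold or LSF.

.. code:: bash

    # Hypothetical helper: report every named variable that is unset or empty.
    # Returns 0 if all are set, 1 otherwise. Requires bash (indirect expansion).
    require_vars() {
        local name missing=0
        for name in "$@"; do
            if [ -z "${!name:-}" ]; then
                echo "ERROR: $name is not set" >&2
                missing=1
            fi
        done
        return "$missing"
    }

    # Example: check both allocation names before exporting SCRATCH1 and STORAGE1
    if require_vars COMPUTE_ALLOCATION STORAGE_ALLOCATION; then
        echo "Allocation variables are set."
    fi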


- Submit an AlphaFold job that requests a node with 8 vCPUs, 8 GB of memory, and
  one GPU.

  - These are the minimum system requirements suggested for running AlphaFold with the
    ``reduced_dbs`` setting.

.. code::

    bsub -q general -n 8 -M 8GB \
        -R "gpuhost rusage[mem=8GB] span[hosts=1]" \
        -gpu 'num=1' \
        -a "docker(gcr.io/ris-registry-shared/alphafold:2.0.0)" \
        run_alphafold.sh \
        -o /path/to/output/folder \
        -m model_1,model_2,model_3,model_4,model_5,model_2_ptm \
        -f /path/to/input/protein_sequence.fa \
        -t 2021-08-18 -n 8 -p reduced_dbs

- AlphaFold can run on both the V100 and A100 GPU architectures. To request a
  specific GPU architecture, add ``gmodel`` to the ``-gpu`` argument in the
  job submission command.

.. code::

  -gpu 'num=1:gmodel=<gpu_model>'

- A list of GPU models can be found :ref:`here <gpu-resources>`.

- Jobs can be managed using :ref:`job groups <job-groups>`. Job groups are a way
  to submit a large number of jobs at once.

- Jobs can be submitted to a condo, if available, by specifying the correct
  condo queue. Information on this can be found :ref:`here
  <condo-host-job-example>`.
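
As a rough sketch of how job groups and the submission command above might be combined,
the following bash function submits one AlphaFold job per FASTA file under a single job
group using the ``bsub -g`` option. The function name ``submit_alphafold_batch``, the
``DRY_RUN`` switch, the ``.fa`` file extension, and the example paths are assumptions made
for illustration; adjust them to your own layout. By default the function only prints the
commands it would run.

.. code:: bash

    # Sketch only: submit one AlphaFold job per FASTA file under one LSF job group.
    # Hypothetical names: submit_alphafold_batch, DRY_RUN, and the example paths.
    submit_alphafold_batch() {
        local input_dir="$1" output_dir="$2" job_group="$3"
        local fasta cmd
        for fasta in "$input_dir"/*.fa; do
            [ -e "$fasta" ] || continue   # skip when no .fa files match
            cmd=(bsub -g "$job_group" -q general -n 8 -M 8GB
                 -R "gpuhost rusage[mem=8GB] span[hosts=1]" -gpu "num=1"
                 -a "docker(gcr.io/ris-registry-shared/alphafold:2.0.0)"
                 run_alphafold.sh -o "$output_dir"
                 -m model_1,model_2,model_3,model_4,model_5,model_2_ptm
                 -f "$fasta" -t 2021-08-18 -n 8 -p reduced_dbs)
            if [ "${DRY_RUN:-1}" = "1" ]; then
                echo "${cmd[*]}"          # print the command instead of submitting
            else
                "${cmd[@]}"
            fi
        done
    }

    # Example (prints the commands; set DRY_RUN=0 to actually submit):
    # submit_alphafold_batch "$STORAGE1/fasta" "$STORAGE1/alphafold_output" "/$USER/alphafold"

The dry-run output joins the arguments with spaces and drops the shell quoting, so it is
meant for review rather than for copying back into a terminal.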


.. admonition:: Setting Different Model Presets

    AlphaFold offers several preset configurations, which are described in the
    Preset Models section below. To change the preset used, please modify the
    ``-p`` option in the job submission command.

    For example, to use the ``full_dbs`` preset, your job submission would
    include ``-p full_dbs``.


Settings
--------

Please see below for a description of the different settings for AlphaFold.

.. warning::

    Settings marked with **\*** are required.

- ``-o <output_dir>``   Path to a directory that will store the results. **\***
- ``-m <model_names>``  Names of models to use (a comma separated list). **\***
- ``-f <fasta_path>``   Path to a FASTA file containing one sequence. **\***
- ``-t <max_template_date>`` Maximum template release date to consider (ISO-8601
  format - i.e. YYYY-MM-DD). Important if folding historical test sets. **\***
- ``-b <benchmark>``    Run multiple JAX model evaluations to obtain a timing
  that excludes the compilation time, which should be more indicative of the time
  required to run inference on many proteins (default: ``False``).
- ``-g <use_gpu>``      Enable NVIDIA runtime to run with GPUs (default: ``True``).
- ``-p <preset>``       Choose the preset model configuration: ``reduced_dbs``
  (no ensembling and a smaller genetic database config), ``full_dbs`` (no
  ensembling and the full genetic database config), or ``casp14`` (the full
  genetic database config and 8 model ensembles) (default: ``full_dbs``).
- ``-d <data_dir>``     Path to a directory containing the reference databases.
  Use this option if you want to use your own reference databases.
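
Because ``-f`` expects a FASTA file containing one sequence, it can help to check the
input before submitting a job. The following is a minimal sketch, assuming a bash shell
and that each record starts with a ``>`` header line. The helper name
``is_single_sequence_fasta`` is hypothetical.

.. code:: bash

    # Hypothetical helper: succeed only if the file holds exactly one FASTA record.
    # Returns 0 for one record, 1 for any other count, 2 if the file is unreadable.
    is_single_sequence_fasta() {
        local fasta="$1" count
        if [ ! -r "$fasta" ]; then
            echo "ERROR: cannot read $fasta" >&2
            return 2
        fi
        # grep -c prints the number of header lines; it exits non-zero when the
        # count is 0, so ignore its exit status and test the count instead.
        count=$(grep -c '^>' "$fasta") || true
        [ "$count" -eq 1 ]
    }

    # Example:
    # is_single_sequence_fasta /path/to/input/protein_sequence.fa && echo "OK to submit"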


Preset Models
-------------

Please see below for a description of the different preset model configurations
available. These presets control the speed and quality of AlphaFold.

- ``reduced_dbs``: This preset is optimized for speed and lower hardware requirements.
- ``full_dbs``: This preset runs with all genetic databases and with no ensembling.
- ``casp14``: This preset uses the same settings as were used in CASP14. It runs
  with all genetic databases and with 8 model ensembles.


Output
------

- The AlphaFold output will be in a subfolder of ``output_dir`` set with the
  ``-o`` option.

- Output includes:

  - Computed MSAs
  - Unrelaxed structures
  - Relaxed structures
  - Ranked structures
  - Raw model outputs
  - Prediction metadata
  - Section timings

- The ``output_dir`` directory will have the following structure:

.. code::

    <target_name>/
        features.pkl
        ranked_{0,1,2,3,4}.pdb
        ranking_debug.json
        relaxed_model_{1,2,3,4,5}.pdb
        result_model_{1,2,3,4,5}.pkl
        timings.json
        unrelaxed_model_{1,2,3,4,5}.pdb
        msas/
            bfd_uniclust_hits.a3m
            mgnify_hits.sto
            uniref90_hits.sto

- Please see `AlphaFold output documentation
  <https://github.com/deepmind/alphafold#alphafold-output>`__ for more information
  on AlphaFold output.
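
After a job finishes, a quick way to confirm that a run completed is to check for the
files listed above. The following is a minimal sketch, assuming a bash shell and a run
with the five standard models (``model_1`` through ``model_5``); runs with a different
``-m`` list will produce different file names. The helper name
``check_alphafold_output`` is hypothetical.

.. code:: bash

    # Hypothetical helper: report which expected files are absent from a
    # <target_name> output folder. Returns 0 only if none are missing.
    check_alphafold_output() {
        local target_dir="$1" f missing=0
        for f in features.pkl ranking_debug.json timings.json \
                 ranked_{0,1,2,3,4}.pdb \
                 relaxed_model_{1,2,3,4,5}.pdb \
                 unrelaxed_model_{1,2,3,4,5}.pdb \
                 result_model_{1,2,3,4,5}.pkl; do
            if [ ! -e "$target_dir/$f" ]; then
                echo "missing: $f"
                missing=$((missing + 1))
            fi
        done
        echo "$missing file(s) missing"
        [ "$missing" -eq 0 ]
    }

    # Example:
    # check_alphafold_output /path/to/output/folder/<target_name>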