.. _`parabricks-quickstart`:

=====================
Parabricks Quickstart
=====================

.. contents::
   :depth: 1
   :local:

.. admonition:: Compute Resources

   - Have questions or need help with compute, including activation or issues? Follow `this link. `__
   - :ref:`User Agreement `

.. admonition:: Docker Usage

   - This page assumes a working knowledge of using Docker to create images and push them to a repository for use. If you need to review that information, please see the links below.
   - :ref:`Docker and the RIS Compute Service `
   - :ref:`Docker Basics: Building, Tagging, & Pushing A Custom Docker Image `

Image Details
-------------

- Docker image hosted at nvcr.io/nvidia/clara/clara-parabricks:4.0.0-1
- Official documentation for `Parabricks version 4.0.0. `__

Getting Started
---------------

- Connect to the compute client.

  .. code::

     ssh wustlkey@compute1-client-1.ris.wustl.edu

- Prepare the computing environment before submitting a job.

  .. code::

     # Use the scratch file system for temp space
     export SCRATCH1=/scratch1/fs1/${COMPUTE_ALLOCATION}

     # Use Active storage for input and output data
     export STORAGE1=/storage1/fs1/${STORAGE_ALLOCATION}/Active

     # Use host-level communications for the GPUs
     export LSF_DOCKER_NETWORK=host

     # Use the debug flag when trying to figure out why your job failed to launch on the cluster
     #export LSF_DOCKER_RUN_LOGLEVEL=DEBUG

     # Override the entrypoint, since the Parabricks container sets its own but our cluster, by default, requires /bin/sh
     export LSF_DOCKER_ENTRYPOINT=/bin/sh

     # Create the tmp dir
     export TMP_DIR="${SCRATCH1}/parabricks-tmp"
     [ ! -d "$TMP_DIR" ] && mkdir "$TMP_DIR"

- Submit the job. The basic command is shown below; a complete worked example that combines these steps appears at the end of this page.

  .. code::

     bsub -n 16 -M 64GB -R 'gpuhost rusage[mem=64GB] span[hosts=1]' -q general -gpu "num=1:j_exclusive=yes" -a 'docker(nvcr.io/nvidia/clara/clara-parabricks:4.0.0-1)' pbrun command options

Known Issues
------------

- Parabricks selects GPUs based on ``NVIDIA_VISIBLE_DEVICES``, which defaults to 'all' regardless of the quantity and device numbers of the GPU(s) reserved at runtime. As a result, the software may attempt to run on GPU(s) the job does not have access to. At this time it is advised to prepend ``pbrun`` with the following.

  .. code::

     for VAR in $(printenv | grep CUDA_VISIBLE_DEVICES); do export ${VAR/CUDA/NVIDIA}; done

Additional Information
----------------------

- Cores (``-n``) and memory (``-M`` and ``mem``) may need to be adjusted depending on the data set used.

  - A 1 GPU server should have 64GB of CPU RAM and at least 16 CPU threads.
  - A 2 GPU server should have 100GB of CPU RAM and at least 24 CPU threads.
  - A 4 GPU server should have 196GB of CPU RAM and at least 32 CPU threads.
  - It is suggested to keep the GPUs at 4 and the RAM at 196GB unless your data set is smaller than the 5GB test data set. A multi-GPU submission example appears at the end of this page.
  - There are diminishing returns when using more GPUs on small data sets.

- Replace ``command`` with any of the ``pbrun`` commands such as ``fq2bam``, ``bqsr``, ``applybqsr``, or ``haplotypecaller``.
- Please refer to the official `Parabricks documentation `__ for additional direction.

Earlier Versions
----------------

Earlier versions are still available but no longer directly supported by RIS. Please refer to the latest version for direct support.

.. toctree::
   :maxdepth: 1

   deprecated-tools/parabricks-deprecated-quickstart
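
Worked Example
--------------

The pieces above can be combined into a single end-to-end submission. The sketch below is illustrative only: the reference, FASTQ, and output paths are hypothetical placeholders under ``${STORAGE1}``, the script name ``run_fq2bam.sh`` is arbitrary, and the ``pbrun fq2bam`` options shown (``--ref``, ``--in-fq``, ``--out-bam``, ``--tmp-dir``) should be checked against the official documentation for your Parabricks version. It assumes the environment variables from the Getting Started section are set in the submission shell and are carried into the job environment.

.. code::

   # Write a small job script so the pbrun invocation and the GPU workaround
   # stay together and quoting stays simple. The heredoc delimiter is quoted,
   # so ${STORAGE1} and ${TMP_DIR} are expanded at run time inside the job.
   cat > "${STORAGE1}/run_fq2bam.sh" <<'EOF'
   #!/bin/bash
   # Known Issues workaround: restrict Parabricks to the GPUs reserved for this job.
   for VAR in $(printenv | grep CUDA_VISIBLE_DEVICES); do export ${VAR/CUDA/NVIDIA}; done

   # Hypothetical inputs and outputs -- replace with your own data.
   pbrun fq2bam \
       --ref ${STORAGE1}/reference/GRCh38.fa \
       --in-fq ${STORAGE1}/fastq/sample_R1.fastq.gz ${STORAGE1}/fastq/sample_R2.fastq.gz \
       --out-bam ${STORAGE1}/results/sample.bam \
       --tmp-dir ${TMP_DIR}
   EOF

   # Submit the script with the same 1 GPU resources as the basic command above.
   bsub -n 16 -M 64GB -R 'gpuhost rusage[mem=64GB] span[hosts=1]' \
       -q general -gpu "num=1:j_exclusive=yes" \
       -a 'docker(nvcr.io/nvidia/clara/clara-parabricks:4.0.0-1)' \
       /bin/bash "${STORAGE1}/run_fq2bam.sh"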
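
For larger data sets, the same submission can be scaled to the 4 GPU profile recommended under Additional Information (196GB of CPU RAM and at least 32 CPU threads); only the ``bsub`` resource flags change, since Parabricks uses every GPU visible to the job and the Known Issues workaround limits visibility to the reserved devices. A sketch, reusing the hypothetical ``run_fq2bam.sh`` script from the previous example:

.. code::

   bsub -n 32 -M 196GB -R 'gpuhost rusage[mem=196GB] span[hosts=1]' \
       -q general -gpu "num=4:j_exclusive=yes" \
       -a 'docker(nvcr.io/nvidia/clara/clara-parabricks:4.0.0-1)' \
       /bin/bash "${STORAGE1}/run_fq2bam.sh"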