Parabricks Quickstart: Earlier Versions

Note

This page contains a quick start guide for earlier version(s) of Parabricks that are still available but no longer directly supported. Please refer to the latest version for direct support.

storageN

  • The use of storageN within these documents indicates that any storage platform can be used.

  • Current available storage platforms:
    • storage1

    • storage2

v3.6

Getting Started

  • Connect to compute client.

ssh wustlkey@compute1-client-1.ris.wustl.edu
  • Prepare the computing environment before submitting a job.

# Use scratch file system for temp space
export SCRATCH1=/scratch1/fs1/${COMPUTE_ALLOCATION}

# Use Active storage for input and output data
export STORAGEN=/storageN/fs1/${STORAGE_ALLOCATION}/Active

# Mapping for the Parabricks license(s) is required
export LSF_DOCKER_VOLUMES="/scratch1/fs1/ris/application/parabricks-license:/opt/parabricks $SCRATCH1:$SCRATCH1 $STORAGEN:$STORAGEN $HOME:$HOME"
export PATH="/opt/miniconda/bin:$PATH"

# Use host level communications for the GPUs
export LSF_DOCKER_NETWORK=host

# Use debug flag when trying to figure out why your job failed to launch on the cluster
#export LSF_DOCKER_RUN_LOGLEVEL=DEBUG

# Use entry point because the parabricks container has other entrypoints but our cluster, by default, requires /bin/sh
export LSF_DOCKER_ENTRYPOINT=/bin/sh

# Create tmp dir
export TMP_DIR=${STORAGEN}"/parabricks-tmp"
[ ! -d $TMP_DIR ] && mkdir $TMP_DIR
  • Submit job. Basic commands for use:

    • V100 Hardware

    bsub -n 16 -M 64GB -R 'gpuhost rusage[mem=64GB] span[hosts=1]' -q general -gpu "num=1:gmodel=TeslaV100_SXM2_32GB:j_exclusive=yes" -a 'docker(gcr.io/ris-registry-shared/parabricks)' pbrun command options
    
    • A100 Hardware

    bsub -n 16 -M 64GB -R 'gpuhost rusage[mem=64GB] span[hosts=1]' -q general -gpu "num=1:gmodel=NVIDIAA100_SXM4_40GB:j_exclusive=yes" -a 'docker(gcr.io/ris-registry-shared/parabricks_ampere)' pbrun command options
    

Compute Group

  • If you are a member of more than one compute group, you will be prompted to specify an LSF User Group with -G group_name or by setting the LSB_SUB_USER_GROUP variable.

Known Issues

  • VQSR does not support gzipped files.

  • CNVKit --count-reads does not work as expected. A separate CNVKit Docker image can be used an an alternative to this option.

Additional Information

  • You may need to adjust your cores (-n) and memory (-M and mem) depending on your data set.
    • 1 GPU server should have 64GB CPU RAM, at least 16 CPU threads.

    • 2 GPU server should have 100GB CPU RAM, at least 24 CPU threads.

    • 4 GPU server should have 196GB CPU RAM, at least 32 CPU threads.

  • You can run this interactive (-Is) or in batch mode in the general or general-interactive queues.

  • You will probably want to keep the GPUs at 4 and RAM at 196GB unless your data set is smaller than the 5GB test data set.

  • There is diminishing returns on using more GPUs on small data sets.

  • Replace command with any of the pbrun commands such as fq2bam, bqsr, applybqsr, or haplotypecaller.

  • Please refer to official Parabricks documentation for additional direction.