RIS Compute 101

Note

Connecting for command-line access: ssh wustlkey@compute1-client-1.ris.wustl.edu

Queue to use: workshop, workshop-interactive

Group to use: compute-workshop (if part of multiple groups)

Compute 101 Video

What will this documentation provide?

  • An introduction to the basic HPC environment and what is required to work within it

  • An explanation of the differences between traditional HPC environments and what RIS offers

What is High Performance Computing?

  • It’s NOT Big Data, though it is used to analyze Big Data.

  • It’s NOT the Cloud, though it can be utilized through cloud systems.

  • It IS leveraging computing power beyond everyday capacity to do science and engineering.

How HPC Impacts Us

INDUSTRY

  • Engineering

  • Oil and Gas

  • Diagnostics

NATIONAL DEFENSE

  • Modeling of Nuclear Weapons

  • Cryptography/Cryptanalysis

SCIENCE

  • Physics: Particle Physics, LIGO

  • Chemistry/Biology: Molecular Modeling, Drug Discovery

  • Environmental Science: Weather Forecasting, Climate Modeling

  • Genomics: Next Generation Sequencing, Variant Analysis

  • Others: Big Data Analysis, Quantitative Analysis of Qualitative Data

RIS HPC Specs

  • Base system
    • 5,000 Intel Cascade Lake Cores

    • 120 NVIDIA Tesla V100 GPUs

    • 300TB DDN high-performance scratch space

    • 100Gbit Mellanox HDR Network

  • Batch computing across thousands of CPU cores and hundreds of GPUs

  • Integration with WashU Research Network (WURN) (40Gbit)

  • Integration with Data Transfer

  • Integration with Research Storage

  • Independent software runtimes with Docker

  • Integration with WUSTL Key Identity

  • Dedicated 10Gbit connection to Google Cloud

  • Free training seminars and webinars

When to Use HPC

  • Analyzing data takes a long time on your personal or lab computer

  • Files or datasets are very large

  • You need to analyze large sample sizes or datasets with large numbers of files

Typical Workflow on an HPC System

  • Transfer of data from a local computer onto the HPC system

  • Generation of a script to tell the system how to analyze the data

  • Transfer of the results off the HPC system onto a local computer

Tools to Connect and Move Data

What is Shell?

  • A shell is a user interface for interacting with the Operating System (OS), either the OS on your personal computer or the OS on the HPC system

  • OS Examples: Microsoft Windows, macOS, Linux, Android (Mobile)

Two Types of User Interface

  • GUI - Graphical User Interface (Point and Click)

  • CLI - Command Line Interface (Text Input and Output)

GUI - Generally Intuitive (phone, laptop, tablet)
  • Point and click for dealing with files

../../../_images/GUI-example.png

CLI - Used when connecting to an HPC system
  • Simple commands with powerful options

  • Commands combined to automate tasks

  • Steep learning curve, because you need to know the commands before you can use them

../../../_images/CLI-example.png

(This is actually a CLI within a GUI)

Command Line Commands in Linux and Unix

  • macOS and Linux both include a built-in terminal application, which gives you a CLI within the GUI

  • Windows requires software to be installed (recommended software listed below)
  • Secure Shell (SSH) Protocol
    • It was designed as a secure replacement for telnet

    • Different SSH clients implement the SSH protocol in different ways

    • Authentication requires either a username/password combination or an SSH key pair

  • Basic SSH Command (replacing wustlkey with your own WUSTL Key)

    ssh username@machine-id
    ssh wustlkey@compute1-client-1.ris.wustl.edu
    
  • Linux File Tree / Directory Structure
    ../../../_images/directory-structure.png
    • In the image, you can see the directory structure as it looks in a GUI and as it looks in a CLI.

    ../../../_images/linux-directory-structure.png
    • There are home and scratch directories for each user.

    • In the CLI, you can’t double-click to move between directories or open files as you would in a GUI such as macOS or Windows

    • Directory locations and files must be expressed as paths

    • Imagine we have a file named my-file.txt in the home directory of the structure above; its full path would be:

    /home/username/my-file.txt
    
    • ~ is a shortcut that represents the home directory, so the above example can also be expressed as:

    ~/my-file.txt
    
  • Before we get into CLI commands, note the philosophy on which these commands are based:
    • Write commands that do one thing and do it well

    • Write commands that work together

    • Write commands to handle text streams, because that is a universal interface

  • CLI Commands for Working with Files and Directories (Folders) (https://ubuntu.com/tutorials/command-line-for-beginners#1-overview)
    • These are usually short; in fact, many basic commands are only two characters long

    • Commands often take options (flags) and arguments

    command -o --option argument1 argument2
    
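    • Most commands provide built-in documentation through a --help option (some also accept -h, but for other commands, such as ls, -h means something else)

    command --help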
    
    ../../../_images/help.example.gif
    • Commands also have manual pages that give the most accurate description of a command and its options.

    • The spacebar is used to page through the manual and ‘q’ is used to quit out of the manual

    • Manual pages are not easy to read, but they provide the most complete and accurate information about a command

    man command
    
    ../../../_images/man.example.gif
    • pwd - Print Working Directory - prints your current location in the directory structure (this location is also known as the working directory)

    ../../../_images/pwd.example.gif
    • ls - lists the files and directories in your current directory (names beginning with a dot are hidden unless you add the -a option)

    ../../../_images/ls.example.gif
    • ls -lh - lists the same entries in long format (-l), with details such as owner, permissions, date of last modification, and file size, and shows sizes in human-readable units such as K, M, and G (-h)

    ../../../_images/lslh.example.gif
    • mkdir - this command is ‘make directory’ and it makes a directory with the provided name

    ../../../_images/mkdir.example.gif
    • cd - this command is ‘change directory’ and it allows you to move between directories

    ../../../_images/cd.example.gif
    • rm - this command removes a file or files.

    Warning

    This is not like sending items to the trash. This is permanent removal and should be used with care.

    ../../../_images/rm.example.gif
    • rmdir - removes a directory; the directory must be empty.

    Warning

    This is permanent removal and should be used with care.

    ../../../_images/rmdir.example.gif
    • rm -rf - this is one of the most dangerous commands to use. Do not use it unless you are experienced and know exactly what you are removing.
      • the option -r makes the removal recursive, meaning it goes into nested directories and removes all files within them as well as the directories themselves

      • the option -f forces the removal of files and directories without a prompt, meaning you won’t be asked whether you want to delete them

    • Running Multiple Commands Simultaneously
      • When a command is executed, the shell doesn’t return to the prompt (where you type your next command) until the command has finished.

      ../../../_images/sleep.example.gif
      • But we often want to run other commands before our first command has finished

      • You can suspend a running command with Ctrl-Z and then resume it in the background with ‘bg’

      ../../../_images/ctrlz.example.gif
      • Or you can simply start a job in the background by using an ampersand (&) at the end of the command

      ../../../_images/amp.example.gif
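To see the path notation from the directory-structure section in action, this minimal sketch (assuming a bash-like shell) prints the paths the shell builds from ~. my-file.txt is the example file from above; it does not need to exist, because echo only prints the expanded text.

```shell
# Minimal sketch, assuming a bash-like shell.
# my-file.txt is the example name from the text; echo only prints the path,
# so the file does not have to exist.
echo ~                     # your home directory, for example /home/username
echo ~/my-file.txt         # the shell expands ~ before echo runs
echo "$HOME/my-file.txt"   # the same path built from the HOME variable
cd ~ && pwd                # move to your home directory and print its full path
```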
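The philosophy listed earlier (do one thing well, work together, handle text streams) is what makes the pipe (|) useful: each command reads the text output of the command before it. A small sketch, using made-up file names as illustrative input:

```shell
# Sketch: single-purpose commands joined with pipes (|).
# printf writes some made-up file names, one per line (illustrative data only).
# sort puts identical lines next to each other, and uniq -c counts each run,
# so the output shows sample_a.txt once and sample_b.txt twice.
printf 'sample_b.txt\nsample_a.txt\nsample_b.txt\n' | sort | uniq -c

# wc -l counts lines, so this prints how many (non-hidden) entries
# the current directory contains.
ls | wc -l
```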
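The general pattern of options followed by arguments can be tried with ls. This sketch assumes GNU coreutils, as found on most Linux systems (the BSD ls on macOS, for example, does not accept --help). For ls, -h means "human-readable sizes" rather than help, which is why --help is the safer way to ask for a usage summary.

```shell
# Sketch assuming GNU coreutils ls (standard on Linux).
ls -l -h                # two short options given separately
ls -lh                  # the same two options combined after a single dash
ls -lh /tmp             # options come first, then the argument (a directory to list)
ls --help | head -n 5   # first few lines of the built-in usage summary
```

For the full description, man ls opens the manual page (spacebar to page, q to quit).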
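The navigation commands described above (pwd, ls, mkdir, cd) are easiest to learn by trying them in a throwaway location. This sketch uses mktemp -d to create a temporary directory so nothing in your home directory changes; practice is a hypothetical directory name chosen for the example.

```shell
# Sketch of a short navigation session in a temporary directory.
# "practice" is a hypothetical name chosen for this example.
cd "$(mktemp -d)"   # move into a new, empty temporary directory
pwd                 # print the working directory
mkdir practice      # create a directory named practice
ls                  # list the contents: practice
cd practice         # move into it
pwd                 # the working directory now ends in /practice
cd ..               # .. is the parent directory
cd                  # cd with no argument returns to your home directory
```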
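The removal commands can also be practiced in a temporary directory. In this sketch, old-results and notes.txt are hypothetical names; it shows that rmdir refuses to delete a directory that still contains files, and that rm deletes immediately, with no trash to restore from.

```shell
# Sketch: removing files and directories in a throwaway location.
# old-results and notes.txt are hypothetical names for illustration.
cd "$(mktemp -d)"
mkdir old-results
touch old-results/notes.txt          # touch creates an empty file

rmdir old-results || echo "rmdir refused: old-results is not empty"
rm old-results/notes.txt             # permanent: the file is not moved to a trash
rmdir old-results                    # succeeds now that the directory is empty
ls                                   # prints nothing: both are gone
```

If you want a safety net, rm -i asks for confirmation before each removal, the opposite of what -f does.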
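The background-job behavior described above can be reproduced with sleep, which does nothing except wait for the given number of seconds. A sketch for a bash-like shell:

```shell
# Sketch: run a slow command in the background and keep using the shell.
sleep 5 &                    # & starts sleep in the background; the prompt returns at once
echo "background PID: $!"    # $! holds the process ID of the most recent background job
jobs                         # list this shell's jobs (shows sleep 5 as Running)
date                         # other commands run while sleep continues
wait                         # pause here until every background job has finished
echo "sleep has finished"
```

For a job suspended with Ctrl-Z, bg resumes it in the background and fg brings it back to the foreground.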

Transferring Data to the HPC System
  • We do NOT use SCP or SFTP protocols to transfer data to the storage associated with the HPC system.

  • We use SMB (Server Message Block), another network protocol, designed for sharing files

  • Information on how to transfer data to and from Storage can be found here.

Naming Files and Using Spaces in File Names
  • In the Windows and Mac environments, it is common to have spaces in file names

my file name.txt
  • In the Unix and Linux environments, spaces separate the arguments of a command, so it is best to avoid spaces in file names.

  • You can still work with spaces in file names if you need to: escape each space with a backslash (\) so the shell treats it as part of the file name.

my\ file\ name.txt
  • If you use a command on a file name with spaces without escaping them, the command will not behave as you expect.

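  • An example is the mv command. If you run the following, the shell passes my, file, name.txt, and my-file-name.txt as four separate arguments, so mv tries to move three files named my, file, and name.txt into a directory named my-file-name.txt, and reports an error.

mv my file name.txt my-file-name.txt
  • However, if you escape the spaces with backslashes, mv operates as expected and renames the file.

mv my\ file\ name.txt my-file-name.txt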
../../../_images/file-name.example.gif
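A sketch of the renaming example in a temporary directory, assuming a bash-like shell. Besides backslashes, wrapping the whole name in single or double quotes also keeps it together as one argument. my-file-name.txt is just an example of a space-free name.

```shell
# Sketch: working with a file name that contains spaces.
cd "$(mktemp -d)"
touch "my file name.txt"                  # quotes keep the name as one argument

mv my\ file\ name.txt my-file-name.txt    # backslashes escape each space
mv my-file-name.txt "my file name.txt"    # double quotes work as well
mv 'my file name.txt' my-file-name.txt    # and so do single quotes
ls                                        # prints: my-file-name.txt
```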