RIS Compute 101¶
Note
Connecting to get command line access: ssh wustlkey@compute1-client-1.ris.wustl.edu
Queue to use: workshop, workshop-interactive
Group to use: compute-workshop
(if part of multiple groups)
Compute 101 Video¶
What will this documentation provide?¶
An introduction for users to the basic HPC environment and what’s required to work within an HPC environment
Explain the differences between traditional HPC environments and what is offered by RIS
What is High Performance Computing?¶
It’s NOT Big Data, though it is used to analyze Big Data.
It’s NOT the Cloud, though it can be utilized through cloud systems.
It IS leveraging computing power beyond everyday capacity to do science and engineering.
How HPC Impacts Us¶
INDUSTRY
Engineering
Oil and Gas
Diagnostics
NATIONAL DEFENSE
Modeling of Nuclear Weapons
Cryptography/Cryptanalysis
SCIENCE
Physics: Particle Physics, LIGO
Chemistry/Biology: Molecular Modeling, Drug Discovery
Environmental Science: Weather Forecasting, Climate Modeling
Genomics: Next Generation Sequencing, Variant Analysis
Others: Big Data Analysis, Quantitative Analysis of Qualitative Data
RIS HPC Specs¶
- Base system
5,000 Intel Cascade Lake Cores
120 Nvidia Tesla V100 GPUs
300TB DDN high-performance scratch space
100Gbit Mellanox HDR Network
Batch computing across thousands of CPUs and hundreds of GPUs
Integration with WashU Research Network (WURN) (40Gbit)
Integration with Data Transfer
Integration with Research Storage
Independent software runtimes with Docker
Integration with WUSTL Key Identity
Dedicated 10Gbit connection to Google Cloud
Free training seminars and webinars
When to Use HPC¶
Analyzing data takes a long time on your personal or lab computer
Files or datasets are really large
You need to analyze large sample sizes or datasets with large numbers of files
Typical Workflow on an HPC System¶
Transfer of data from a local computer onto the HPC system
Generation of a script to tell the system how to analyze the data
Transfer of the results off the HPC system onto a local computer
Tools to Connect and Move Data¶
What is a Shell?
A shell is a user interface for interacting with the Operating System (OS), either the OS on your personal computer or the one on the HPC system
OS Examples: Microsoft Windows, Mac OSX, Linux, Android (Mobile)
Two Types of User Interface
GUI - Graphical User Interface (Point and Click)
CLI - Command Line Interface (Text Input and Output)
- GUI - Generally Intuitive (phone, laptop, tablet)
Point and click for dealing with files
- CLI - Used when connecting to an HPC system
Simple commands with powerful options
Commands combined to automate tasks
Steep learning curve, because you need to know the commands
(This is actually a CLI within a GUI)
Command Line Commands in Linux and Unix
Mac OSX and Linux both include a terminal application in their GUI, within which you can use the CLI
- Windows requires software to be installed (recommended software listed below)
MobaXterm - https://mobaxterm.mobatek.net/
- Secure Shell (SSH) Protocol
It was designed as a secure replacement for telnet
Different software packages provide different implementations of the SSH protocol
Requires either a username/password combination to authenticate or an SSH key pair
Basic SSH Command (replacing wustlkey with your own WUSTL Key)
ssh username@machine-id
ssh wustlkey@compute1-client-1.ris.wustl.edu
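As a small illustration of the substitution described above, the sketch below assembles the SSH command from its two parts and prints it instead of running it. The value `jdoe` is a hypothetical placeholder for your own WUSTL Key; the host name is the one given in this documentation.

```shell
# Hypothetical WUSTL Key; replace with your own.
wustlkey="jdoe"
# Client host name as listed in this documentation.
host="compute1-client-1.ris.wustl.edu"

# Assemble the command in the form: ssh username@machine-id
ssh_command="ssh ${wustlkey}@${host}"

# Print the command so it can be checked before it is actually run.
echo "$ssh_command"
```

Running the printed command in a terminal starts the connection and prompts for authentication.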
- Linux File Tree / Directory Structure
In the image, you can see the directory structure as it looks in a GUI and as it looks in a CLI.
There are home and scratch directories for each user.
You can’t double click to move around inside directories and open files in the CLI as you would in a GUI like Mac or Windows
The directory locations and files need to be expressed as paths
Imagine we have a file named my-file.txt in the home directory of the above structure. The full path would be:
/home/username/my-file.txt
~ is a shortcut that represents the home directory, so the above example can also be expressed as:
~/my-file.txt
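The equivalence between the full path and the `~` shortcut can be checked directly in a shell. This is a minimal sketch that assumes the `HOME` environment variable is set (it normally is after login); the file `my-file.txt` does not need to exist, because only the path strings are compared.

```shell
# Full path, written out using the HOME environment variable.
full_path="$HOME/my-file.txt"

# Same path written with the ~ shortcut; the shell expands ~ to the home directory.
short_path=~/my-file.txt

echo "$full_path"
echo "$short_path"

# Both variables now hold the same string.
if [ "$full_path" = "$short_path" ]; then
    echo "Both forms point to the same location"
fi
```

Note that `~` is only expanded when it is unquoted; inside quotes it is treated as a literal character.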
- Before we get into CLI commands, there is a philosophy around which these commands are based
Write commands that do one thing and do it well
Write commands that work together
Write commands to handle text streams, because that is a universal interface
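A short pipeline illustrates this philosophy. The commands `printf`, `sort`, `uniq`, `wc`, and `tr` are not covered in this documentation; they are standard utilities chosen here only as examples of small tools that each do one job and pass text to the next one through the pipe character (`|`).

```shell
# printf emits four lines of text.
# sort orders the lines, uniq drops adjacent duplicates,
# wc -l counts the lines that remain, and tr removes any padding spaces.
unique_count="$(printf 'b\na\nb\nc\n' | sort | uniq | wc -l | tr -d ' ')"

echo "$unique_count"   # prints 3
```

None of these commands knows about the others; they cooperate only because each reads and writes plain text.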
- Commands Within CLI for Dealing with Files and Directories (Folders) (https://ubuntu.com/tutorials/command-line-for-beginners#1-overview)
These are usually short; in fact, many basic commands are only two characters long
Commands often require arguments and have options
command -o --option argument1 argument2
Commands usually provide built-in documentation, often through a help option
command -h --help
Commands also have manual pages that give the most accurate description of a command and its options.
The spacebar is used to page through the manual and ‘q’ is used to quit out of the manual
They are not easy to read but provide the most complete and accurate information about a command
man command
pwd
- Print Working Directory - this returns the current location in the directory structure that you are in (this is also known as the working directory)
ls
- this command simply lists all the files and directories contained within the directory you are currently in
ls -lh
- this command lists all the files and directories contained within the directory you are currently in with added detail, like owner, permissions, date of latest modification, file size
mkdir
- this command is ‘make directory’ and it makes a directory with the provided name
cd
- this command is ‘change directory’ and it allows you to move between directories
rm
- this command removes a file or files.
Warning
This is not like sending items to the trash. This is permanent removal and should be used with care.
rmdir
- this removes a directory, the directory must be empty.
Warning
This is permanent removal and should be used with care.
rm -rf
- this is the most dangerous command to use. Do not use this command unless you are experienced and know what you’re doing.
the option -r makes the remove recursive meaning it will go into nested directories and remove all files within as well as the directories
the option -f forces the removal of files and directories without a prompt, meaning you won’t get asked if you want to delete files
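The commands above can be tried safely in a throwaway directory. The following is a sketch rather than a prescribed procedure: `mktemp -d` (not covered above) is assumed to be available to create a uniquely named temporary directory, and the names `project` and `notes.txt` are hypothetical. It deliberately avoids `rm -rf` and removes things one step at a time.

```shell
# Create a temporary directory and move into it so nothing important is touched.
workdir="$(mktemp -d)"
cd "$workdir"

pwd                          # print the working directory (the temporary directory)

mkdir project                # make a directory named "project"
cd project                   # move into it
echo "draft" > notes.txt     # create a small file using output redirection

ls                           # plain listing of the current directory
separate="$(ls -l -h)"       # detailed listing, options given one at a time
combined="$(ls -lh)"         # same listing, options combined into one
echo "$combined"

rm notes.txt                 # permanent removal; there is no trash to recover from
cd ..                        # move back up to the parent directory
rmdir project                # works only because project is now empty

remaining="$(ls -A)"         # -A also lists hidden entries; nothing is left
cd /
rmdir "$workdir"             # remove the now-empty temporary directory
```

Emptying a directory with `rm` and then removing it with `rmdir` takes more steps than `rm -rf`, but `rmdir` refuses to delete anything that still has contents, which guards against mistakes.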
- Running Multiple Commands Simultaneously
When a command is executed, the shell doesn’t return to the prompt (user input option) until the command has finished.
But we often want to run other commands before our first command has finished
You can suspend a running command using ctrl-Z and then have it continue in the background with ‘bg’
Or you can simply start a job in the background by using an ampersand (&) at the end of the command
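The ampersand form can be sketched as follows. The `sleep` command stands in for any long-running task, and `wait` (not covered above) is a shell built-in that pauses until a background process finishes. The ctrl-Z and `bg` route is interactive, so it is not shown here.

```shell
# Start a stand-in for a long-running task in the background.
sleep 1 &

# $! holds the process ID of the most recent background command.
bg_pid=$!

# The shell returns immediately, so other commands can run in the meantime.
echo "Still able to run commands while process $bg_pid works"

# Block until the background process finishes and record its exit status.
wait "$bg_pid"
status=$?
echo "Background process finished with status $status"
```

An exit status of 0 indicates that the background command completed successfully.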
- Transferring Data to the HPC System
We do NOT use SCP or SFTP protocols to transfer data to the storage associated with the HPC system.
We use SMB (Server Message Block) - this is just another form of network protocol for dealing with files
Information on how to transfer data to and from Storage can be found here.
- Naming Files and Using Spaces in File Names
In the Windows and Mac environments, it is common usage to have spaces within the file names
my file name.txt
In the Unix and Linux environments, spaces are used to separate arguments within commands so it is best to avoid spaces.
You can deal with spaces within file names if you need to. You just need to put a backslash before each space so that the environment recognizes the space as part of the file name.
my\ file\ name.txt
If you try to use a command on a file with spaces without using the backslashes, it will not behave like you expect.
An example is the mv command. If you try the following, the shell splits the name into three separate arguments (my, file, and name.txt) instead of treating it as one file
mv my file name.txt
However, if you include the backslashes, it will operate as expected.
mv my\ file\ name.txt
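A complete `mv` command also names a destination. The sketch below runs the idea end to end in a temporary directory; `mktemp -d` is assumed to be available, and the names `archive` and `my-file-name.txt` are hypothetical. It also shows quoting, a common alternative to backslashes that is not covered above.

```shell
# Work in a temporary directory so no real files are affected.
workdir="$(mktemp -d)"
cd "$workdir"

# Create a file whose name contains spaces, escaping each space with a backslash.
echo "example" > my\ file\ name.txt

# Hypothetical destination directory.
mkdir archive

# The backslashes keep the name together as a single argument.
mv my\ file\ name.txt archive/
escaped_ok=no
[ -f "archive/my file name.txt" ] && escaped_ok=yes

# Quoting the whole path has the same effect as escaping each space.
# Renaming to a name without spaces avoids the problem in later commands.
mv "archive/my file name.txt" my-file-name.txt
quoted_ok=no
[ -f my-file-name.txt ] && quoted_ok=yes

echo "escaped: $escaped_ok, quoted: $quoted_ok"

# Clean up one step at a time.
rm my-file-name.txt
rmdir archive
cd /
rmdir "$workdir"
```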