LLNL COVID-19 HPC Resource Guide for New Livermore Computing Users
Lawrence Livermore National Laboratory (LLNL) is home to Livermore Computing (LC), the division responsible for deploying and operating a large number of supercomputers for laboratory scientists and their collaborators. LC operates 21 supercomputers on three networks. These systems mount multiple petabyte-scale parallel file systems for storing user data. One of the computers that will be used for this research is Lassen, a 23-petaflop advanced technology system (ATS) that is currently the 10th fastest system in the world. It is GPU-enabled, with over 3,100 NVIDIA Volta GPUs. Another, Quartz, is our largest commodity Linux cluster, with over 3,000 nodes, and is the 84th fastest system in the world.
This page is mainly for researchers who are part of the COVID-19 HPC Consortium.
The process to gain access to LC HPC resources is outlined below.
Step 1: OUN (Official User Name) and LC User Name
As a collaborator, you will be assigned an LLNL sponsor who will guide you through the process of getting an LLNL Official User Name (OUN) and a Livermore Computing (LC) User Name for use on LC systems. For foreign nationals requesting access to LC systems, the sponsor must also submit a Visitor Tracking System (VTS) Plan before the account can be approved and provisioned.
Step 2: RSA Token
You will be sent an RSA SecurID token (a one-time password generator) which, once activated, will be used to log in to LC systems via ssh. LC has hotline staff available to help you learn this process.
Step 3: Getting Started on the Systems
Once on the LC system, you will need to transfer your code and data from your local computer via scp, sftp, or ftp. Files can go into your home directory (24 GB quota) or into a directory on either a Lustre or GPFS parallel file system (20 TB initial quota). LC has a large suite of HPC codes already installed. If a copy of your code already exists on LC systems, we will arrange access so that you only need to transfer input files.
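As a rough illustration of the transfer step, the sketch below assembles an scp upload command in Python without running it. The function name build_scp_command, the user name lcuser, the host lc-login.example.org, and the target directory are all hypothetical placeholders, not values from LC; use the LC User Name, login host, and directory your sponsor provides. Because scp runs over ssh, expect to authenticate when you run the resulting command.

```python
import shlex
from pathlib import PurePosixPath


def build_scp_command(local_path, user, host, remote_dir, recursive=False):
    """Return the argv list for an scp upload; nothing is executed here."""
    if not user or not host:
        raise ValueError("user and host are required")
    command = ["scp"]
    if recursive:
        command.append("-r")  # copy a whole directory tree
    # Normalize the remote directory and end it with a single slash.
    destination = f"{user}@{host}:{PurePosixPath(remote_dir)}/"
    command += [str(local_path), destination]
    return command


if __name__ == "__main__":
    # Hypothetical values: replace with your LC User Name, the login host
    # you are given, and your own directory on the parallel file system.
    argv = build_scp_command(
        "inputs", "lcuser", "lc-login.example.org", "/p/lustre1/lcuser",
        recursive=True,
    )
    print(shlex.join(argv))
```

Building the command as a list rather than a single string keeps paths with spaces intact if the list is later passed to subprocess.run.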
Step 4: Compiling and Running Your Code
LC staff will be available to help you compile your code to run on LC systems. Once your code is compiled, you will submit jobs via a batch script to the batch scheduler running on that system. Output should be written to your /p/gpfs1 or /p/lustre directory. If you have a container-based application, LC is in the process of installing infrastructure for Singularity containers.
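To give a feel for what a batch script looks like, here is a minimal sketch that renders one as text. This guide does not name the scheduler, so the sketch assumes Slurm-style directives (#SBATCH lines and an srun launch); the scheduler on your target system may differ, and LC staff can confirm the right directives. The function name render_batch_script and all example values are hypothetical.

```python
def render_batch_script(job_name, nodes, minutes, executable, output_dir):
    """Render a Slurm-style batch script as text (assumed scheduler: Slurm)."""
    if nodes < 1 or minutes < 1:
        raise ValueError("nodes and minutes must be positive")
    hours, mins = divmod(minutes, 60)
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --time={hours:02d}:{mins:02d}:00",
        # %j is Slurm's placeholder for the job ID.
        f"#SBATCH --output={output_dir}/{job_name}-%j.out",
        "",
        f"srun --nodes={nodes} {executable}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job: 4 nodes for 90 minutes, output on a parallel file system.
    print(render_batch_script("demo-job", 4, 90, "./my_app", "/p/lustre1/lcuser"))
```

Under the Slurm assumption, the rendered text would be saved to a file and submitted with sbatch. Pointing the output path at a parallel file system directory follows the guidance above about where job output should be written.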
Each LC compute system has access to one or more high-performance parallel file systems. Lassen uses IBM's Spectrum Scale (aka GPFS) file system, while other systems use Lustre. These file systems are shared among all compute nodes in a cluster as well as across clusters, allowing work to be done on multiple systems without moving your data. Lustre file systems are mounted as /p/lustre1 and /p/lustre2, while GPFS is mounted as /p/gpfs1. These file systems are not purged, meaning files are not automatically deleted, but quotas are in place to manage available capacity. The basic tier provides 20 TB of storage and up to 1 million files. These quotas may be increased by a request to the LC Hotline.
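Because the basic tier limits both capacity and file count, it can help to estimate where a directory stands against each limit. The sketch below is one way to do that with the Python standard library; it is not an LC tool. It assumes decimal terabytes (20 TB = 20 x 10^12 bytes), and the names directory_usage and quota_report are hypothetical. The file system's own accounting remains authoritative.

```python
import os

QUOTA_BYTES = 20 * 10**12   # 20 TB, assuming decimal terabytes
QUOTA_FILES = 1_000_000     # 1 million files


def directory_usage(root):
    """Return (total_bytes, file_count) for files under root.

    Symbolic links are counted as links and are not followed.
    """
    total_bytes = 0
    file_count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total_bytes += os.lstat(path).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            file_count += 1
    return total_bytes, file_count


def quota_report(total_bytes, file_count):
    """Return the fraction of each basic-tier limit that is in use."""
    return {
        "bytes_fraction": total_bytes / QUOTA_BYTES,
        "files_fraction": file_count / QUOTA_FILES,
    }


if __name__ == "__main__":
    used_bytes, used_files = directory_usage(".")
    print(quota_report(used_bytes, used_files))
```

Walking a tree with many files can be slow on a parallel file system, so a scan like this is best run occasionally rather than inside a job loop. Workflows that produce many small files tend to reach the file-count limit well before the capacity limit.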
For long-term data retention, the Livermore Computing center offers an archival data storage service that writes data to tape. The fiscal year quota per user is 300 TB. The archive is accessed from a data transfer cluster known as OSLIC, which has access to the parallel file systems noted above.
LC has staff and tools available to visualize your data if necessary. Results can be sent to your home site by using ftp/sftp/scp.
Throughout this process, you will have a sponsor and a designated staff member to help you. In addition, LC Hotline staff will be available to assist, backed by the full expertise of the Livermore Computing Division. They can be reached at firstname.lastname@example.org