Catalyst

The 150-teraflop/s Catalyst, a unique high-performance computing (HPC) cluster, serves as a proving ground for new HPC and big data technologies, architectures, and applications. Developed by a partnership of Cray, Intel, and Lawrence Livermore, this Cray CS300 system is available for collaborative projects with industry through Livermore's HPC Innovation Center. Catalyst also supports LLNL's Advanced Simulation and Computing (ASC) program.

Catalyst features include 128 gigabytes of dynamic random access memory (DRAM) and 800 gigabytes of non-volatile memory (NVRAM) per compute node. The increased storage capacity of the system represents a major departure from the classic simulation-based computing architectures common at Department of Energy laboratories and enables researchers to explore the potential of combining floating-point-focused capability with data analysis in one environment. In addition, the machine's expanded DRAM and fast, persistent NVRAM are particularly well suited to solving big data problems, such as those found in bioinformatics, graph networks, machine learning, and natural language processing, and to exploring new approaches to application checkpointing, in situ visualization, out-of-core algorithms, and data analytics. Catalyst should help extend the range of possibilities for the processing, analysis, and management of the ever-larger and more complex data sets that many areas of business and science now confront.

Access to Catalyst is limited.


Notes:

  • Local NVRAM storage, mounted on each compute node as /l/ssd: 800 GB (see the example below)
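For example, a job running inside a batch allocation can write checkpoints to the node-local NVRAM and copy them to a parallel file system before the allocation ends. The following is a minimal sketch only; the application name, checkpoint layout, and destination directory are placeholders, not Catalyst-specific settings.

  #!/bin/bash
  # Minimal sketch, run inside a SLURM batch allocation (see the Scheduling
  # section below): use the node-local NVRAM mount (/l/ssd) for fast checkpoint
  # I/O, then copy the files to a parallel file system before the job ends.
  CKPT_DIR=/l/ssd/$USER/ckpt                  # node-local NVRAM (800 GB per node)
  CKPT_DEST=$MY_SCRATCH/ckpt.$SLURM_JOB_ID    # placeholder parallel scratch path

  srun --ntasks-per-node=1 mkdir -p "$CKPT_DIR"
  srun ./my_app --checkpoint-dir "$CKPT_DIR"  # my_app is a placeholder application

  # /l/ssd is local to each node, so copy from every node; this assumes the
  # application writes per-rank (uniquely named) checkpoint files.
  mkdir -p "$CKPT_DEST"
  srun --ntasks-per-node=1 cp -r "$CKPT_DIR"/. "$CKPT_DEST"/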

Job Limits

Each LC platform is a shared resource. Users are expected to adhere to the following usage policies to ensure that the resources can be effectively and productively used by everyone. You can view the policies on the system itself by running:

news job.lim.catalyst


Hardware

There are 302 compute nodes, each with 128 GB of memory. All compute nodes have Intel Xeon E5-2695 v2 processors with 24 cores per node. The nodes are connected via InfiniBand QDR (QLogic).

Scheduling

Catalyst jobs are scheduled through SLURM.

Jobs are scheduled per node. There is one node-scheduled pool:

pbatch - 300 nodes (7,200 cores), only batch use permitted.
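A few standard SLURM commands can be used from a login node to inspect the pool and your jobs. This is a generic sketch; myjob.sh and JOBID are placeholders:

  sinfo -p pbatch        # show the pbatch pool and its node states
  squeue -u $USER        # list your pending and running jobs
  sbatch myjob.sh        # submit a batch script to the scheduler
  scancel JOBID          # cancel a job by its job ID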

     Catalyst limits:

                                 Max nodes/job   Max runtime
      -------------------------------------------------------
      pbatch                         *            24 hours
      -------------------------------------------------------

*  There are currently no limits set on the maximum nodes per job. In general, however, LC prefers that users adhere to a good neighbor policy of limiting their use of a system or queue to 50% of the available nodes. On Catalyst, limiting the maximum job size, or the aggregate number of nodes across all running jobs, to 150 nodes is good practice.
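As an illustration, a pbatch script that follows this guidance might look like the following minimal sketch; the node count, time limit, and application name my_app are placeholders, not required settings:

  #!/bin/bash
  #SBATCH --partition=pbatch     # pbatch: batch use only
  #SBATCH --nodes=64             # good neighbor: stay at or below ~150 nodes
  #SBATCH --time=12:00:00        # must not exceed the 24-hour pbatch limit
  #SBATCH --job-name=my_app

  # Catalyst compute nodes have 24 cores each; my_app is a placeholder.
  srun --ntasks-per-node=24 ./my_app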

Documentation

Scratch Disk Space

Consult CZ File Systems Web Page: https://lc.llnl.gov/fsstatus/fsstatus.cgi

System Summary

Zone: CZ
Vendor: Cray

User-Available Nodes
  Login Nodes*: 2
  Batch Nodes: 300
  Debug Nodes: 4
  Total Nodes: 324

CPUs
  CPU Architecture: Intel Xeon E5-2695 v2
  Cores/Node: 24
  Total Cores: 7,776
  Memory Total (GB): 41,472
  CPU Memory/Node (GB): 128

Peak Performance
  Peak TFLOPS (CPUs): 149.3
  Clock Speed (GHz): 2.4
  Peak single CPU memory bandwidth (GB/s): 60

OS: TOSS 3
Interconnect: IB QDR
Parallel job type: multiple nodes per job
Run Command: srun
Recommended location for parallel file space:
Program: ASC, M&IC
Password Authentication: OTP
Compilers:
Documentation:

* 2 nodes: catalyst[159,160]