Tuolumne

Tuolumne is an unclassified sibling system of El Capitan, sharing the same architecture. At more than 288 petaflops, it not only outstrips its unclassified predecessor, Lassen, but also exceeds the speed of LLNL's previous flagship petascale machine, Sierra. Tuolumne debuted at #10 on the November 2024 Top500 list of the world's most powerful supercomputers.

NOTE: This system currently has limited availability.

Job Limits

Each LC platform is a shared resource. Users are expected to adhere to the following usage policies to ensure that the resources can be effectively and productively used by everyone. You can view the policies on the system itself by running:

news job.lim.MACHINENAME
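
For example, on Tuolumne (assuming the limits file follows the usual MACHINENAME naming convention):

news job.lim.tuolumne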

Web Version of Tuolumne Job Limits

Tuolumne is a CORAL-2 system. There are 8 login nodes, 44 compute nodes in the pdebug partition, and 1100 compute nodes in the pbatch partition. The compute nodes have 96 AMD EPYC cores, 4 AMD MI300A GPUs, and 512 GB of memory per node. Tuolumne is running TOSS 4 with Cray compilers.

Batch jobs are scheduled through Flux; example submission commands follow the limits table below.

  • pdebug—44 nodes, 4,224 cores, 176 GPUs; interactive use only.
  • pbatch—1,100 nodes, 105,600 cores, 4,400 GPUs; batch use only.
Pools               Max nodes           Max runtime
---------------------------------------------------
pdebug              16/user               1 hour
pbatch              256/job              12 hours
---------------------------------------------------
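
For example, the following Flux commands request an interactive pdebug allocation and submit a batch script to pbatch within the limits above. This is only a sketch: the queue names match the pools listed here, but the node counts, time limits, and script name (job.sh) are placeholders to adjust for your own work.

flux alloc -q pdebug -N 2 -t 1h             # interactive shell on 2 debug nodes
flux batch -q pbatch -N 256 -t 12h job.sh   # batch job at the pbatch per-job limits
flux jobs                                   # check the status of your jobs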

Do NOT run computationally intensive work on the login nodes. There are a limited number of login nodes, and they are meant primarily for editing files and launching jobs. Most of the time when a login node is sluggish, it is because a user has started a compile on it.

pdebug is intended for debugging, visualization, and other inherently interactive work. Do not use pdebug to run batch jobs. Do not chain jobs to run one after the other. Individuals who misuse the pdebug queue in this or any similar manner may be denied access to it.

Interactive access to a batch node is allowed while you have a batch job running on that node, and only for the purpose of monitoring your job. When logging into a batch node, be mindful of the impact your work has on the other jobs running on the node.

You can view the queue by running "flux jobs -A" at the prompt.
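
A few related queries (standard flux jobs flags; run "flux jobs --help" on the system to confirm the available options):

flux jobs        # your active jobs
flux jobs -a     # your jobs, including inactive (completed/failed) ones
flux jobs -A     # jobs from all users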

Zone: CZ
Vendor: HPE Cray

User-Available Nodes
  Login Nodes*: 8 nodes: tuolumne[1001-1004,2149-2152]
  Batch Nodes: 1,100
  Debug Nodes: 44
  Total Nodes: 1,152

APUs
  APU Architecture: AMD MI300A

CPUs
  CPU Architecture: 4th Generation AMD EPYC
  Cores/Node: 96
  Total Cores: 110,592

GPUs
  GPU Architecture: CDNA 3
  Total GPUs: 4,608
  GPUs per compute node: 4
  GPU peak performance (TFLOP/s, double precision): 68.00
  GPU global memory (GB): 512.00

Memory Total (GB): 589,824

Peak Performance
  Peak TFLOPS (CPUs): 5,308.4
  Peak TFLOPS (GPUs): 288,921.6
  Peak TFLOPS (CPUs+GPUs): 294,230.0

Clock Speed (GHz): 2.0
OS: TOSS 4
Interconnect: HPE Slingshot 11
Parallel job type: multiple nodes per job
Recommended location for parallel file space:
Program: ASC, M&IC, Bio
Class: ATS-4, CORAL-2
Password Authentication: OTP, Kerberos, ssh keys
Year Commissioned: 2024
Compilers:
Documentation: