Tuolumne

Tuolumne is an unclassified sibling system of El Capitan, sharing the same architecture. At more than 288 petaflops, it not only outstrips its unclassified predecessor, Lassen, but also exceeds the speed of LLNL's previous flagship petascale machine, Sierra. Tuolumne debuted at #10 on the November 2024 Top500 list of the world's most powerful supercomputers.

NOTE: This system currently has limited availability.

Job Limits

Each LC platform is a shared resource. Users are expected to adhere to the following usage policies to ensure that the resources can be effectively and productively used by everyone. You can view the policies on the system itself by running:

news job.lim.MACHINENAME
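
For example, on Tuolumne (assuming the limits file follows the usual MACHINENAME naming convention):

news job.lim.tuolumne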

Web Version of Tuolumne Job Limits

Tuolumne is a CORAL-2 system. There are 8 login nodes, 44 compute nodes in the pdebug partition, and 1100 compute nodes in the pbatch partition. The compute nodes have 96 AMD EPYC cores, 4 AMD MI300A GPUs, and 512 GB of memory per node. Tuolumne is running TOSS 4 with Cray compilers.

Batch jobs are scheduled through Flux; example submission commands follow the limits table below.

  • pdebug—44 nodes, 4,224 cores, 176 GPUs; interactive use only.
  • pbatch—1,100 nodes, 105,600 cores, 4,400 GPUs; batch use only.
Pools               Max nodes           Max runtime
---------------------------------------------------
pdebug              16/user               1 hour
pbatch              256/job              12 hours
---------------------------------------------------
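
For example, the following Flux commands request an interactive pdebug allocation and submit a batch script to pbatch within the limits above. This is only a sketch: the queue names match the pools listed here, but the node counts, time limits, and script name (job.sh) are placeholders to adjust for your own work.

flux alloc -q pdebug -N 2 -t 1h             # interactive shell on 2 debug nodes
flux batch -q pbatch -N 256 -t 12h job.sh   # batch job at the pbatch per-job limits
flux jobs                                   # check the status of your jobs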

Do NOT run computationally intensive work on the login nodes. There are a limited number of login nodes, and they are meant primarily for editing files and launching jobs. Most of the time when a login node is sluggish, it is because a user has started a compile on it.

pdebug is intended for debugging, visualization, and other inherently interactive work. Do not use pdebug to run batch jobs. Do not chain jobs to run one after the other. Individuals who misuse the pdebug queue in this or any similar manner may be denied access to it.

Interactive access to a batch node is allowed while you have a batch job running on that node, and only for the purpose of monitoring your job. When logging into a batch node, be mindful of the impact your work has on the other jobs running on the node.

You can view the queue by running "flux jobs -A" at the prompt.
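
A few related queries (standard flux jobs flags; run "flux jobs --help" on the system to confirm the available options):

flux jobs        # your active jobs
flux jobs -a     # your jobs, including inactive (completed/failed) ones
flux jobs -A     # jobs from all users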

Zone: CZ
Vendor: HPE Cray

User-Available Nodes
  Login Nodes*: 8 nodes: tuolumne[1001-1004,2149-2152]
  Batch Nodes: 1,100
  Debug Nodes: 44
  Total Nodes: 1,152

APUs
  APU Architecture: AMD MI300A

CPUs
  CPU Architecture: 4th Generation AMD EPYC
  Cores/Node: 96
  Total Cores: 110,592

GPUs
  GPU Architecture: CDNA 3
  Total GPUs: 4,608
  GPUs per compute node: 4
  GPU peak performance (TFLOP/s, double precision): 68.00
  GPU global memory (GB): 512.00

Memory Total (GB): 589,824

Peak Performance
  Peak TFLOPS (CPUs): 5,308.4
  Peak TFLOPS (GPUs): 288,921.6
  Peak TFLOPS (CPUs+GPUs): 294,230.0

Clock Speed (GHz): 2.0
OS: TOSS 4
Interconnect: HPE Slingshot 11
Parallel job type: multiple nodes per job
Recommended location for parallel file space:
Program: ASC, M&IC, Bio
Class: ATS-4, CORAL-2
Password Authentication: OTP, Kerberos, ssh keys
Year Commissioned: 2024
Compilers:
Documentation: