Job Limits

Each LC platform is a shared resource. Users are expected to adhere to the following usage policies to ensure that the resources can be effectively and productively used by everyone. You can view the policies on the system itself by running:

news job.lim.rzadams

Web Version of RZAdams Job Limits

There are two login nodes and 126 compute nodes (24 pdebug and 102 batch). Each compute node is based on the AMD MI300A APU, with 24 CPU cores and 1 GPU per socket and 4 sockets per node (96 cores and 4 GPUs total), and 512 GB of memory.

There are 3 scheduling pools:

  • pdebug  - 2304 cores (24 nodes), max 4 nodes/job, 4hrs
  • pdev*   - 9792 cores (102 nodes), max 12 nodes/job, 4hrs
  • plarge* - 9792 cores (102 nodes), max 6hrs

*Only one of pdev and plarge will be active at any time. pdev will be active from 2AM-8PM weekdays. plarge will be active otherwise.
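
For example, submissions that stay within these limits might look like the following (a sketch using standard Flux options; myrun.sh is a placeholder for your own batch script):

# Interactive debug session: 2 nodes in pdebug for 1 hour (pool limits: 4 nodes, 4 hrs)
flux alloc -N 2 -q pdebug -t 1h

# Production batch job: 8 nodes in plarge for 6 hours (the pool's time limit)
flux batch -N 8 -q plarge -t 6h ./myrun.sh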

Scheduling

RZAdams jobs are scheduled using Flux; jobs are scheduled per node. Scheduling limits are not technically enforced, so users are expected to monitor their own behavior and keep themselves within the current limits while following these policies (example commands follow the list):

  • Users will not compile on the login nodes during daytime hours
  • A user can have a maximum of 12 pdev and pdebug nodes, with a runtime of up to 4 hours, in the queue during the day, with the following exception:
    • An occasional debugging job of up to one hour that uses 15 nodes, as long as it is the user's only job in the queue.
  • Daytime is 0800-2000, Monday through Friday, not including holidays.
  • Production runs should use the plarge queue.
  • Users will not run computationally intensive work on the login nodes
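
Because these limits are not enforced by the scheduler, each user is expected to check their own footprint. The standard Flux commands below are one way to do that (a sketch; exact output depends on the installed Flux version):

flux jobs -a        # your own jobs, including recently completed ones
flux jobs -A        # all users' jobs, to gauge overall machine usage
flux queue list     # the configured queues and their limits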

We are all family and expect developers to play nice. However, if someone's job(s) have taken over the machine:

  • Call them or send them an email
  • Email ramblings-help@llnl.gov with a screenshot so we can take care of the situation by killing work that violates policy.

This approach will be revisited later and additional limits will be set if necessary. If someone monopolizes the machine, developers can always shift to other RZ resources.

Scratch Disk Space: Consult RZ File Systems Web Page

Please contact the LC Hotline if you have any questions.

Zone: RZ
Vendor: HPE Cray

User-Available Nodes
  Login Nodes*: 2
  Batch Nodes: 102
  Debug Nodes: 24
  Total Nodes: 128

APUs
  APU Architecture: AMD MI300A

CPUs
  CPU Architecture: 4th Generation AMD EPYC
  Cores/Node: 96
  Total Cores: 12,288

GPUs
  GPU Architecture: CDNA 3
  Total GPUs: 512
  GPUs per compute node: 4
  GPU peak performance (TFLOP/s double precision): 62.00
  GPU global memory (GB): 128.00

Memory Total (GB): 65,536
CPU Memory/Node (GB): 512

Peak Performance
  Peak TFLOPS (CPUs): 358.4
  Peak TFLOPS (GPUs): 31,744.0
  Peak TFLOPS (CPUs+GPUs): 32,102.40

Clock Speed (GHz): 3.7
OS: TOSS
Interconnect: HPE Slingshot 11
Parallel job type: multiple nodes per job
Recommended location for parallel file space: see the RZ File Systems web page
Program: ASC
Class: ATS-4
Password Authentication:
Compilers: See Compilers page