Job Limits

Each LC platform is a shared resource. Users are expected to adhere to the following usage policies to ensure that the resources can be effectively and productively used by everyone. You can view the policies on the system itself by running:

news job.lim.rzwhippet

Web Version of RZWhippet

There are two login nodes, 28 pdebug nodes, and 8 phighmem nodes (no batch nodes). Each pdebug node is based on Intel Sapphire Rapids processors with 56 cores per socket, 2 sockets per node, and 256 GB of DDR5 memory. Each phighmem node is based on Intel Sapphire Rapids processors with 56 cores per socket, 2 sockets per node, and 128 GB of HBM memory.

There are 2 scheduling pools:

  • pdebug—3136 cores (28 nodes)
  • phighmem—896 cores (8 nodes)
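The pool sizes follow directly from the node description: 2 sockets of 56 cores give 112 cores per node. A minimal sketch of that arithmetic, where the constant and function names are illustrative only and not part of any LC tool:

```python
# Figures taken from the node description above; names are hypothetical.
CORES_PER_SOCKET = 56
SOCKETS_PER_NODE = 2
POOL_NODES = {"pdebug": 28, "phighmem": 8}


def pool_cores(nodes: int) -> int:
    """Return the number of cores in a pool with the given node count."""
    return nodes * SOCKETS_PER_NODE * CORES_PER_SOCKET


if __name__ == "__main__":
    for pool, nodes in POOL_NODES.items():
        print(f"{pool}: {pool_cores(nodes)} cores ({nodes} nodes)")
```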

Scheduling

RZWhippet jobs are scheduled using SLURM. Jobs are scheduled per core. The limits below are not technically enforced, so users are expected to monitor their own behavior and keep themselves within the current limits while following the policies:

  • Users can only use up to one node of phighmem at a time
  • A user can have a maximum of 336 processors with a runtime of up to 4 hours in queue during the day, with the following exceptions:
    • Users can run any standby jobs, since they are preemptable
    • An occasional debugging job of at most one hour that uses 337-560 processors is allowed, as long as it is the user's only job in the queue
  • Daytime is 0800-2000, Monday through Friday, not including holidays
  • No production runs are allowed, only development and debugging
  • Users must not run computationally intensive work on the login nodes
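Because these limits are not enforced by the scheduler, it can help to check a planned job against them before submitting. The sketch below is one reading of the daytime rules above, not an LC-provided tool: the function names are hypothetical, the 336-processor limit is assumed to apply to the user's total across queued jobs, and 2000 is treated as the first hour outside daytime. It does not cover the phighmem, production-run, or login-node policies.

```python
from datetime import date, datetime

# Limits taken from the policy list above.
DAY_START_HOUR = 8        # 0800
DAY_END_HOUR = 20         # 2000 (assumed exclusive)
MAX_DAY_PROCS = 336
MAX_DAY_HOURS = 4.0
DEBUG_MAX_PROCS = 560     # upper bound of the 337-560 debugging exception
DEBUG_MAX_HOURS = 1.0


def is_daytime(when: datetime, holidays: frozenset = frozenset()) -> bool:
    """True for 0800-2000, Monday through Friday, excluding holidays."""
    if when.date() in holidays:
        return False
    return when.weekday() < 5 and DAY_START_HOUR <= when.hour < DAY_END_HOUR


def within_daytime_limits(when: datetime, procs: int, hours: float,
                          other_procs_in_queue: int = 0,
                          standby: bool = False,
                          holidays: frozenset = frozenset()) -> bool:
    """Check one planned job against the daytime processor/runtime limits."""
    # Standby jobs are preemptable and exempt; the limits only apply in daytime.
    if standby or not is_daytime(when, holidays):
        return True
    # Normal case: total processors and runtime within the daytime limits.
    if procs + other_procs_in_queue <= MAX_DAY_PROCS and hours <= MAX_DAY_HOURS:
        return True
    # Occasional debugging exception: 337-560 processors, at most one hour,
    # and it must be the user's only job in the queue.
    return (other_procs_in_queue == 0
            and MAX_DAY_PROCS < procs <= DEBUG_MAX_PROCS
            and hours <= DEBUG_MAX_HOURS)
```

For example, `within_daytime_limits(datetime(2024, 1, 8, 10), procs=448, hours=1.0)` accepts a one-hour, 448-processor debugging job on a Monday morning, while the same job with `hours=2.0` is rejected.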

We are all family and expect developers to play nice. However, if someone's job(s) have taken over the machine:

  • Call them or send them an email.
  • Call Ines Heinz at 3-7900 and she will call them and/or kill the job.
  • Call Ines's backup (Ellen at 2-4691) and she will get the job killed.

This approach will be revisited later and additional limits will be set if necessary. If someone monopolizes the machine, developers can always shift to other RZ resources.

Documentation

Contact

Please call or send email to the LC Hotline if you have questions. LC Hotline | phone: 925-422-4531 | email: lc-hotline@llnl.gov

Zone: RZ
Vendor: Dell

User-Available Nodes
  • Login Nodes*: 2
  • Batch Nodes: 8
  • Debug Nodes: 28
  • Total Nodes: 36

CPUs
  • CPU Architecture: Intel(R) Xeon(R) Platinum 8479, Intel(R) Xeon(R) CPU Max 9480
  • Cores/Node: 112
  • Total Cores: 4,592

Memory Total (GB): 10,496
CPU Memory/Node (GB): 256

Peak Performance
  • Peak TFLOPS (CPUs): 293.9

Clock Speed (GHz): 2.0
OS: TOSS 4
Interconnect: Cornelis
Parallel job type: Serial
Recommended location for parallel file space:
Program: ASC
Class: CTS-2
Password Authentication: OTP
Year Commissioned: 2022
Compilers

See Compilers page