
Job Limits

Each LC platform is a shared resource. Users are expected to adhere to the following usage policies to ensure that the resources can be effectively and productively used by everyone. You can view the policies on a system itself by running:

news job.lim.MACHINENAME
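
For example, on RZVernal:

news job.lim.rzvernal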

RZVernal

RZVernal is a CORAL-2 early access system. There are 2 login nodes and 36 compute nodes in a pdebug partition. Each compute node has 64 AMD EPYC cores, 4 AMD gfx90a GPUs, and 512 GB of memory. RZVernal is running TOSS 4.

There is 1 scheduling pool:

  • pdebug: 2304 cores, 144 GPUs (36 nodes)
   Pool             Max nodes/job       Max runtime
   ---------------------------------------------------
   pdebug                 3                4 hours
   ---------------------------------------------------
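
As a sketch, an interactive allocation within these limits might look like the following; salloc and the -N/-p/-t flags are standard Slurm options, and the node count and time here are only illustrative:

# request 2 nodes in the pdebug pool for up to 2 hours (within the 3-node/4-hour limits)
salloc -N 2 -p pdebug -t 2:00:00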

Scheduling

RZVernal jobs are scheduled using SLURM. Scheduling is not technically enforced, so users are expected to monitor their own usage and keep within the current limits while following the policies below:

  • Users will not compile on the login nodes during daytime hours.
  • Daytime hours are from 8:00am to 8:00pm Monday through Friday.
  • No production runs allowed, only development and debugging.
  • Users will avoid computationally intensive work on the login nodes.
  • We are all family and expect developers to play nice. However, if someone's jobs have taken over the machine:
    • Call them or send them an email.
    • Call Ines Heinz and she will call them and/or kill the job.
    • Call Ines's backup (Ellen Tarwater) and she will get the job killed.
  • This approach will be revisited later and additional limits will be set if necessary.

The queue (sorted by user) can be found by typing "squeue -S u" at the prompt after setting the environment variable:
SQUEUE_FORMAT='%.7i %.9P %.8j %.8u %.2t %.10M %.6D %.4C %R'
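
Assuming a bash-style shell, for example:

export SQUEUE_FORMAT='%.7i %.9P %.8j %.8u %.2t %.10M %.6D %.4C %R'
squeue -S u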

Contact

Please call or send email to the LC Hotline if you have questions.

LC Hotline phone: 925-422-4531
SCF email: lc-hotline@pop.llnl.gov

Zone                        RZ
Vendor                      HPE Cray

User-Available Nodes
  Login Nodes*              2 nodes: rzvernal[10,11]
  Batch Nodes               0
  Debug Nodes               36
  Total Nodes               38

CPUs
  CPU Architecture          AMD Trento
  Cores/Node                64
  Total Cores               2,432

GPUs
  GPU Architecture          AMD MI-250X
  Total GPUs                152
  GPUs per compute node     4
  GPU peak performance      45.00 TFLOP/s (double precision)
  GPU global memory         128.00 GB

Memory Total                16,384 GB

Peak Performance
  Peak TFLOPS (CPUs)        512.0
  Peak TFLOPS (GPUs)        6,840.0
  Peak TFLOPS (CPUs+GPUs)   6,916.00

Clock Speed                 1.9 GHz
OS                          TOSS 4
Interconnect                HPE Slingshot 11
Parallel job type           multiple nodes per job
Program                     ASC
Class                       ATS-4/EA, CORAL-2
Password Authentication     OTP, Kerberos, ssh keys
Year Commissioned           2022

*Login nodes: rzvernal10, rzvernal11
Compilers

See the Compilers page.

Documentation