Exercise 1

Preparation:

  • Login to the workshop machine
    • Workshops differ in how this is done. The instructor will go over this beforehand.
  • Copy the exercise files into your home directory then cd into it:
    • cp  -R /usr/global/docs/training/blaise/slurmmoab   ~ 
    • cd  slurmmoab
  • List the contents of your subdirectory. You should notice the following files:
    • Simple shell script used for the first exercise
    • Parallel program source code - C version (mpi_array.c)
    • Parallel program source code - Fortran version (mpi_array.f)
    • Hybrid parallel (MPI + threads) program source code - C version (mpithreads.c)
    • Multiple jobs from single batch script example - C version
    • Solutions to the exercises - Slurm and Moab versions (e.g. slurm1, moab1, slurm3, moab3)

Review your cluster's batch configuration

  • Try the commands below:
    • news job.lim.machine - where machine is the name of the cluster
    • sinfo -s
    • mjstat | head
  • Questions:
    • Which queues are configured?
    • How many nodes are there in each queue?
    • What are the batch queue node and time limits?
    • What states are the nodes in (alloc, idle, etc.)?
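  • The exact listing depends on the cluster, but sinfo -s output generally looks something like the sketch below (the partition names, limits, and counts are made up for illustration). The NODES column reads allocated/idle/other/total:
PARTITION  AVAIL  TIMELIMIT   NODES(A/I/O/T)  NODELIST
pbatch*       up 1-00:00:00    250/10/2/262   node[001-262]
pReserved     up   12:00:00       4/12/0/16   node[263-278]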

Find out which banks are available

  • To see which banks (accounts) are available to you on this cluster, simply issue the mshare command. Note that it also displays your bank allocation and usage information.
  • The mdiag -u classXX command can be used to list your banks (accounts) and also show your valid QOS options.
  • To view the entire bank hierarchy, use the mshare -t root command.

Create and run a job script

  • Using your favorite text editor (vi/vim, emacs, nedit, gedit, nano...), create a job script that does the following:
    • Runs under the classXX login shell (/bin/tcsh)
    • Sets a time limit of 5 minutes
    • Requests 1 node
    • Runs in the pReserved queue
    • Writes the batch output to a file name of your choosing
    • Gives your job a unique name
    • Changes to your slurmmoab subdirectory
    • Displays the name of the host it is running on
    • Displays the jobid for this job
    • Shows your path
    • Runs the exercise1 executable provided to you
    • Sleeps for a few minutes (so you can have time to check on it)
    • For reference, you can review the slurm1 (Slurm) or moab1 (Moab) solution file, or see the sketch at the end of this section.
  • Submit your job using either the sbatch (Slurm) command or the msub (Moab) command.
    • Was your job script accepted?
    • What was its jobid number?
    • Problems? Check your script against the slurm1 or moab1 solution file for errors.
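  • For illustration, here is a minimal Slurm sketch of such a script (the job name, output file name, and sleep length are placeholders you should change; a Moab script would use #MSUB directives instead of #SBATCH):
#!/bin/tcsh
#SBATCH -t 00:05:00                  # 5 minute time limit
#SBATCH -N 1                         # request 1 node
#SBATCH -p pReserved                 # run in the pReserved queue
#SBATCH -o myjob.out                 # batch output file (name of your choosing)
#SBATCH -J myfirstjob                # unique job name
cd ~/slurmmoab                       # change to the exercise subdirectory
echo "Host:  `hostname`"             # name of the host the job is running on
echo "Jobid: $SLURM_JOB_ID"          # jobid for this job
echo "Path:  $PATH"                  # show your path
./exercise1                          # run the provided exercise1 executable
sleep 180                            # sleep so you have time to check on the job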

Monitor your job

  • The tutorial described several ways to monitor your job, including:
    • squeue
    • mjstat
    • showq
    • checkjob
    • mdiag -j
  • Try any/all of these commands, noting their similarities and differences.
  • Hint: you may want to pipe the output of the more verbose commands into grep with the jobid or your workshop username. For example:
    • showq | grep class04
  • If you run out of time, you can submit another job with a longer "sleep".
  • If you have questions about the output of these commands, check the tutorial and/or man pages.
  • After your job completes, examine its status using the checkjob and showq -c | grep jobid commands.
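  • For example, assuming your jobid is 12345 and your workshop username is classXX (both placeholders), you might try:
squeue -u classXX                    # Slurm: all of your queued/running jobs
showq | grep classXX                 # Moab: queue listing filtered to your jobs
checkjob 12345                       # detailed status of one job
mdiag -j | grep 12345                # scheduler job diagnostics, filtered to one jobid
showq -c | grep 12345                # completed-job listing, after the job finishes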

Check your job's output

  • Review the output file from your job.
    • Where did you find it?
    • Is it named what you specified?
    • Is the output what you should expect? Compare your output file to the corresponding Slurm or Moab sample output file.
    • You may also want to look at the exercise1 executable.

This completes Exercise 1

Exercise 2

Still logged into the workshop cluster?

  • If so, then continue to the next step. If not, then login as you did previously for Exercise 1.

Holding and releasing a job

  • Using your same job script, submit the job so that it is held. This can be done on the msub or sbatch command line, or from within the script itself. Try both ways. If you have any questions see the tutorial.
  • Verify that your job is actually in a holding state.
  • Release your job(s) so that they run to completion.
  • Verify that the job release actually took effect.
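  • If you get stuck, one way to do this from the command line looks like the sketch below (the script and jobid names are placeholders; equivalent directives can also go inside the script itself):
sbatch -H myscript                   # Slurm: submit the job in a held state
scontrol release 12345               # Slurm: release the held job
msub -h myscript                     # Moab: submit the job with a user hold
mjobctl -u 12345                     # Moab: release (unhold) the held job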

Canceling a job

  • Once again, submit your job script.
  • Try to cancel it before it completes. You can do this when it's queued or when it's running. If you have any questions see the tutorial.
  • Confirm that the job is actually cancelled. Also, check its post-execution status with the checkjob command.
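  • A minimal sketch (the jobid is a placeholder):
scancel 12345                        # Slurm: cancel the job
canceljob 12345                      # Moab: cancel the job
checkjob 12345                       # check its post-execution status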

Running in standby mode

  • Modify your job script so that it will run in standby mode. The corresponding Slurm and Moab solution files in your slurmmoab directory are provided for reference.
  • Submit your job script.
  • When your job starts to run, verify that it is running in standby mode. One way to do this is to use the checkjob command and look for qos:standby near the top of the output.
  • Submit your job script again but be sure to have the job HELD.
  • Confirm that the job is held.
  • Now change the qos from standby to normal for this job. If you have any questions see the Standby section of the tutorial.
  • Confirm that the qos was changed.
  • Cancel the job (or release it and let it run) when you're sure that it was changed.
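  • For reference, the relevant directives and commands look something like the sketch below (the jobid is a placeholder, and the exact modify syntax may vary by site):
#SBATCH --qos=standby                    # Slurm directive: run the job in standby mode
#MSUB -l qos=standby                     # Moab directive: run the job in standby mode
scontrol update JobId=12345 QOS=normal   # Slurm: change the held job's qos to normal
mjobctl -m qos=normal 12345              # Moab: change the held job's qos to normal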

Run a parallel job

  • Copy the slurm1 or moab1 example file to a new file - call it whatever you'd like.
  • Modify your new file so that:
    • Four nodes are requested
    • A new output file name is used
    • It compiles either mpi_array.c (use "mpicc") or mpi_array.f (use "mpif77").
    • It lists the names of the nodes used to run the job
    • It runs a 48-task MPI job using the mpi_array executable you created in the previous step.
  • The corresponding Slurm and Moab example files are provided for reference.
  • Submit your job and monitor it, making sure it is using the number of nodes/tasks specified.
  • Check your output file to verify that things worked, comparing it against the corresponding sample output file.
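  • A minimal Slurm sketch of the required changes, assuming the C version (the output file name is a placeholder; a Moab script would request nodes with #MSUB -l nodes=4 instead):
#SBATCH -N 4                         # request four nodes
#SBATCH -o mpi_array.out             # new output file name (placeholder)
mpicc mpi_array.c -o mpi_array       # compile the C version (or: mpif77 mpi_array.f -o mpi_array)
srun hostname | sort -u              # list the names of the nodes used by the job
srun -n 48 ./mpi_array               # run a 48-task MPI job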

Run a hybrid (MPI + threads) parallel job

  • The example file mpithreads.c combines MPI with pthreads. The basic idea is to run one MPI task per node, and then spawn one thread for each core on that node. The threads do the actual work and MPI is used to collect the results across all nodes. Feel free to examine the source code if you'd like.
  • Using the slurm3 or moab3 example file, copy it to a new file - call it whatever you'd like.
  • Modify your new file so that:
    • A new output file name is used
    • It compiles the mpithreads.c file (use "mpicc -pthread")
    • It runs a 4-task MPI job using the executable you created in the previous step. However, this time run with only one task per node. This will permit the threads spawned by each MPI task to use the available cores on a node without competition from the threads of other MPI tasks.
  • The corresponding Slurm and Moab example files are provided for reference.
  • Submit your job and monitor it, making sure it is using the number of nodes/tasks specified.
  • Check your output file to verify that things worked, comparing it against the corresponding sample output file.
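  • A minimal Slurm sketch of the required changes (the output file name is a placeholder):
#SBATCH -N 4                                # four nodes
#SBATCH --ntasks-per-node=1                 # only one MPI task per node
#SBATCH -o mpithreads.out                   # new output file name (placeholder)
mpicc -pthread mpithreads.c -o mpithreads   # compile the hybrid example
srun -n 4 ./mpithreads                      # 4 MPI tasks, one per node; each spawns threads on its node's cores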

Run multiple jobs from a single batch script

  • The corresponding Slurm and Moab example files demonstrate how to run multiple jobs from a single batch script.
  • Review either example file and note what is being done:
    • Four nodes are requested
    • A simple executable is compiled
    • Four 1-node jobs are launched to run simultaneously
  • Submit either example file and then review its output when it completes.
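  • For reference, the launch pattern inside those example files typically looks something like this sketch (the program name is a placeholder; see the actual example file for details):
#SBATCH -N 4                         # request four nodes
cc myprog.c -o myprog                # compile a simple executable (placeholder name)
srun -N 1 -n 1 ./myprog &            # launch four 1-node jobs in the background
srun -N 1 -n 1 ./myprog &            # so that they run simultaneously
srun -N 1 -n 1 ./myprog &
srun -N 1 -n 1 ./myprog &
wait                                 # wait for all four to finish before the batch job exits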

When will my job start?

  • Some of the most frequently asked questions by users include:
    • "When will my job start?"
    • "Why won't my job start?"
    • "Why are jobs sitting idle when there are ample unused nodes?"
  • There are several common answers to these questions. Assuming that there are no system problems or errors in the user's job submission script, one of the most common reasons has to do with a job's calculated priority and the scheduler's fair-share algorithms.
  • Use one of the commands below to generate a list of eligible jobs and their priorities:
sprio -l  |  more
mdiag -p -v  |  more
  • As you scroll through the list, note that it is sorted by jobid.
  • To make the list more meaningful, sort it by priority (highest to lowest):
sprio -l  |  sort -r -k 3,3
mdiag -p -v  |  sort -r -k 3,3
  • You can now find where any job is relative to other jobs in the queue.
  • Columns 4-9 show the factors used to compute priority values.
  • You can also use the checkjob jobid command to view the scheduler's current estimate of when your job will start. Look for the line that shows "StartTime:" (if it exists). For example:
% checkjob 87889
...
WallTime:  00:00:00 of 1-00:00:00
SubmitTime: Wed Jun  7 10:40:27
  (Time Queued Total: 00:01:48   Eligible: 00:01:48)

StartTime: Thu Jun  8 10:41:57
Total Requested Tasks:  1
Total Requested Nodes:  1
Partition: pbatch
Dedicated Resources Per Task: lscratchf
Node Access: SINGLEJOB
...
  • Sometimes the squeue --start command can be used to get an estimate for job start times. And sometimes it can't...
  • Note that start times can change dynamically if new jobs with a higher priority are submitted.
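  • For example, to see start-time estimates for just your own jobs (the username is a placeholder):
squeue --start -u classXX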

Try sview

  • The sview utility provides a graphical view of all user jobs running on a cluster. Give it a try if you haven't already.

Documentation - if you still have time

  • If time permits, browse the man pages and online documentation for the Slurm and Moab commands used in these exercises.

This completes the exercise.

Please complete the online evaluation form if you have not already done so for this tutorial.
