LSF Quick Start Guide
These instructions cover submitting and running jobs on the final CORAL systems (Sierra, Lassen, rzAnsel), which use jsrun / lrun to launch job steps. The CORAL early access (EA) systems (Ray, rzManta, Shark) use mpirun for job step launch instead; instructions for them are at the end of this page.
Get a dedicated compute node for running parallel compiles, debugging, etc.
$ lalloc 1
The lalloc wrapper script gets an allocation and drops you at a shell prompt on the first compute node in that allocation. Run lalloc -h for details on its other options. In particular, if you plan to launch multiple job steps (jsrun / lrun) interactively, we recommend the --shared-launch option: one failed or cancelled job step can kill the jsmd on the compute node, which prevents you from launching any further job steps from that node.
Submit a batch script to run one or more job steps on a compute node or nodes
$ cat tennode.bsub
#!/bin/bash
#BSUB -nnodes 10
#BSUB -q pbatch
lrun -T4 myapp input1
$ bsub tennode.bsub
The lrun wrapper script provides a simple syntax for launching job steps. In this example, lrun -T4 myapp input1 tells lrun to launch myapp with 4 tasks on each node in the allocation. lrun also accepts -n<ntasks> and/or -N<nnodes>, as well as jsrun options for more detailed task layout. You can also use jsrun directly to launch job steps; see the srun vs jsrun page for details on jsrun options.
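As a sketch of the -n/-N form, the batch script below asks lrun for a total task count spread over a node count, instead of a per-node count with -T. The 2 nodes, 8 tasks, queue, myapp and input1 are placeholders; verify the resulting task layout on your system.

```shell
#!/bin/bash
#BSUB -nnodes 2
#BSUB -q pbatch
# Sketch: 8 tasks in total (-n8) on 2 nodes (-N2) rather than a
# per-node task count (-T). myapp and input1 are placeholders.
lrun -N2 -n8 myapp input1
```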
You can also launch multiple job steps serially or in parallel within a batch script. For example:
$ cat twosteps.bsub
#!/bin/bash
#BSUB -nnodes 10
#BSUB -q pbatch
lrun -N5 -T4 myapp input1 &
lrun -N5 -T4 myapp input2 &
wait
$ bsub twosteps.bsub
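This requests a 10-node allocation, then runs myapp with input1 on 5 of those nodes and myapp with input2 on the remaining 5 nodes at the same time. The trailing & puts each step in the background, and wait keeps the batch script, and with it the allocation, alive until both steps finish.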
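With more inputs, a loop avoids repeating the lrun line. This sketch behaves like twosteps.bsub; myapp, input1 and input2 remain placeholders, and each step still gets 5 of the 10 nodes.

```shell
#!/bin/bash
#BSUB -nnodes 10
#BSUB -q pbatch
# Sketch: twosteps.bsub written as a loop (myapp/input* are placeholders).
# Each step takes 5 of the 10 nodes; & starts it in the background.
for input in input1 input2; do
  lrun -N5 -T4 myapp "$input" &
done
# Do not let the batch script exit until every background step has ended.
wait
```

To run the steps one after another instead, drop the & and the final wait: each lrun then starts only after the previous one exits, and removing -N5 lets each step use all 10 nodes.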
Querying the Queue
The following commands are useful for querying the queue on all LSF systems.
Get a summary of all jobs and partitions on an LSF system
See only your jobs in the queue
$ bjobs
See all the jobs in the queue
$ bjobs -u all
List queued jobs displaying the fields that are important to you
$ man bjobs
and scroll to the "Output fields for bjobs" listed under the -o option. Then set the LSB_BJOBS_FORMAT environment variable to the fields you want to see.
For example, for bash:
$ export LSB_BJOBS_FORMAT="id:- user:-8 user_group:- queue:- nexec_host:- stat: start_time: run_time: finish_time: priority: exec_host:32"
and for csh:
$ setenv LSB_BJOBS_FORMAT "id:- user:-8 user_group:- queue:- nexec_host:- stat: start_time: run_time: finish_time: priority: exec_host:32"
Now run bjobs with the -u all option to list every user's jobs in your chosen format:
$ bjobs -u all
JOBID USER   USER_GROUP QUEUE  NEXEC_HOST STAT START_TIME   RUN_TIME        FINISH_TIME    JOB_PRIORITY EXEC_HOST
4136  arnold guests     exempt 1          RUN  Dec 11 16:16 2765200 second( Feb 9 16:16 L  515          20*ray44
6109  mike   guests     pbatch 1          RUN  Jan 12 15:57 1545 second(s)  Jan 12 16:27 L 512          2*ray51
6115  susan  guests     pbatch 1          RUN  Jan 12 16:13 596 second(s)   Jan 12 16:43 L 512          ray28
Display details about a specific job
$ bjobs -l <jobID>
Display the job script for one of your jobs
$ cat <jobID>.out
LSF inserts your batch script into your job's output file.
Show all the jobs you have run today
$ bhist -d
Show all the job steps that ran within a specific job
List the charge accounts you are permitted to use (bsub -G option)
Display the factors contributing to each pending job's assigned priority
Cancel a job, whether it is pending in the queue or running
$ bkill <job_ID>
Send a signal to a running job
For example, send SIGUSR1:
$ bkill -s USR1 <job_ID>
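If your batch script should react to a signal sent with bkill -s, a bash trap can catch it. This sketch assumes the signal reaches the batch script's shell; the flag got_usr1 and the handler on_usr1 are hypothetical names, and what the handler should actually do (for example, ask your application to checkpoint) depends on your code.

```shell
#!/bin/bash
# Sketch: catch SIGUSR1 sent with `bkill -s USR1 <job_ID>`.
# got_usr1 and on_usr1 are hypothetical names; put your own response
# (e.g. telling the application to checkpoint) inside the handler.
got_usr1=0

on_usr1() {
  got_usr1=1
  echo "SIGUSR1 received at $(date)" >&2
}

# Run on_usr1 whenever this shell receives SIGUSR1.
trap on_usr1 USR1
```

Bash runs a trap only after the current foreground command finishes, so a long job step launched in the foreground delays the handler until that step exits. Launching the step with & and then calling wait lets the handler run as soon as the signal arrives, because wait returns early (with a status above 128) when a trapped signal is received. The step keeps running after wait returns, so call wait again if the script should not move on yet.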
Display the queues available
$ bqueues
Display details about all the queues
$ bqueues -l
Running Jobs (EA Systems)
The instructions below cover job launch on the CORAL EA systems (Ray, rzManta, and Shark).
Run one task of myApp on one core of a node:
$ bsub -G guests myApp
Job <6086> is submitted to default queue <pbatch>.
This is the simplest way to run a job on a cluster. In this example, the lone bsub command defaults to asking for one task on one core of one node in the default queue. LC policy requires a valid charge account, which you can specify either with the -G option, as above, or with the LSB_DEFAULT_USERGROUP environment variable:
$ export LSB_DEFAULT_USERGROUP=guests
$ bsub myApp
Job <6089> is submitted to default queue <pbatch>.
The remaining examples assume that a valid LSB_DEFAULT_USERGROUP is defined.
Run hostname in an interactive allocation:
$ bsub -Is bash
Job <6092> is submitted to default queue <pbatch>.
<<Waiting for dispatch ...>>
(blocks here until the job runs)
<<Starting on ray29>>
< LC banner and message of the day >
$ mpirun hostname
ray29
Run it again:
$ mpirun hostname
ray29
Now exit the job and allocation:
$ exit
exit
Like bsub in the first example, bsub -Is defaults to asking for one node of the default queue charging the default account (as conveyed by LSB_DEFAULT_USERGROUP). Once the job runs and the prompt appears, any further commands are run within the job's allocated resources until exit is invoked.
Create a batch job script and submit it
$ cat > myBatch.cmd
#!/bin/bash
#BSUB -n 32
#BSUB -R "span[ptile=8]"
#BSUB -q pdebug
#BSUB -G myAccount
#BSUB -W 30
#BSUB -x
mpirun -np 32 myApp
^D
This script asks for 32 tasks with 8 tasks per node and exclusive use of each node (-x). The nodes come from the pdebug queue, the job may run for no more than 30 minutes (-W 30), and it is charged to the myAccount account. The mpirun command launches 32 tasks of myApp across the four nodes.
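The node count follows from the two numbers in the script: span[ptile=8] places 8 tasks on each node, so 32 tasks need 32 / 8 = 4 nodes, rounding up when the division is not exact. A small bash helper (the name nodes_for is hypothetical) makes that calculation explicit:

```shell
#!/bin/bash
# Hypothetical helper: nodes needed for `-n <tasks>` with span[ptile=<per_node>].
# Integer ceiling division: any leftover tasks need one more (partial) node.
nodes_for() {
  local tasks=$1 per_node=$2
  echo $(( (tasks + per_node - 1) / per_node ))
}

nodes_for 32 8   # prints 4, as in myBatch.cmd
nodes_for 30 8   # prints 4; the last node runs only 6 tasks
```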
Now submit the job:
$ bsub < myBatch.cmd
Job <6116> is submitted to queue <pdebug>.
Check the job's status in the queue (in this sample output the job has already finished, so STAT shows DONE):
$ bjobs 6116
JOBID USER  STAT QUEUE  FROM_HOST EXEC_HOST JOB_NAME   SUBMIT_TIME
6116  marco DONE pdebug ray23     8*ray8    * hostname Jan 12 16:13
                                  8*ray5
                                  8*ray6
                                  8*ray7
After the job runs, its output is in a file named after the job ID, here 6116.out.
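To choose the output file name yourself, set bsub's -o option (and -e for standard error) in the script; LSF replaces %J in the file name with the job ID. This sketch is a variant of myBatch.cmd, and myApp.%J.out / myApp.%J.err are only example names:

```shell
#!/bin/bash
# Sketch: myBatch.cmd with explicitly named output files.
# LSF substitutes the job ID for %J, so each run writes its own files.
#BSUB -n 32
#BSUB -R "span[ptile=8]"
#BSUB -q pdebug
#BSUB -G myAccount
#BSUB -W 30
#BSUB -x
#BSUB -o myApp.%J.out
#BSUB -e myApp.%J.err
mpirun -np 32 myApp
```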