LSF Quick Start Guide

Running Jobs

IMPORTANT: All of the CORAL systems currently use ssh for task launch, so you must have passwordless ssh keys set up in order to run successfully. Instructions for setting up ssh keys can be found on Confluence or by searching the web.
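As a quick sanity check, a generic ssh probe (not LC-specific; substitute a real compute node name for localhost) will fail immediately rather than prompt for a password when keys are not set up:

```shell
# Probe passwordless ssh; BatchMode=yes forbids password prompts, so this
# fails fast when keys are missing. "localhost" is a stand-in hostname.
if ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true 2>/dev/null; then
  msg="passwordless ssh OK"
else
  msg="ssh key setup needed"
fi
echo "$msg"
```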

Note that these instructions are for submitting and running jobs on the final CORAL systems (Sierra, Lassen, rzAnsel) that use jsrun / lrun to launch job steps. The EA systems (Ray, rzManta, Shark) use mpirun for job step launch. Scroll to the end of the page for instructions on their use.

Get a dedicated compute node for running parallel compiles, debugging, etc.

$ lalloc 1

The lalloc wrapper script gets an allocation and drops the user at a shell prompt on the first compute node in that allocation; lalloc -h gives details on other options. In particular, if you wish to submit multiple job steps (jsrun / lrun) interactively, we recommend the --shared-launch option: a single failed or cancelled job step can kill the jsmd daemon on the compute node, which would prevent you from launching further job steps from that node.

Submit a batch script to run one or more job steps on a compute node or nodes

$ cat tennode.bsub
#BSUB -nnodes 10
#BSUB -q pbatch

lrun -T4 myapp input1

$ bsub tennode.bsub

The lrun wrapper script provides a simple syntax for launching job steps. In this example, lrun -T4 myapp ... tells lrun to launch myapp with 4 tasks on each node in the allocation. lrun also accepts -n<ntasks> and/or -N<nnodes>, as well as jsrun options, for more detailed task layouts. You may also use jsrun directly to launch job steps; see the srun vs jsrun page for more details on jsrun options.
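For instance, a sketch of a batch script using the task-count and node-count spellings instead of -T (confirm the exact option spellings with lrun -h; myapp and input1 are placeholders):

```shell
#BSUB -nnodes 2
#BSUB -q pbatch

# 8 tasks total (-n8) spread across 2 nodes (-N2) -- equivalent to -T4 here
lrun -n8 -N2 myapp input1
```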

You can also launch multiple job steps serially or in parallel within a batch script. For example:

$ cat twosteps.bsub
#BSUB -nnodes 10
#BSUB -q pbatch

lrun -N5 -T4 myapp input1 &
lrun -N5 -T4 myapp input2 &
wait

$ bsub twosteps.bsub

This will get a 10 node allocation and then run myapp with input1 on 5 of those nodes and myapp with input2 on the remaining 5 nodes; the wait keeps the batch script alive until both background job steps complete.


Querying the Queue

The following commands are useful for querying the queue on all LSF systems.

Get a summary of all jobs and partitions on an LSF system

$ lsfjobs

See only your jobs in the queue

$ bjobs

See all the jobs in the queue

$ bjobs -u all

List queued jobs displaying the fields that are important to you

$ man bjobs

and scroll to the "Output fields for bjobs" section listed under the -o option.  Then create an environment variable containing the fields you would like to see.

For example, for bash:

$ export LSB_BJOBS_FORMAT="id:- user:-8 user_group:- queue:- nexec_host:- stat: start_time: run_time: finish_time: priority: exec_host:32"

and for csh:

$ setenv LSB_BJOBS_FORMAT "id:- user:-8 user_group:- queue:- nexec_host:- stat: start_time: run_time: finish_time: priority: exec_host:32"

Now run bjobs again, but this time adding the -u all option to see all user jobs:

$ bjobs -u all
   4136   arnold          guests     exempt          1 RUN   Dec 11 16:16 2765200 second(s) Feb  9 16:16 L 515          20*ray44
   6109     mike          guests     pbatch          1 RUN   Jan 12 15:57 1545 second(s)  Jan 12 16:27 L 512          2*ray51                         
   6115    susan          guests     pbatch          1 RUN   Jan 12 16:13 596 second(s)   Jan 12 16:43 L 512          ray28

Display details about a specific job

$ bjobs -l <jobID>

Display the job script for one of your jobs

$ cat <jobID>.out

LSF inserts your batch script into your job's output file.
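The script can be pulled back out of the output file with standard tools. The "Your job looked like:" header and dashed delimiters assumed below are typical of LSF output files, and the file contents here are a mock-up for illustration (4321 is a hypothetical job ID):

```shell
# Mock-up of an LSF output file (real ones are named <jobID>.out); the
# "Your job looked like:" section is where LSF embeds the submitted script.
cat > 4321.out <<'EOF'
Sender: LSF System
Subject: Job 4321: Done

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
#BSUB -q pdebug
mpirun -np 32 myApp
------------------------------------------------------------
EOF

# Print only the lines between the dashed delimiters, i.e. the job script
awk '/^------/ { inblk = !inblk; next } inblk' 4321.out
```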

Show all the jobs you have run today

$ bhist -d

Show all the job steps that ran within a specific job


List the charge accounts you are permitted to use (bsub -G option)


Display the factors contributing to each pending job's assigned priority


Cancel a job, whether it is pending in the queue or running

$ bkill <job_ID>

Send a signal to a running job

For example, send SIGUSR1:

$ bkill -s USR1 <job_ID>
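On the receiving side, a job script can trap the signal to checkpoint or shut down cleanly. The trap mechanism is plain shell, not LSF-specific; the kill below stands in for the bkill command above:

```shell
# Catch SIGUSR1 in a job script, e.g. to trigger a checkpoint before exiting.
caught=0
trap 'caught=1' USR1

kill -USR1 $$   # stand-in for: bkill -s USR1 <job_ID> from a login node
sleep 1         # give the shell a point at which to deliver the signal

if [ "$caught" -eq 1 ]; then
  echo "received SIGUSR1: checkpointing before exit"
fi
```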

Display the queues available

$ bqueues

Display details about all the queues

$ bqueues -l


Running Jobs (EA Systems)

The instructions below cover job launch on the CORAL EA systems (Ray, rzManta, and Shark).

Run one task of myApp on one core of a node:

$ bsub -G guests myApp
Job <6086> is submitted to default queue <pbatch>.

This is the simplest way to run a job on a cluster.  In this example, the lone bsub command defaults to asking for one task on one core on one node of the default queue.  Note that LC policy requires a valid charge account.  This can be specified using the -G option as above or specified using the LSB_DEFAULT_USERGROUP environment variable:

$ export LSB_DEFAULT_USERGROUP=guests
$ bsub myApp
Job <6089> is submitted to default queue <pbatch>.

The remaining examples assume that a valid LSB_DEFAULT_USERGROUP is defined.

Run hostname in an interactive allocation:

$ bsub -Is bash
Job <6092> is submitted to default queue <pbatch>.
<<Waiting for dispatch ...>>

(bsub blocks here until the job runs)

<<Starting on ray29>>
< LC banner and message of the day >

$ mpirun hostname

Run it again

$ mpirun hostname

Now exit the job and allocation.

$ exit

Like bsub in the first example, bsub -Is defaults to asking for one node of the default queue charging the default account (as conveyed by LSB_DEFAULT_USERGROUP).  Once the job runs and the prompt appears, any further commands are run within the job's allocated resources until exit is invoked.

Create a batch job script and submit it

$ cat > myBatch.cmd
#BSUB -n 32
#BSUB -R "span[ptile=8]"
#BSUB -q pdebug
#BSUB -G myAccount
#BSUB -W 30
#BSUB -x

mpirun -np 32 myApp

This script asks for 32 tasks with 8 tasks per node, with exclusive use of each node.  Nodes must come from the pdebug queue, for no more than 30 minutes, charging the myAccount account.  The mpirun command launches 32 tasks of myApp across the four nodes.

Now submit the job:

$ bsub < myBatch.cmd
Job <6116> is submitted to queue <pdebug>.

See the job in the queue (this one has already finished, so its state is DONE):

$ bjobs 6116
6116    marco   DONE  pdebug     ray23       8*ray8      * hostname Jan 12 16:13

After the job runs, the output will be found in a file named after the job ID:  6116.out