Running Jobs
Get a dedicated compute node for running parallel compiles, debugging, etc.
$ lalloc 1
The lalloc wrapper script gets an allocation and drops you at a shell prompt on the first compute node in that allocation. Run lalloc -h for details on other options. In particular, if you plan to launch multiple job steps (jsrun / lrun) interactively, we recommend the --shared-launch option: a single failed or cancelled job step can kill the jsmd on the compute node, which prevents you from launching further job steps from that node.
Submit a batch script to run one or more job steps on a compute node or nodes
$ cat tennode.bsub
#!/bin/bash
#BSUB -nnodes 10
#BSUB -q pbatch
lrun -T4 myapp input1
$ bsub tennode.bsub
The lrun wrapper script provides a simple syntax for launching job steps. In this example, lrun -T4 myapp ... tells lrun to launch myapp with 4 tasks on each node in your allocation. lrun also accepts -n<ntasks> and/or -N<nnodes>, and it takes jsrun options for more detailed task layout. You may also use jsrun directly to launch job steps. See the srun vs jsrun page for more details on jsrun options.
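As a rough illustration of how the node count and the -T value combine, the sketch below writes the same batch script from two shell variables and works out the total task count. The variable names (nnodes, tasks_per_node, workdir) are made up for this example, and the script is written to a temporary directory rather than submitted.

```shell
#!/bin/bash
# Sketch only: generate the ten-node batch script from variables.
# nnodes, tasks_per_node and workdir are illustrative names, not LSF settings.
nnodes=10
tasks_per_node=4

workdir=$(mktemp -d)
script="$workdir/tennode.bsub"

# Unquoted heredoc delimiter, so $nnodes and $tasks_per_node are expanded here.
cat > "$script" <<EOF
#!/bin/bash
#BSUB -nnodes $nnodes
#BSUB -q pbatch
lrun -T$tasks_per_node myapp input1
EOF

# -T gives tasks per node, so the job step runs nnodes * tasks_per_node tasks.
total_tasks=$((nnodes * tasks_per_node))
echo "wrote $script ($total_tasks tasks across $nnodes nodes)"
```

The generated file can then be submitted with bsub in the same way as tennode.bsub above.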
You can also launch multiple job steps serially or in parallel within a batch script. For example:
$ cat twosteps.bsub
#!/bin/bash
#BSUB -nnodes 10
#BSUB -q pbatch
lrun -N5 -T4 myapp input1 &
lrun -N5 -T4 myapp input2 &
wait
$ bsub twosteps.bsub
This gets a 10-node allocation, then runs myapp with input1 on 5 of those nodes and myapp with input2 on the remaining 5 nodes at the same time. The wait keeps the batch script alive until both background job steps have finished.
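A bare wait returns zero even if one of the background job steps failed. If the batch script needs to know whether each step succeeded, one possible approach is to record each step's process ID and wait on it individually, as sketched below. run_step is a hypothetical helper introduced only for this example: it calls lrun when lrun is on the PATH and otherwise just echoes the command, so the control flow can be tried outside an allocation.

```shell
#!/bin/bash
# Sketch only: collect the exit status of each background job step.
# run_step is a hypothetical helper, not part of LSF or lrun.
run_step() {
    if command -v lrun >/dev/null 2>&1; then
        lrun "$@"
    else
        # No lrun available (e.g. not on the cluster): dry run.
        echo "dry run: lrun $*"
    fi
}

run_step -N5 -T4 myapp input1 &
pid1=$!
run_step -N5 -T4 myapp input2 &
pid2=$!

# "wait PID" returns that process's exit status, unlike a bare "wait".
failed=0
wait "$pid1" || failed=1
wait "$pid2" || failed=1

echo "failed=$failed"
```

In a real batch script you would typically finish with exit "$failed" so that a failed step is reflected in the job's exit status.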
Querying the Queue
The following commands are useful for querying the queue on all LSF systems.
Get a summary of all jobs and partitions on an LSF system
$ lsfjobs
See only your jobs in the queue
$ bquery
See all the jobs in the queue
$ bquery -u all
List queued jobs displaying the fields that are important to you
$ man bquery
Scroll to the "Output fields for bquery" list under the -o option, then create an environment variable that contains the fields you would like to see.
For example, for bash:
$ export LSB_BQUERY_FORMAT="id:- user:-8 user_group:- queue:- nexec_host:- stat: start_time: run_time: finish_time: priority: exec_host:32"
and for csh:
$ setenv LSB_BQUERY_FORMAT "id:- user:-8 user_group:- queue:- nexec_host:- stat: start_time: run_time: finish_time: priority: exec_host:32"
Now run bquery again, but this time adding the -u all option to see all user jobs:
$ bquery -u all
JOBID USER   USER_GROUP QUEUE  NEXEC_HOST STAT START_TIME   RUN_TIME        FINISH_TIME    JOB_PRIORITY EXEC_HOST
4136  arnold guests     exempt 1          RUN  Dec 11 16:16 2765200 second( Feb  9 16:16 L 515          20*ray44
6109  mike   guests     pbatch 1          RUN  Jan 12 15:57 1545 second(s)  Jan 12 16:27 L 512          2*ray51
6115  susan  guests     pbatch 1          RUN  Jan 12 16:13 596 second(s)   Jan 12 16:43 L 512          ray28
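With a fixed format like this, the listing is easy to post-process with standard tools. The sketch below counts jobs per queue using awk. It assumes the field order shown above, where the first six columns contain no spaces and the queue is the fourth field. sample_output and summarize_queues are names invented for this example, and the sample rows are copied from the listing above rather than taken from a live system.

```shell
#!/bin/bash
# Sketch only: count jobs per queue from bquery-style output.
# sample_output stands in for a live "bquery -u all" call.
sample_output() {
    cat <<'EOF'
JOBID USER USER_GROUP QUEUE NEXEC_HOST STAT START_TIME RUN_TIME FINISH_TIME JOB_PRIORITY EXEC_HOST
4136 arnold guests exempt 1 RUN Dec 11 16:16 2765200 second( Feb 9 16:16 L 515 20*ray44
6109 mike guests pbatch 1 RUN Jan 12 15:57 1545 second(s) Jan 12 16:27 L 512 2*ray51
6115 susan guests pbatch 1 RUN Jan 12 16:13 596 second(s) Jan 12 16:43 L 512 ray28
EOF
}

# Skip the header line (NR > 1) and tally the QUEUE column (field 4).
# sort makes the order stable, since awk's for-in order is unspecified.
summarize_queues() {
    awk 'NR > 1 { count[$4]++ } END { for (q in count) print q, count[q] }' | sort
}

summary=$(sample_output | summarize_queues)
echo "$summary"
```

On the cluster, replacing sample_output with bquery -u all should give the same kind of summary, provided LSB_BQUERY_FORMAT keeps the queue in the fourth column.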
Display details about a specific job
$ bquery -l jobid
Display the job script for one of your jobs
$ cat jobid.out
LSF inserts your batch script into your job's output file.
Show all the jobs you have run today
$ bhist -d
List the charge accounts you are permitted to use (bsub -G option)
$ lshare -u username
Display the factors contributing to each pending job's assigned priority
$ bquery -prio jobid
Cancel a job, whether it is pending in the queue or running
$ bkill jobid
Send a signal to a running job
For example, send SIGUSR1:
$ bkill -s USR1 jobid
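A signal is only useful if something in the job reacts to it. The sketch below shows one way a bash script can catch SIGUSR1 with trap. It assumes the signal reaches the shell running the script, and it sends the signal to itself with kill so the handler can be tried without LSF. on_usr1 and got_usr1 are illustrative names.

```shell
#!/bin/bash
# Sketch only: handle SIGUSR1 in a bash script.
got_usr1=0

# Handler: record that the signal arrived. A real job might write a
# checkpoint or flush logs here instead.
on_usr1() {
    got_usr1=1
    echo "caught SIGUSR1"
}
trap on_usr1 USR1

# Local stand-in for "bkill -s USR1 jobid": signal this shell directly.
kill -USR1 $$

echo "got_usr1=$got_usr1"
```

Note that bash runs a trap handler only after the current foreground command returns. A script that needs to react promptly while a long job step is running can launch the step in the background and block in wait, which returns as soon as a trapped signal arrives.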
Display the queues available
$ bqueues
Display details about all the queues
$ bqueues -l