The El Capitan systems are managed by the Flux resource scheduler. When a set of parallel processes is started on a node, LC uses the MPIBind tool to map those processes to the underlying hardware resources.
Transitioning from Slurm to Flux
Many users will not need to take any action to transition from Slurm to Flux. Instead, they can keep using familiar commands through the flux_wrappers package.
By default, the flux_wrappers package is loaded when users log in. These wrappers provide "Slurm-like" commands that wrap the underlying Flux commands. Available commands include ‘srun’, ‘sbatch’, ‘salloc’, ‘sxterm’, ‘scancel’, ‘squeue’, ‘showq’, and ‘sinfo’. You can add a ‘-v’ flag to most of these commands to see the Flux command that is being executed.
Demo of flux_wrappers
# flux_wrappers provide a Slurm-like interface to Flux commands
$ which sinfo
/usr/global/tools/flux_wrappers/bin/sinfo
# show the underlying Flux command with `-v`
$ sinfo -v
#running : flux resource list
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
...
Learn Flux
Flux is built from the ground up for complex and fine-grained scheduling needs. It is particularly useful for regression testing or uncertainty quantification (UQ) style pipelines.
Flux resources:
- LC's Batch System Cross Reference Guide. This is a quick reference for translating between different batch scheduling systems.
- Check out the Flux Cheatsheet
- Convert an existing Slurm sbatch script to Flux format using the slurm2flux utility on LC systems. This utility makes the following translations:
  - #SBATCH options become #flux options.
  - srun commands become flux run commands.
- Reference the flux-core manual pages, which include extensive documentation on the Flux Python API.
  - Manual pages are available on all of the LC TOSS4 systems using the man command.
- Flux Tutorial for LC users
- LC user meeting presentation on Current State of Flux in LC, July 26, 2022
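
The two translations listed for slurm2flux above can be pictured with the toy rewriter below. It is not the slurm2flux utility and does not reproduce its behavior. It only swaps the `#SBATCH` prefix for the `#flux:` directive prefix accepted by `flux batch`, and a leading `srun` for `flux run`. It leaves every option untouched, even though some Slurm options may need a different spelling in Flux. The function name `toy_translate` is hypothetical.

```python
import re


def toy_translate(script):
    """Toy illustration of the two rewrites named above (NOT slurm2flux itself)."""
    out = []
    for line in script.splitlines():
        if line.startswith("#SBATCH"):
            # Swap the directive prefix; the options after it are copied as-is.
            line = "#flux:" + line[len("#SBATCH"):]
        else:
            # Replace a leading "srun" command word, preserving indentation.
            line = re.sub(r"^(\s*)srun\b", r"\1flux run", line)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    slurm_script = "#!/bin/bash\n#SBATCH -N 2\nsrun -n 16 ./a.out"
    print(toy_translate(slurm_script))
```

For real scripts, use slurm2flux and check the result against the Batch System Cross Reference Guide, since option names and formats can differ between the two schedulers.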
Get Help with Flux
The Flux team supports users in a dedicated Mattermost channel under the DOE-wide HPC Mattermost team (invite link). After joining the team through the invite link, join the ‘flux’ channel.
MPIBind
LC uses MPIBind by default. It tries to do the "right" thing by dividing all GPUs and CPUs evenly across the tasks on a node. You can disable MPIBind, but doing so is not recommended.
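
To make the even division of CPUs concrete, the sketch below splits a contiguous range of CPU ids into equal blocks, one per task. It illustrates the arithmetic only and is not MPIBind's algorithm, which also takes the node topology into account when placing tasks. The function name `split_cpus` and the 64-CPU, 16-task figures are assumptions chosen for illustration.

```python
def split_cpus(num_cpus, num_tasks):
    """Naively split CPU ids 0..num_cpus-1 into contiguous blocks, one per task.

    Illustration only: this ignores node topology, which MPIBind considers.
    """
    if num_tasks <= 0 or num_cpus < num_tasks:
        raise ValueError("need at least one task and one CPU per task")
    base, extra = divmod(num_cpus, num_tasks)
    blocks = []
    start = 0
    for task in range(num_tasks):
        # When the split is uneven, the first `extra` tasks get one more CPU.
        size = base + (1 if task < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


if __name__ == "__main__":
    for task, cpus in enumerate(split_cpus(64, 16)):
        print(f"task {task}: cpus {cpus[0]}-{cpus[-1]}")
```

With 64 CPUs and 16 tasks, each task receives a block of four consecutive CPU ids.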
Learn MPIBind
- MPIBind for Flux users tutorial. Test programs which print out the mapping for MPI and OpenMP programs can be found in the MPIBind repo affinity directory
- MPIBind tutorial on Discovering node architecture topology
- MPIBind tutorial on Flux affinity on the AMD MI300A APU
Understanding MPIBind Bindings
The following examples use a simple test program to demonstrate how a program is launched and bound to the hardware. The simple MPI program is written in C and prints out a message from each MPI rank.
This can be compiled with the following command, which produces the default executable a.out:
$ cc simple.c
Flux Binding Example
When using Flux, the key is to add the --exclusive flag. This flag ensures that the job gets access to all resources on the allocated nodes. Without it, a job may be given access to only a subset of the available resources.
$ flux run -n 16 --verbose --exclusive --nodes=1 --setopt=mpibind=verbose:1 ./a.out
jobid: fA2p8hQE3
0.064s: flux-shell[0]: mpibind:
mpibind: task 0 nths 4 gpus 4 cpus 0-3
mpibind: task 1 nths 4 gpus 4 cpus 4-7
mpibind: task 2 nths 4 gpus 5 cpus 8-11
mpibind: task 3 nths 4 gpus 5 cpus 12-15
mpibind: task 4 nths 4 gpus 2 cpus 16-19
mpibind: task 5 nths 4 gpus 2 cpus 20-23
mpibind: task 6 nths 4 gpus 3 cpus 24-27
mpibind: task 7 nths 4 gpus 3 cpus 28-31
mpibind: task 8 nths 4 gpus 6 cpus 32-35
mpibind: task 9 nths 4 gpus 6 cpus 36-39
mpibind: task 10 nths 4 gpus 7 cpus 40-43
mpibind: task 11 nths 4 gpus 7 cpus 44-47
mpibind: task 12 nths 4 gpus 0 cpus 48-51
mpibind: task 13 nths 4 gpus 0 cpus 52-55
mpibind: task 14 nths 4 gpus 1 cpus 56-59
mpibind: task 15 nths 4 gpus 1 cpus 60-63
Number of tasks= 16 My rank= 9
Number of tasks= 16 My rank= 8
Number of tasks= 16 My rank= 15
Number of tasks= 16 My rank= 14
Number of tasks= 16 My rank= 13
Number of tasks= 16 My rank= 12
Number of tasks= 16 My rank= 11
Number of tasks= 16 My rank= 10
Number of tasks= 16 My rank= 7
Number of tasks= 16 My rank= 6
Number of tasks= 16 My rank= 4
Number of tasks= 16 My rank= 1
Number of tasks= 16 My rank= 0
Number of tasks= 16 My rank= 3
Number of tasks= 16 My rank= 2
Number of tasks= 16 My rank= 5
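
The verbose MPIBind lines above are easier to check once they are collected into a table of tasks per GPU. The sketch below is one hedged way to do that. It assumes each line has exactly the form shown in this example (one GPU id and one contiguous CPU range per task), and the function names `parse_mpibind` and `tasks_per_gpu` are hypothetical helpers, not part of MPIBind or Flux.

```python
import re
from collections import defaultdict

# Matches entries such as "mpibind: task 0 nths 4 gpus 4 cpus 0-3".
# Assumption: one GPU id and one contiguous CPU range per task, as in the example.
LINE_RE = re.compile(
    r"mpibind: task\s+(\d+)\s+nths\s+(\d+)\s+gpus\s+(\d+)\s+cpus\s+(\d+)-(\d+)"
)


def parse_mpibind(text):
    """Return {task: {"threads": n, "gpu": g, "cpus": (first, last)}}."""
    bindings = {}
    for match in LINE_RE.finditer(text):
        task, nths, gpu, first, last = map(int, match.groups())
        bindings[task] = {"threads": nths, "gpu": gpu, "cpus": (first, last)}
    return bindings


def tasks_per_gpu(bindings):
    """Group task ids by the GPU they were bound to."""
    by_gpu = defaultdict(list)
    for task, info in sorted(bindings.items()):
        by_gpu[info["gpu"]].append(task)
    return dict(by_gpu)


if __name__ == "__main__":
    # The first four binding lines from the example output.
    sample = (
        "mpibind: task 0 nths 4 gpus 4 cpus 0-3\n"
        "mpibind: task 1 nths 4 gpus 4 cpus 4-7\n"
        "mpibind: task 2 nths 4 gpus 5 cpus 8-11\n"
        "mpibind: task 3 nths 4 gpus 5 cpus 12-15\n"
    )
    for gpu, tasks in sorted(tasks_per_gpu(parse_mpibind(sample)).items()):
        print(f"gpu {gpu}: tasks {tasks}")
```

Applied to the full output, this grouping makes it easy to confirm that the tasks are spread evenly across the GPUs and that no two tasks share a CPU range.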