MPI Support Roadmap

Currently, Cray MPICH is the only MPI supported on El Capitan systems. Once MVAPICH and Open MPI become available for Slingshot, they will be tested and added as alternative MPIs.

MPI Module

A cray-mpich module is available for every supported compiler, including:

  • gcc
  • cce
  • rocmcc

Users must first load a compiler and then run module load cray-mpich; the latest cray-mpich version built for the already loaded compiler will be selected.

Some compilers may have multiple versions of the cray-mpich library available. If you need to specify a particular MPI version:

  1. Load a compiler version, for example: module load cce/18.0.0-magic
  2. View the available cray-mpich versions: module avail cray-mpich
  3. Load the specific cray-mpich version, for example: module load cray-mpich/8.1.30
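The compiler-then-MPI load order above can be sketched as a short bash script. The module names and versions are the examples from this page and may not match what is installed on your system; the script assumes the pinned module is named cray-mpich/<version>, as listed by module avail cray-mpich, and only reports an error if the module command is not available in the current shell.

```shell
#!/bin/bash
# Sketch of the load order: compiler first, then cray-mpich.
# Module names/versions are the examples from this page (assumptions);
# check "module avail" for what is actually installed.
COMPILER_MODULE="cce/18.0.0-magic"
MPI_MODULE="cray-mpich/8.1.30"   # plain "cray-mpich" selects the latest build

if type module >/dev/null 2>&1; then
  module load "$COMPILER_MODULE"   # 1. the compiler must be loaded first
  module avail cray-mpich          # 2. list cray-mpich builds for this compiler
  module load "$MPI_MODULE"        # 3. load the chosen cray-mpich version
else
  echo "module command not available in this shell" >&2
fi
```

Leaving the version off (module load cray-mpich) gives the default behavior described above, where the latest build for the loaded compiler is chosen.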

GPU-Aware MPI

Cray MPICH is also GPU-aware. To use its accelerated GPU transports, follow these steps:

  1. Load a ROCm compiler of your choice, or load the default with the command module load rocmcc.
  2. The cray-mpich module is loaded automatically.
  3. Link against the appropriate GTL library, for example:
    LIBS="$PE_MPICH_GTL_DIR_amd_gfx942 $PE_MPICH_GTL_LIBS_amd_gfx942"
    LDFLAGS="-Wl,-rpath,${PE_MPICH_GTL_DIR_amd_gfx942:2}"
  4. When running your application, export (or set) the environment variable
    MPICH_GPU_SUPPORT_ENABLED=1
    Without this variable, the program will abort with:
    process_vm_readv: Bad address
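The GTL link flags and the run-time setting from the steps above can be combined into one bash sketch. On a real system the cray-mpich module defines the PE_MPICH_GTL_* variables; the fallback values below (the /opt/cray/pe/mpich/8.1.30/gtl/lib path and -lmpi_gtl_hsa) are hypothetical placeholders used only so the script runs elsewhere. The sketch shows why the rpath uses ${PE_MPICH_GTL_DIR_amd_gfx942:2}: the variable holds a -L flag, and bash substring expansion drops those two characters to leave a bare directory.

```shell
#!/bin/bash
# The cray-mpich module normally sets these two variables. The defaults
# below are HYPOTHETICAL placeholders, applied only if they are unset.
: "${PE_MPICH_GTL_DIR_amd_gfx942:=-L/opt/cray/pe/mpich/8.1.30/gtl/lib}"
: "${PE_MPICH_GTL_LIBS_amd_gfx942:=-lmpi_gtl_hsa}"

# Link flags: the DIR variable is a "-L<dir>" flag; ${VAR:2} strips the
# leading "-L" so the directory can be embedded in an rpath.
LIBS="$PE_MPICH_GTL_DIR_amd_gfx942 $PE_MPICH_GTL_LIBS_amd_gfx942"
LDFLAGS="-Wl,-rpath,${PE_MPICH_GTL_DIR_amd_gfx942:2}"

# Run time: without this, passing GPU buffers to MPI aborts with
# "process_vm_readv: Bad address".
export MPICH_GPU_SUPPORT_ENABLED=1

echo "LIBS=$LIBS"
echo "LDFLAGS=$LDFLAGS"
```

Pass $LIBS and $LDFLAGS to your link step, and make sure MPICH_GPU_SUPPORT_ENABLED=1 is still exported in the environment the job launcher sees.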

XPMEM

Like some of the accelerated on-node GPU transports, XPMEM provides a kernel-bypass method for transferring large messages within a node. You can compile your code to use XPMEM by adding

-lxpmem

to your library flags (the library is already available from TOSS in /usr) and running with the environment variable MPICH_SMP_SINGLE_COPY_MODE=XPMEM set.
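A minimal bash sketch of the XPMEM build and run settings described above. It assumes your build reads extra library flags from a LIBS variable (an assumption; adapt it to your build system) and that libxpmem is on the default linker search path, as the TOSS install in /usr suggests.

```shell
#!/bin/bash
# Build: append -lxpmem to any existing library flags (LIBS is an assumed
# build-system variable). ${LIBS:+$LIBS } keeps prior flags, if any.
LIBS="${LIBS:+$LIBS }-lxpmem"

# Run: select XPMEM for Cray MPICH on-node single-copy transfers.
export MPICH_SMP_SINGLE_COPY_MODE=XPMEM

echo "LIBS=$LIBS"
echo "MPICH_SMP_SINGLE_COPY_MODE=$MPICH_SMP_SINGLE_COPY_MODE"
```

Appending rather than overwriting keeps any flags already in LIBS, such as the GTL libraries needed for GPU-aware MPI.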

Version Compatibility

HPE provides a very narrow statement of compatibility with regard to which versions of cce, rocm, and cray-mpich are guaranteed to function and perform well. Please see the known issues page for the officially supported versions and details on how version incompatibility manifests.

External Resources

HPE's CPE Documentation on Cray MPICH is available here.