Note: The following instructions work only on these LC AMD systems: Tioga, Tuolumne, RZAdams, RZVernal, El Capitan, and Tenaya. Corona users should see Corona's PyTorch Quickstart.
Quickstart
For our systems using the MI250X or MI300A, we now recommend you use the public wheels from PyTorch. A typical workflow looks like this:
$> module load python/3.13.2
$> virtualenv pytorch
$> source pytorch/bin/activate
$> pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm7.2
These installations perform well on our systems as long as you set all of the appropriate environment variables and Flux settings. An example is provided below.
Test that the installation worked:
To check whether PyTorch is installed and whether GPUs are visible, run the following command from the command line:
python -c 'import torch ; print(torch.rand(5, 3)) ; print("Torch Version", torch.__version__) ; print("GPU available:", torch.cuda.is_available())'
On a node with GPUs, output should look something like:
tensor([[0.0796, 0.2218, 0.8005],
        [0.7947, 0.3835, 0.9008],
        [0.8714, 0.7890, 0.6630],
        [0.6062, 0.7453, 0.7118],
        [0.7487, 0.2672, 0.4115]])
Torch Version 2.12.0+rocm7.2
GPU available: True
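If you prefer a script to the one-liner, the sketch below performs the same checks and also reports whether the build is a ROCm build (via `torch.version.hip`, which holds a version string on ROCm builds and `None` otherwise) and the names of the visible GPUs. The function name `describe_torch` and the file name `check_torch.py` are hypothetical; this is a convenience sketch, not an LC-provided tool.

```python
def describe_torch():
    """Collect the same facts as the one-liner, plus ROCm and device details."""
    try:
        import torch
    except ImportError:
        # PyTorch is not importable from the active environment
        return {"installed": False}

    device_count = torch.cuda.device_count()
    return {
        "installed": True,
        "version": torch.__version__,
        # A version string on ROCm builds, None on CUDA or CPU-only builds
        "hip": torch.version.hip,
        "gpu_available": torch.cuda.is_available(),
        "device_count": device_count,
        # On ROCm builds, the torch.cuda API addresses the AMD GPUs
        "devices": [torch.cuda.get_device_name(i) for i in range(device_count)],
    }


if __name__ == "__main__":
    for key, value in describe_torch().items():
        print(f"{key}: {value}")
```

Save it as `check_torch.py` and run `python check_torch.py` with your virtual environment activated; on a node without GPUs, expect `gpu_available: False` and an empty device list.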
Running the one-liner's commands in a Python REPL gives:
>>> import torch
>>> print(torch.rand(5, 3))
tensor([[0.2399, 0.4855, 0.4793],
        [0.0691, 0.5013, 0.8669],
        [0.9730, 0.7977, 0.2821],
        [0.1011, 0.7830, 0.1502],
        [0.6469, 0.7673, 0.8410]])
>>> print("Torch Version", torch.__version__)
Torch Version 2.12.0+rocm7.2
>>> print("GPU available:", torch.cuda.is_available())
GPU available: True
Using PyTorch on multiple nodes
Wrapper Script: Recommended multi-node config
To get up and running quickly, use this script to wrap your training for best performance. An example training script is provided here.
In the future, the LBANN team's hpc-launcher tool will be able to wrap PyTorch's torchrun executable when launching jobs on our CORAL2 systems. For now, the tool works only if you are not using torchrun.
Without the wrapper: important settings to consider
For good PyTorch performance across multiple nodes, run something like the following:
module load rocm/7.2.1
module load rccl
pip install mpi4py==4.1.0.dev0+mpich.9.1.0  # if mpi4py is needed
export LD_LIBRARY_PATH=/collab/usr/global/tools/rccl/toss_4_x86_64_ib_cray/rocm-7.2.0/install/lib:$LD_LIBRARY_PATH
These commands are for PyTorch wheels built against ROCm 7.2. Spindle will be loaded for you automatically.
More details below.
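The module versions above are chosen for wheels built against ROCm 7.2, so if you install a different wheel it can help to compare the `+rocmX.Y` suffix of `torch.__version__` (for example `2.12.0+rocm7.2`) with the ROCm module you load (for example `rocm/7.2.1`). The sketch below assumes that matching the major and minor versions is sufficient; the helper names are hypothetical.

```python
import re


def rocm_label(torch_version):
    """Return (major, minor) from a '+rocmX.Y' local version label, or None."""
    match = re.search(r"\+rocm(\d+)\.(\d+)", torch_version)
    if match is None:
        # CPU-only or CUDA wheels carry no '+rocm' label
        return None
    return int(match.group(1)), int(match.group(2))


def matches_rocm_module(torch_version, module_version):
    """True if the wheel's ROCm major.minor equals the module's (e.g. '7.2.1')."""
    label = rocm_label(torch_version)
    if label is None:
        return False
    major, minor = (int(part) for part in module_version.split(".")[:2])
    return label == (major, minor)
```

For example, `matches_rocm_module(torch.__version__, "7.2.1")` should be true for the `2.12.0+rocm7.2` wheel shown in the Quickstart.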
Loading ROCm
To run distributed PyTorch, load the appropriate ROCm module at runtime. For example,
module load rocm/7.2.1
If everything is set up correctly, then running broadcast-ddp.py via
flux run -N 2 --tasks-per-node=4 -q debug -t 5m --exclusive python3 broadcast-ddp.py
will yield output like
flux-job: f3sLNcuLGTEX started
Rank 3/8: tensor after all_reduce = 28
Rank 2/8: tensor after all_reduce = 28
Rank 1/8: tensor after all_reduce = 28
Rank 0/8: tensor after all_reduce = 28
Rank 7/8: tensor after all_reduce = 28
Rank 6/8: tensor after all_reduce = 28
Rank 5/8: tensor after all_reduce = 28
Rank 4/8: tensor after all_reduce = 28
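The value 28 in this output equals 0 + 1 + ... + 7, which suggests that each of the 8 ranks contributes its own rank number to the `all_reduce` sum. That is an assumption, since `broadcast-ddp.py` is not reproduced here. Under it, a run over N ranks should report N(N-1)/2 on every rank, and the hypothetical helper below checks a captured log for exactly that.

```python
import re

# Matches lines such as "Rank 3/8: tensor after all_reduce = 28"
RANK_LINE = re.compile(r"Rank (\d+)/(\d+): tensor after all_reduce = (\d+)")


def check_all_reduce_log(text):
    """True if every rank 0..N-1 reported N*(N-1)/2 (assumes rank-valued inputs)."""
    results = {}
    world_sizes = set()
    for match in RANK_LINE.finditer(text):
        rank, size, value = (int(group) for group in match.groups())
        results[rank] = value
        world_sizes.add(size)
    if len(world_sizes) != 1:
        # No matching lines, or ranks disagree about the world size
        return False
    (size,) = world_sizes
    expected = size * (size - 1) // 2
    every_rank_present = sorted(results) == list(range(size))
    return every_rank_present and all(v == expected for v in results.values())
```

For example, redirect the `flux run` output to a file such as `ddp.log` and call `check_all_reduce_log(open("ddp.log").read())`.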
RCCL module & RCCL-OFI plug-in
When scaling PyTorch across multiple nodes via the Cray Slingshot network, you must load
module load rccl
which defaults to `module load rccl/working-env`. Selecting `module load rccl/fast-env-slows-mpi` instead gives additional RCCL performance at the cost of degraded MPI performance.
Separately, multi-node performance requires a plugin that lets RCCL use the libfabric library. The plugin libraries are available under /collab/usr/global/tools/rccl. Adding the appropriate plugin directory to LD_LIBRARY_PATH (as in the export command above) enables the plugin when you run PyTorch.
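The dynamic loader reads `LD_LIBRARY_PATH` when a process starts, so changing it from inside an already-running Python process does not help that process find the plugin. If you launch training from a Python driver script, one option is to build the environment for the child process explicitly. The sketch below prepends a plugin directory without duplicating entries; `prepend_library_path` is a hypothetical helper, not part of any LC tool.

```python
import os


def prepend_library_path(directory, env=None):
    """Return a copy of env with directory first on LD_LIBRARY_PATH."""
    # Copy so the caller's mapping (or os.environ) is left untouched
    new_env = dict(os.environ if env is None else env)
    existing = new_env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    # Drop empty entries and any earlier copy of the same directory
    rest = [entry for entry in existing if entry and entry != directory]
    new_env["LD_LIBRARY_PATH"] = os.pathsep.join([directory] + rest)
    return new_env
```

For example, `subprocess.run(["python3", "train.py"], env=prepend_library_path(plugin_dir))`, where `train.py` and `plugin_dir` stand in for your own script and the RCCL plugin directory.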
MPI4Py users
We recommend that MPI4Py users install one of our wheels provided here. Wheels compatible with your Python version appear in the output of `pip index versions --pre mpi4py` with a git hash including `dev0`.
For example,
pip install mpi4py==4.1.0.dev0+mpich.9.1.0
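To pick out these builds programmatically, you could filter the `pip index versions --pre mpi4py` output for versions containing `dev0`. This sketch assumes the output has a single `Available versions:` line with comma-separated versions, which matches recent pip releases; pip marks `pip index` as experimental, so the format may change. The function name is hypothetical.

```python
def dev0_versions(pip_index_output):
    """Return the versions containing 'dev0' from `pip index versions` output."""
    for line in pip_index_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Available versions:"):
            candidates = stripped.split(":", 1)[1].split(",")
            return [version.strip() for version in candidates if "dev0" in version]
    # No versions line found (e.g. pip printed only an error)
    return []
```

You could feed it `subprocess.run(["pip", "index", "versions", "--pre", "mpi4py"], capture_output=True, text=True).stdout`.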
Spindle
For multi-node jobs, LC highly recommends using Spindle to accelerate Python library loading. Spindle is on by default on El Capitan and Tuolumne but must be enabled manually on other systems, including RZAdams, RZVernal, and Tioga.
Using PyTorch from within a Jupyter notebook
Please use the Orbit and Jupyter notebooks documentation to create a Jupyter kernel from your Python virtual environment. In particular, after creating your virtual environment as described above, you will need to
- Install `ipykernel` to your virtual environment
- Install your custom kernel to `~/.local`
- Manually update LD_LIBRARY_PATH in `kernel.json`.
pip install ipykernel
python -m ipykernel install --prefix=$HOME/.local --name 'mytorchenv' --display-name 'mytorchenv'
echo $LD_LIBRARY_PATH
Use the output of `echo $LD_LIBRARY_PATH` to update `$HOME/.local/share/jupyter/kernels/<yourKernelName>/kernel.json` as shown in the "Custom Kernel ENV" section of Orbit and Jupyter notebooks. Your definition for "env" in kernel.json might look like this:
"env": {
  "LD_LIBRARY_PATH": "/opt/cray/pe/lib64:/opt/cray/lib64:/opt/cray/pe/papi/7.2.0.2/lib64:/opt/cray/libfabric/2.1/lib64:${LD_LIBRARY_PATH}"
},
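If you recreate kernels often, the `kernel.json` edit can be scripted. The sketch below loads the kernel spec, sets `env["LD_LIBRARY_PATH"]`, and writes it back while keeping the other keys (`argv`, `display_name`, and so on). The helper name is hypothetical, and the kernel directory in the usage note assumes the `--name 'mytorchenv'` used above.

```python
import json
from pathlib import Path


def set_kernel_ld_library_path(kernel_json, ld_library_path):
    """Set env.LD_LIBRARY_PATH in a Jupyter kernel.json, keeping other keys."""
    path = Path(kernel_json)
    spec = json.loads(path.read_text())
    # Create the "env" mapping if the kernel spec does not have one yet
    spec.setdefault("env", {})["LD_LIBRARY_PATH"] = ld_library_path
    path.write_text(json.dumps(spec, indent=1))
    return spec
```

For example, with your virtual environment and modules loaded, mirroring the `${LD_LIBRARY_PATH}` suffix of the example above: `set_kernel_ld_library_path(Path.home() / ".local/share/jupyter/kernels/mytorchenv/kernel.json", os.environ["LD_LIBRARY_PATH"] + ":${LD_LIBRARY_PATH}")`.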
