PyTorch on AMD GPU Systems Quickstart Guide

Note The following instructions will only work on LC AMD systems: Tioga, Tuolumne, RZAdams, RZVernal, El Capitan, and Tenaya.

For our systems using AMD GPUs, we recommend you use one of our pre-built torch wheels shown here on Nexus. To avoid known bugs in older versions, you must use PyTorch >= 2.7 on systems with AMD GPUs.

Please refer to this version matrix for PyTorch installation commands:

Python Module	PyTorch Version	ROCm Version	Installation Command
python/3.11.5 cray-python/3.11.5 cray-python/3.11.7	2.8.0	6.3.1	pip install torch==2.8.0+rocm631
cray-python/3.11.5 cray-python/3.11.7	2.8.0	6.4.2	pip install==2.8.0+rocm642

This version matrix includes recommended and stable versions of PyTorch. Nightly builds of PyTorch 2.10 on Nexus are named with the convention <date>+rocm<version>, for example 2025.9.23+rocm642. In addition to searching Nexus, you can discover torch wheels via pip index versions --pre torch; please only consider LC's builds, whose names contain the +rocm<version> notation.

Installing PyTorch

Recommended steps:

Load the python/3.11.5 module, for which PyTorch wheels have been built
Create and activate virtual environment based on this module
1. Use `python3 -m venv <directory>`
2. Do not use `--system-site-packages`
Install one of the available PyTorch wheels listed above (of the form `<torch version>+rocm<rocm version>`) to your virtual environment

 module load python/3.11.5
 python3 -m venv mytorchenv
 source mytorchenv/bin/activate
 pip install torch==2.8.0+rocm631

Test this worked:

To check whether PyTorch is installed and whether GPUs are visible, run the following command from the command line :

python -c 'import torch ; print(torch.rand(5, 3)) ; print("Torch Version", torch.__version__) ; print("GPU available:", torch.cuda.is_available())'

which is equivalent to the following in the Python REPL:

>>> import torch; print(torch.rand(5, 3)); print("Torch Version", torch.__version__) ; print("GPU available:", torch.cuda.is_available())

On a node with GPUs, output should look something like:

tensor([[0.0796, 0.2218, 0.8005],
        [0.7947, 0.3835, 0.9008],
        [0.8714, 0.7890, 0.6630],
        [0.6062, 0.7453, 0.7118],
        [0.7487, 0.2672, 0.4115]])
Torch Version 2.8.0+rocm631
GPU available: True

Using PyTorch on multiple nodes

Recommended steps

For PyTorch performance across multiple nodes,

module load rocm/6.3.1 # for Torch wheel built against ROCm v6.3.1 
module load rccl
pip install mpi4py==4.1.0.dev0+mpich.8.1.32 # if mpi4py is needed

in addition to the steps above. Spindle and the RCCL-OFI plug-in will be loaded for you.

More details below.

Loading ROCm

To run distributed PyTorch, load the appropriate ROCm module at runtime. For example,

module load rocm/6.3.1

If everything is set up correctly, then running broadcast-ddp.py via

flux run -N 2 --tasks-per-node=4 python3 broadcast-ddp.py

will yield output like

Rank 3/8: tensor after all_reduce = 28
Rank 2/8: tensor after all_reduce = 28
Rank 1/8: tensor after all_reduce = 28
Rank 0/8: tensor after all_reduce = 28
Rank 7/8: tensor after all_reduce = 28
Rank 6/8: tensor after all_reduce = 28
Rank 5/8: tensor after all_reduce = 28
Rank 4/8: tensor after all_reduce = 28

RCCL module & RCCL-OFI plug-in

When scaling PyTorch across multiple nodes via the Cray Slingshot network, you must load

module load rccl

which defaults to module load rccl/working-env. By instead selecting module load rccl/fast-env-slows-mpi, you will gain additional RCCL performance at the cost of degraded MPI performance.

Separately, getting multi-node performance requires a plugin that lets RCCL use the libfabric library. If you are using one of the recommended PyTorch wheels on Nexus, no action is required: the plugin will be used by default.

Otherwise, versions of this plugin are located under /collab/usr/global/tools/rccl.

MPI4Py users

MPI4Py users are recommended to install one of our wheels provided here; those compatible with your python version will show with a git hash including `dev0` in `pip index versions --pre mpi4py` output.

For example,

pip install mpi4py==4.1.0.dev0+mpich.8.1.32

Spindle

For multi-node jobs, LC highly recommends using Spindle to accelerate Python library loading. Spindle is already on by default for El Capitan and Tuo, but needs to be manually set for other systems, including RZAdams, RZVernal, and Tioga.

Using PyTorch from within a Jupyter notebook

Please use the docs Orbit and Jupyter notebooks to create a Jupyter kernel from your python virtual environment. In particular, after creating your virtual environment as described above, you will need to

Install `ipykernel` to your virtual environment
Install your custom kernel to `~/.local`
Manually update LD_LIBRARY_PATH in `kernel.json`.

pip install ipykernel
python -m ipykernel install --prefix=$HOME/.local --name 'mytorchenv' --display-name 'mytorchenv'
echo $LD_LIBRARY_PATH

Use the output of `echo $LD_LIBRARY_PATH` to update `$HOME/.local/share/jupyter/kernels/<yourKernelName>/kernel.json` as shown in the "Custom Kernel ENV" section of Orbit and Jupyter notebooks. Your definition for "env" in kernel.json might look like this:

  "env": {
  "LD_LIBRARY_PATH": "/opt/cray/pe/lib64:/opt/cray/lib64:/opt/cray/pe/papi/7.2.0.2/lib64:/opt/cray/libfabric/2.1/lib64:${LD_LIBRARY_PATH}"
},