For our systems with AMD GPUs (Tioga, Tuo, RZAdams, El Capitan, Tenaya), we recommend using one of our pre-built torch wheels, shown here on Nexus. To avoid known bugs in older versions, you must use PyTorch >= 2.7 on systems with AMD GPUs.
Currently we offer PyTorch v2.8 wheels compatible with:

- Python v3.11, including the Python modules
    - `python/3.11.5`
    - `cray-python/3.11.5`
    - `cray-python/3.11.7`
- ROCm v6.3.1
## Installing PyTorch
Recommended steps:
- Load a python module for which PyTorch wheels have been built (currently `python/3.11.5` only)
- Create and activate a virtual environment based on this module
    - Use `python3 -m venv <directory>`, not `virtualenv <directory>`
    - Do not use `--system-site-packages`
- Run `pip index versions --pre torch` to see the torch wheels available for your Python installation
- Install one of the available PyTorch wheels (of the form `<torch version>+<git hash>.rocm<rocm version>`) into your virtual environment
- Install numpy (as of 8/20/2025; upcoming torch wheels will include numpy)
```shell
module load python/3.11.5
python3 -m venv mytorchenv
source mytorchenv/bin/activate
pip index versions --pre torch  # to check available torch wheels
pip install torch==2.8.0a0+gitfa0fdc0.rocm631
pip install numpy
```
Test that this worked:

To check whether PyTorch is installed and whether GPUs are visible, run the following command from the command line:

```shell
python -c 'import torch; x = torch.rand(5, 3); print(x); print("GPUs are available: ", torch.cuda.is_available())'
```

which is equivalent to the following in the Python REPL:

```python
>>> import torch; x = torch.rand(5, 3)
>>> print(x); print("GPUs are available: ", torch.cuda.is_available())
```
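Beyond this smoke test, note that ROCm builds of PyTorch reuse the familiar `cuda` device API, so CUDA-style code runs unchanged on AMD GPUs. A minimal sketch (falling back to the CPU when no GPU is visible, e.g. on a login node):

```python
import torch

# ROCm builds of PyTorch expose AMD GPUs through the standard "cuda" device
# API; no AMD-specific device name is needed.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.rand(5, 3, device=device)
y = x @ x.T  # small matmul on the selected device
print("device:", device.type, "| result shape:", tuple(y.shape))
```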
## Using PyTorch on multiple nodes
### RCCL/OFI plug-in
If you're scaling PyTorch across multiple nodes on the Cray Slingshot network, good performance requires loading one of the plugins located under `/collab/usr/global/tools/rccl`. (This plugin lets RCCL use the libfabric library.) To load it, set `LD_LIBRARY_PATH` to include the directory containing `librccl-net.so`.

For example, when using PyTorch built with ROCm v6.3, run (or add to your `.bashrc`):

```shell
export LD_LIBRARY_PATH=/collab/usr/global/tools/rccl/toss_4_x86_64_ib_cray/rocm-6.3.1/install/lib:$LD_LIBRARY_PATH
```
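As a quick sanity check that `torch.distributed` collectives work, the sketch below runs an `all_reduce`. On the AMD systems you would launch one process per GPU and use the `nccl` backend (which maps to RCCL and picks up the OFI plugin above); here we use the `gloo` backend with a single local process purely so the sketch runs anywhere, even without a GPU.

```python
import os

import torch
import torch.distributed as dist

# Single-process stand-in for a multi-node launch; on AMD GPUs you would use
# backend="nccl" (RCCL) and let your launcher set rank/world_size/master vars.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

t = torch.ones(4)
dist.all_reduce(t)  # sums the tensor across ranks; a no-op with world_size=1
print("rank", dist.get_rank(), "tensor:", t.tolist())
dist.destroy_process_group()
```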
### MPI4Py users
We recommend MPI4Py users install one of the wheels provided here; wheels compatible with your Python version appear with a local version suffix in the `pip index versions --pre mpi4py` output. For example:

```shell
pip install mpi4py==4.1.0.dev0+mpich.8.1.32
```
## Using PyTorch from within a Jupyter notebook
Please see the Orbit and Jupyter notebooks documentation for creating a Jupyter kernel from your Python virtual environment. In particular, after creating your virtual environment as described above, you will need to
- Install `ipykernel` to your virtual environment
- Install your custom kernel to `~/.local`
- Manually update LD_LIBRARY_PATH in `kernel.json`.
```shell
pip install ipykernel
python3 -m ipykernel install --prefix=$HOME/.local --name 'mytorchenv' --display-name 'mytorchenv'
echo $LD_LIBRARY_PATH
```
Use the output of `echo $LD_LIBRARY_PATH` to update `$HOME/.local/share/jupyter/kernels/<yourKernelName>/kernel.json` as shown in the "Custom Kernel ENV" section of Orbit and Jupyter notebooks. Your definition for "env" in kernel.json might look like this:
```json
"env": {
    "LD_LIBRARY_PATH": "/collab/usr/global/tools/rccl/toss_4_x86_64_ib_cray/rocm-6.3.1/install/lib:/opt/cray/pe/lib64:/opt/cray/lib64:/opt/cray/pe/papi/7.2.0.2/lib64:/opt/cray/libfabric/2.1/lib64:${LD_LIBRARY_PATH}"
},
```
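Hand-editing `kernel.json` is easy to get wrong (stray commas break the JSON). As an alternative, a small hypothetical helper can patch the `env` block programmatically; the function name and the commented-out kernel path are assumptions for illustration, not part of the official workflow.

```python
import json
import os
from pathlib import Path


def patch_kernel_env(spec_path: Path, ld_library_path: str) -> None:
    """Write LD_LIBRARY_PATH into the "env" block of a kernel.json spec."""
    cfg = json.loads(spec_path.read_text())
    cfg.setdefault("env", {})["LD_LIBRARY_PATH"] = ld_library_path
    spec_path.write_text(json.dumps(cfg, indent=2))


# Typical use (default ipykernel install location; adjust the kernel name):
# patch_kernel_env(
#     Path.home() / ".local/share/jupyter/kernels/mytorchenv/kernel.json",
#     os.environ["LD_LIBRARY_PATH"],
# )
```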