For our systems with AMD GPUs (Tioga, Tuo, RZAdams, El Capitan, Tenaya), we recommend using one of our pre-built torch wheels, shown here on Nexus. To avoid known bugs in older versions, you must use PyTorch >= 2.7 on systems with AMD GPUs.

 

Currently, we offer PyTorch v2.8 wheels compatible with:

  • Python v3.11, including the Python modules
    • `python/3.11.5`
    • `cray-python/3.11.5`
    • `cray-python/3.11.7`
  • ROCm v6.3.1

Installing PyTorch

Recommended steps:

  1. Load a python module for which PyTorch wheels have been built (see the list of compatible modules above)
  2. Create and activate virtual environment based on this module
    1. Use `python3 -m venv <directory>` not `virtualenv <directory>`
    2. Do not use `--system-site-packages`
  3. Run `pip index versions --pre torch` to see available torch wheels for your python installation.
  4. Install one of the available pytorch wheels (of the form `<torch version>+<git hash>.rocm<rocm version>`) to your virtual environment
  5. Install numpy (as of 8/20/2025; upcoming torch wheels will include numpy)
 module load python/3.11.5
 python3 -m venv mytorchenv
 source mytorchenv/bin/activate
 pip index versions --pre torch # to check torch wheels
 pip install torch==2.8.0a0+gitfa0fdc0.rocm631
 pip install numpy
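
Since the wheels should land in the virtual environment itself (step 2 warns against `--system-site-packages`), here is a quick standard-library check that you are actually inside one. This is a sketch for sanity-checking, not part of the official steps:

```python
import sys

def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix

print("inside a virtual environment:", in_virtualenv())
```

If this prints `False`, activate the environment (`source mytorchenv/bin/activate`) before running `pip install`.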

Test that the installation worked:

To check whether PyTorch is installed and whether GPUs are visible, run the following command from the command line:

python -c 'import torch; x = torch.rand(5, 3); print(x); print("GPUs are available: ", torch.cuda.is_available())'

which is equivalent to the following in the Python REPL:

>>> import torch; x = torch.rand(5, 3)
>>> print(x); print("GPUs are available: ", torch.cuda.is_available())

Using PyTorch on multiple nodes

RCCL/OFI plug-in

If you're scaling PyTorch across multiple nodes over the Cray Slingshot network, getting good performance requires loading one of the plugins located under /collab/usr/global/tools/rccl. (These plugins let RCCL use the libfabric library.) Set LD_LIBRARY_PATH to include the directory containing librccl-net.so.

For example, when using PyTorch built with ROCm v6.3, run (or add to your .bashrc):

export LD_LIBRARY_PATH=/collab/usr/global/tools/rccl/toss_4_x86_64_ib_cray/rocm-6.3.1/install/lib:$LD_LIBRARY_PATH
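
Before launching a multi-node job, it can help to confirm that the plugin library is actually present at that path. The sketch below uses the ROCm 6.3 directory from the example above; the `NCCL_DEBUG` step assumes RCCL's NCCL-style logging variables, which should cause job output to mention the network plugin being loaded:

```shell
# Prepend the plugin directory (ROCm 6.3 example from above)
PLUGIN_DIR=/collab/usr/global/tools/rccl/toss_4_x86_64_ib_cray/rocm-6.3.1/install/lib
export LD_LIBRARY_PATH=$PLUGIN_DIR:$LD_LIBRARY_PATH

# Check that the plugin shared library exists before launching
if [ -f "$PLUGIN_DIR/librccl-net.so" ]; then
    echo "RCCL OFI plugin found"
else
    echo "plugin not found; check the path for your ROCm version"
fi

# Optional: RCCL honors NCCL_-prefixed debug variables; with this set,
# look for plugin/network messages in the job output.
export NCCL_DEBUG=INFO
```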

MPI4Py users

We recommend that MPI4Py users install one of the wheels we provide here; those compatible with your Python version appear in `pip index versions --pre mpi4py` output with a local version suffix (e.g. `+mpich.8.1.32`).

For example,

pip install mpi4py==4.1.0.dev0+mpich.8.1.32

Using PyTorch from within a Jupyter notebook

Please follow the Orbit and Jupyter notebooks documentation to create a Jupyter kernel from your Python virtual environment. In particular, after creating your virtual environment as described above, you will need to:

  1. Install `ipykernel` to your virtual environment
  2. Install your custom kernel to `~/.local`
  3. Manually update LD_LIBRARY_PATH in `kernel.json`.
pip install ipykernel
python3 -m ipykernel install --prefix=$HOME/.local --name 'mytorchenv' --display-name 'mytorchenv'
echo $LD_LIBRARY_PATH

Use the output of `echo $LD_LIBRARY_PATH` to update `$HOME/.local/share/jupyter/kernels/<yourKernelName>/kernel.json` as shown in the "Custom Kernel ENV" section of Orbit and Jupyter notebooks. Your definition for "env" in kernel.json might look like this:

 "env": {
  "LD_LIBRARY_PATH": "/collab/usr/global/tools/rccl/toss_4_x86_64_ib_cray/rocm-6.3.1/install/lib:/opt/cray/pe/lib64:/opt/cray/lib64:/opt/cray/pe/papi/7.2.0.2/lib64:/opt/cray/libfabric/2.1/lib64:${LD_LIBRARY_PATH}" 
 },
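
Rather than editing `kernel.json` by hand, step 3 can be scripted with the Python standard library. The sketch below records the current value of LD_LIBRARY_PATH in the kernel spec's "env" block; the kernel path in the usage comment matches the `mytorchenv` example above:

```python
import json
import os

def record_ld_library_path(kernel_json_path: str) -> None:
    """Store the current LD_LIBRARY_PATH in a Jupyter kernel spec's "env" block."""
    with open(kernel_json_path) as f:
        spec = json.load(f)
    # Preserve any existing "env" entries; only set LD_LIBRARY_PATH.
    spec.setdefault("env", {})["LD_LIBRARY_PATH"] = os.environ.get("LD_LIBRARY_PATH", "")
    with open(kernel_json_path, "w") as f:
        json.dump(spec, f, indent=1)

# Example (path from the kernel installed above):
# record_ld_library_path(os.path.expanduser(
#     "~/.local/share/jupyter/kernels/mytorchenv/kernel.json"))
```

Run it from a shell where the RCCL plugin directory has already been prepended to LD_LIBRARY_PATH, so the recorded value matches what your jobs use.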