On systems with AMD APUs or GPUs, including MI300A (El Cap, Tuo, RZAdams) and MI250X (Tenaya, Tioga), LC provides and supports PyTorch wheels. These wheel-based installs are the recommended path for most users.

Because only a limited set of Python and PyTorch version combinations is provided, some users may need to build PyTorch from source instead (note that PyTorch >= 2.7 is required on AMD accelerators). For those cases, the wheel build script used by this project is an excellent reference for the expected toolchain, dependencies, and integration points:

Build reference: .gitlab-ci.yml for PyTorch containers (LC GitLab login required)

Source build guidance

Users building PyTorch from source should select the upstream repository based on their target:

Build target                          Recommended repository
Specific release version              https://github.com/rocm/pytorch
Latest development or nightly build   https://github.com/pytorch/pytorch

In general, users who need a stable, versioned, ROCm-aligned release should start from the ROCm PyTorch repository, checking out the branch that matches the desired release (e.g., release/2.11). Users who want the newest development version should use the main PyTorch repository's default development branch (main).
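As a sketch, the two starting points look like the following (release/2.11 is an example branch name; check the repository for the branches that actually exist for your target version):

```shell
# Stable, ROCm-aligned release: clone the ROCm fork at a release branch.
# --recursive is needed because PyTorch vendors many git submodules.
git clone --recursive -b release/2.11 https://github.com/rocm/pytorch pytorch-rocm

# Newest development version: clone upstream PyTorch at its main branch.
git clone --recursive https://github.com/pytorch/pytorch pytorch-main
```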

Important notes for users

Users building from source should expect to do more than just compile PyTorch itself. A usable distributed PyTorch environment on CORAL2 generally also requires:

  • a matching ROCm software stack
  • a supported compiler environment
  • a compatible MPI stack
  • mpi4py built against that MPI installation (optional)
  • the RCCL OFI plugin, typically aws-ofi-rccl, built against the appropriate ROCm and libfabric libraries
  • any additional math, Python, or build dependencies required by the selected PyTorch source tree
  • runtime library path setup as needed for the local environment
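A minimal sketch of the mpi4py and aws-ofi-rccl steps above is shown below. All paths are placeholders for the local ROCm, libfabric, and MPI installs, and the configure flags are assumptions based on the plugin's autotools build; consult the aws-ofi-rccl README for the exact options on your system.

```shell
# Build mpi4py against the local MPI stack (mpi4py honors the MPICC variable).
MPICC=$(which mpicc) python -m pip install --no-binary=mpi4py mpi4py

# Build the RCCL OFI plugin against ROCm and libfabric.
# Repository location and configure flags are assumptions; check the README.
git clone https://github.com/ROCm/aws-ofi-rccl
cd aws-ofi-rccl
./autogen.sh
./configure --with-libfabric=/path/to/libfabric \
            --with-hip=/path/to/rocm \
            --with-rccl=/path/to/rocm \
            --with-mpi=/path/to/mpi \
            --prefix="$HOME/aws-ofi-rccl"
make -j && make install

# Runtime library path setup: RCCL discovers the plugin (librccl-net.so)
# via the dynamic loader, so its lib directory must be on LD_LIBRARY_PATH.
export LD_LIBRARY_PATH="$HOME/aws-ofi-rccl/lib:$LD_LIBRARY_PATH"
```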