
Converting CUDA programs to run on AMD GPUs

How I hipified CUDA workloads so the same code runs on Nvidia and AMD GPUs, without losing performance.

Role: Project
Date: 23-04-2024
Tech: HIP, ROCm, CUDA, Linux, HPC, Slurm

Highlights

  • Hipified CUDA code and dependent libraries for AMD/Nvidia portability
  • Validated numerics against CUDA baselines using high-precision checks
  • Achieved comparable runtimes across vendors after tuning

What I set out to do

Make existing CUDA-heavy code (including third-party libraries) run on both Nvidia and AMD GPUs using HIP, without forcing scientists to rewrite kernels from scratch.
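The core workflow can be sketched as follows, using ROCm's hipify-perl translation script and the hipcc compiler driver; the file names are illustrative, not from the actual project:

```shell
# Translate CUDA API calls (cudaMalloc, cudaMemcpy, <<<...>>> launches, ...)
# into their HIP equivalents (hipMalloc, hipMemcpy, hipLaunchKernelGGL, ...).
hipify-perl saxpy.cu > saxpy.hip.cpp

# The same HIP source then builds for either vendor:
hipcc saxpy.hip.cpp -o saxpy                        # AMD GPUs via ROCm
HIP_PLATFORM=nvidia hipcc saxpy.hip.cpp -o saxpy    # Nvidia GPUs via the CUDA toolkit
```

On Nvidia hardware the HIP headers are thin wrappers over the CUDA runtime, which is why a single hipified source tree can keep targeting both vendors.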

Why it mattered

  • Supercomputers like LUMI use AMD GPUs; many research codes are CUDA-first.
  • Teams need portability without sacrificing correctness or speed.
  • External CUDA libraries had to come along for the ride.

Approach

  1. Hipify CUDA sources and dependent libraries using AMD's HIP toolchain.
  2. Align compiler flags and build tooling so one codebase targets both vendors.
  3. Validate numerics with high precision (quadruple where needed) to match CUDA baselines.
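Step 2 (one codebase, both vendors) can be expressed with CMake's first-class HIP language support (CMake 3.21+). This is a minimal sketch under assumed names; the `GPU_VENDOR` cache variable, target name, and source path are hypothetical:

```cmake
# Sketch: one CMakeLists.txt targeting AMD or Nvidia GPUs.
cmake_minimum_required(VERSION 3.21)
project(solver LANGUAGES CXX HIP)

# Hypothetical switch for this sketch; real projects often auto-detect.
set(GPU_VENDOR "amd" CACHE STRING "amd or nvidia")

if(GPU_VENDOR STREQUAL "amd")
  # ROCm: set the architecture of the target machine, e.g. MI250X (gfx90a) on LUMI.
  set(CMAKE_HIP_ARCHITECTURES gfx90a)
else()
  # On Nvidia, HIP compiles through the CUDA toolkit (HIP_PLATFORM=nvidia).
  set(ENV{HIP_PLATFORM} nvidia)
endif()

add_executable(qmr_solver src/qmr.hip.cpp)
```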

Case study

  • Workload: Quasi-Minimal Residual (QMR) solver.
  • Data: DREAM disruption/runaway electron analysis set.
  • Check: Quadruple precision to ensure convergence parity.

Results

  • Comparable runtimes on AMD and Nvidia GPUs after hipification.
  • Minimal code churn: kernels, build flags, and libraries adapted without a full rewrite.

Takeaways

  • HIP is a practical path to cross-vendor GPU support for legacy CUDA codes.
  • Performance parity is achievable with careful memory access and build tuning.
  • Portability also means validating dependent libraries, not just your own kernels.

Read the full thesis

Full text and figures: Converting CUDA programs to run on AMD GPUs.