Project
Converting CUDA programs to run on AMD GPUs
How I hipified CUDA workloads so the same code runs on Nvidia and AMD GPUs, without losing performance.
Role: Project
Date: 23-04-2024
Tech: HIP, ROCm, CUDA, Linux, HPC, Slurm
Highlights
- Hipified CUDA code and dependent libraries for AMD/Nvidia portability
- Validated numerics against CUDA baselines using high-precision checks
- Achieved comparable runtimes across vendors after tuning
What I set out to do
Make existing CUDA-heavy code (including third-party libraries) run on both Nvidia and AMD GPUs using HIP, without forcing scientists to rewrite kernels from scratch.
Why it mattered
- Supercomputers like LUMI use AMD GPUs; many research codes are CUDA-first.
- Teams need portability without sacrificing correctness or speed.
- External CUDA libraries had to come along for the ride.
Approach
- Hipify CUDA sources and dependent libraries using AMD's HIP toolchain.
- Align compiler flags and build tooling so one codebase targets both vendors.
- Validate numerics with high precision (quadruple where needed) to match CUDA baselines.
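At its core, hipification is a source-to-source translation: CUDA API calls, types, and headers are mapped to their HIP equivalents (cudaMalloc becomes hipMalloc, cudaMemcpy becomes hipMemcpy, and so on), which is why most kernels survive unchanged. A toy sketch of that renaming step in Python, assuming a small illustrative subset of the mapping table (the real dictionary used by tools like hipify-perl covers hundreds of symbols):

```python
import re

# Illustrative subset of the CUDA-to-HIP identifier map; the real
# hipify tooling ships a far larger table.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaFree": "hipFree",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def hipify(source: str) -> str:
    """Replace whole-word CUDA identifiers with their HIP equivalents."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, CUDA_TO_HIP)) + r")\b")
    return pattern.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)

cuda_snippet = """#include <cuda_runtime.h>
float *d_x;
cudaMalloc(&d_x, n * sizeof(float));
cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
cudaDeviceSynchronize();
cudaFree(d_x);
"""

print(hipify(cuda_snippet))
```

Because the kernel language itself (threads, blocks, shared memory) is near-identical between CUDA and HIP, this surface renaming plus build-system alignment covers most of the porting work; the remainder is tuning and validation.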
Case study
- Workload: Quasi-Minimal Residual (QMR) solver.
- Data: DREAM disruption/runaway electron analysis set.
- Check: quadruple-precision comparison against the CUDA baseline to ensure convergence parity.
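The parity check above can be sketched as comparing per-iteration residual norms from the two backends at extended precision, so that single- or double-precision rounding cannot mask a real divergence. A minimal sketch in Python, using the decimal module at 34 significant digits to stand in for IEEE quadruple precision (the residual values and tolerance below are hypothetical, not taken from the thesis):

```python
from decimal import Decimal, getcontext

# 34 significant digits approximates IEEE binary128 (quadruple) precision.
getcontext().prec = 34

def relative_difference(a: str, b: str) -> Decimal:
    """High-precision relative difference between two residual norms."""
    da, db = Decimal(a), Decimal(b)
    return abs(da - db) / max(abs(da), abs(db))

# Hypothetical residual history per QMR iteration: CUDA baseline vs HIP port.
cuda_residuals = ["1.0e-2", "3.2e-5", "7.9e-9", "1.4e-12"]
hip_residuals  = ["1.0e-2", "3.2e-5", "7.9e-9", "1.4e-12"]

tolerance = Decimal("1e-10")
parity = all(
    relative_difference(c, h) < tolerance
    for c, h in zip(cuda_residuals, hip_residuals)
)
print("convergence parity:", parity)
```

Comparing the whole residual history, rather than only the final answer, catches backends that reach the same solution by a numerically different (and potentially fragile) path.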
Results
- Comparable runtimes on AMD and Nvidia GPUs after hipification.
- Minimal code churn: kernels, build flags, and libraries adapted without a full rewrite.
Takeaways
- HIP is a practical path to cross-vendor GPU support for legacy CUDA codes.
- Performance parity is achievable with careful memory access and build tuning.
- Portability also means validating dependent libraries, not just your own kernels.
Read the full thesis
Full text and figures: Converting CUDA programs to run on AMD GPUs.