Converting CUDA programs to run on AMD GPUs

Highlights

Developed a general method to hipify both project code and external CUDA libraries
Validated a QMR solver with DREAM data where quadruple precision was required for convergence
Measured comparable runtimes on AMD and Nvidia GPUs in the test case

Overview

This thesis is about a practical portability problem: a lot of scientific GPU code is written for CUDA, but new systems are not always Nvidia-based. I focused on converting existing CUDA programs, including external CUDA libraries, into HIP so the same codebase could run on both AMD and Nvidia hardware.

The motivation was concrete. Nvidia has historically dominated GPU computing, but systems like LUMI in Finland use AMD GPUs in their GPU partition. That makes portability a real requirement for research teams that want to keep using existing CUDA code on newer hardware.

What I did

I describe a general conversion process where both the project's own CUDA sources and external CUDA library code are hipified. In practice, this means converting CUDA APIs to HIP equivalents, updating build scripts and Makefiles to use hipcc, and then compiling and testing the full stack together. One important conclusion from the implementation is that this is not a one-click migration: some manual CUDA-to-HIP porting is still needed depending on how each project and dependency is written.

Test case

I evaluated the method with a Quasi-Minimal Residual (QMR) solver using data from DREAM (Disruption and Runaway Electron Analysis Model), which models runaway electrons in tokamak fusion scenarios. In this specific QMR problem, quadruple precision arithmetic was required to get convergence at the target tolerance; with one million variables, the solver did not converge at $10^{-22}$ without it.

Results and conclusions

The implementation showed that CUDA programs with external CUDA-library dependencies can be converted and compiled to run on either vendor's GPUs through the outlined process. For the QMR case, runtimes on consumer AMD GPUs were reported as comparable to those on a professional Nvidia Tesla V100-16GB. The broader takeaway is that HIP is a viable path for cross-vendor GPU execution, but reliable results depend on careful build-system work, validation of dependencies, and project-specific adjustments.

Full thesis

The full text and figures are available via the project link below.

Full thesis