Compiling and Running CUDA Programs

Last modified: Wednesday April 10, 2013 10:39 AM

A Brief Introduction to CUDA

Four Eos compute nodes are equipped with NVIDIA Graphics Processing Unit (GPU) cards: two nodes with two Tesla M2050s each, and the other two with one Tesla M2070 each. These four nodes support GPU-accelerated computing using CUDA, a general-purpose parallel platform and programming model for offloading computationally intensive parts of a program to NVIDIA GPUs. Figure 1 shows a block diagram of a GPU node with two M2050s.
Figure 1: Block diagram of an Eos GPU node with two Tesla M2050s.

The CUDA programming model is a heterogeneous model in which both the CPU and GPU are used. In CUDA, the host refers to the CPU and its memory, while the device refers to the GPU and its memory. Code running on the CPU manages memory on both the host and the device, controls data transfers between host memory and device memory, and launches kernels, which are functions that run on the device. Given the heterogeneous nature of the CUDA programming model, a typical sequence of operations for a CUDA program is:

  1. Declare and allocate host and device memory.
  2. Initialize host data.
  3. Transfer data from the host memory to the device memory.
  4. Execute one or more kernels.
  5. Transfer results from the device memory to the host memory.
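
The five steps above can be sketched as a minimal CUDA C program. The kernel name (vec_add), the array size, and the launch configuration below are illustrative choices, not part of the Eos documentation:

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* Kernel: runs on the device; each thread adds one pair of elements. */
__global__ void vec_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    /* 1. Declare and allocate host and device memory. */
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);

    /* 2. Initialize host data. */
    for (int i = 0; i < n; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    /* 3. Transfer data from host memory to device memory. */
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    /* 4. Execute the kernel. */
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vec_add<<<blocks, threads>>>(d_a, d_b, d_c, n);

    /* 5. Transfer results from device memory to host memory. */
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", h_c[0]); /* 3.0 if the kernel ran successfully */

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}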

On Eos, users can develop CUDA-enabled applications through one of three methods:

  1. Programming NVIDIA GPUs using the NVIDIA CUDA C/C++ programming interface.
  2. Programming NVIDIA GPUs using PGI CUDA Fortran.
  3. Automatically parallelizing loops in C/C++/Fortran code using OpenACC directives.

In the following sections, we show how to compile code developed with each of these methods. For details on how to program with CUDA C/C++, CUDA Fortran, and OpenACC, please refer to the references.