Texas A&M Supercomputing Facility, Texas A&M University

Compiling and Running CUDA Programs

Last modified: Wednesday April 10, 2013 10:34 AM

Running CUDA Programs

Running CUDA Programs Interactively

Running CUDA programs interactively is for code development and debugging only.

CUDA programs (except CUDA Fortran compiled with pgfortran, which has an emulation mode) must be executed on nodes with GPUs. This means you cannot run a CUDA program on a login node, which does not have a GPU. When you develop and debug a CUDA program, you want to test it right away, with the standard input, output, and error streams connected to the terminal session where the program is launched. Normal batch jobs submitted through a batch script do not provide such support, but an interactive batch job does. A batch job submitted with "qsub -I" is queued and scheduled like any PBS batch job, but when it executes, the standard input, output, and error streams of the job are connected through qsub to the terminal session in which qsub is running. The result is as if you had logged into a remote compute node. Once you are presented with the remote shell, you can type in any commands needed to execute your CUDA program.

A special queue on Eos — the gpu queue — is reserved for CUDA programs. Interactive batch jobs, as well as normal batch jobs, must be submitted to the gpu queue in order to run on one or more GPU nodes.

An example of submitting an interactive batch job to the GPU queue is given below:

    qsub -I -q gpu -l nodes=1:ppn=1,mem=2g,walltime=1:00:00
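
Once the remote shell appears on the GPU node, you can compile and run your program as you would in any shell session. A quick way to verify that the GPUs on the node are visible is a small device-query program such as the sketch below (the file name is hypothetical; with nvcc available in your environment it can be built with something like "nvcc -o devquery.exe devquery.cu"):

 // devquery.cu -- illustrative only; lists the GPUs visible on the node
 #include <cstdio>
 #include <cuda_runtime.h>

 int main()
 {
     int count = 0;
     cudaError_t err = cudaGetDeviceCount(&count);
     if (err != cudaSuccess) {
         printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
         return 1;
     }
     printf("Found %d CUDA device(s)\n", count);
     for (int i = 0; i < count; ++i) {
         cudaDeviceProp prop;
         cudaGetDeviceProperties(&prop, i);
         printf("  Device %d: %s, compute capability %d.%d\n",
                i, prop.name, prop.major, prop.minor);
     }
     return 0;
 }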

Running CUDA Programs in Batch

The following examples show how to run CUDA programs in batch under different circumstances.

Example 1: Simple CUDA

 #PBS -S /bin/bash
 #PBS -l nodes=1:ppn=1,walltime=02:00:00,mem=12gb
 #PBS -q gpu
 #PBS -N test
 #PBS -j oe

 cd $PBS_O_WORKDIR
 ./gpu_prog.exe
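
The batch script above simply changes to the submission directory and runs the executable. For reference, gpu_prog.exe could be built from a minimal CUDA source file along the lines of the sketch below (file, kernel, and executable names are placeholders; a typical build command is "nvcc -o gpu_prog.exe gpu_prog.cu"):

 // gpu_prog.cu -- minimal sketch of a single-GPU program run by the script above
 #include <cstdio>
 #include <cstdlib>
 #include <cuda_runtime.h>

 __global__ void add_one(float *x, int n)
 {
     int i = blockIdx.x * blockDim.x + threadIdx.x;
     if (i < n) x[i] += 1.0f;
 }

 int main()
 {
     const int n = 1 << 20;
     float *h = (float *)malloc(n * sizeof(float));
     for (int i = 0; i < n; ++i) h[i] = (float)i;

     float *d;
     cudaMalloc(&d, n * sizeof(float));
     cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

     add_one<<<(n + 255) / 256, 256>>>(d, n);
     cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);

     printf("h[0] = %f, h[n-1] = %f\n", h[0], h[n - 1]);
     cudaFree(d);
     free(h);
     return 0;
 }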

Example 2: CUDA with OpenMP

 #PBS -S /bin/bash
 #PBS -l nodes=1:ppn=2,walltime=02:00:00,mem=12gb
 #PBS -q gpu
 #PBS -N test
 #PBS -j oe

 export OMP_NUM_THREADS=2
 cd $PBS_O_WORKDIR
 ./gpu_prog.exe
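
Here OMP_NUM_THREADS is set to match ppn=2, so the host code can run two OpenMP threads. One common pattern, sketched below, is to let each thread select a device with cudaSetDevice and do its own CUDA work; the file and kernel names are placeholders, and the source can be built with something like "nvcc -Xcompiler -fopenmp -o gpu_prog.exe gpu_omp.cu":

 // gpu_omp.cu -- hypothetical sketch of OpenMP host threads each driving a GPU
 #include <cstdio>
 #include <omp.h>
 #include <cuda_runtime.h>

 __global__ void scale(float *x, int n, float a)
 {
     int i = blockIdx.x * blockDim.x + threadIdx.x;
     if (i < n) x[i] *= a;
 }

 int main()
 {
     int ngpus = 0;
     cudaGetDeviceCount(&ngpus);
     if (ngpus < 1) return 1;

     const int n = 1 << 20;

     // One OpenMP thread per value of OMP_NUM_THREADS; threads are mapped
     // onto the available GPUs round-robin.
     #pragma omp parallel
     {
         int tid = omp_get_thread_num();
         cudaSetDevice(tid % ngpus);

         float *d;
         cudaMalloc(&d, n * sizeof(float));
         cudaMemset(d, 0, n * sizeof(float));
         scale<<<(n + 255) / 256, 256>>>(d, n, 2.0f);
         cudaDeviceSynchronize();
         cudaFree(d);

         printf("thread %d finished on device %d\n", tid, tid % ngpus);
     }
     return 0;
 }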

Example 3: CUDA with MPI

 #PBS -S /bin/bash
 #PBS -l nodes=2:ppn=2:gpus_2,walltime=02:00:00,mem=44gb
 #PBS -q gpu
 #PBS -N test
 #PBS -j oe

 module load openmpi/1.6.0/pgi
 cd $PBS_O_WORKDIR
 mpirun ./gpu_prog.exe
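
With MPI, the usual arrangement is for every rank to claim one of the GPUs on its node before doing any CUDA work. The sketch below illustrates that pattern only (file and kernel names are hypothetical); it would be built with the MPI wrapper compiler from the loaded module together with nvcc, and launched by mpirun exactly as in the script above:

 // gpu_mpi.cu -- hypothetical sketch of one GPU per MPI rank
 #include <cstdio>
 #include <mpi.h>
 #include <cuda_runtime.h>

 __global__ void fill(float *x, int n, float v)
 {
     int i = blockIdx.x * blockDim.x + threadIdx.x;
     if (i < n) x[i] = v;
 }

 int main(int argc, char **argv)
 {
     MPI_Init(&argc, &argv);

     int rank = 0;
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);

     // Bind this rank to one of the GPUs visible on its node. A scheme based
     // on the node-local rank is often preferable; plain modulo is used here
     // to keep the sketch short.
     int ngpus = 0;
     cudaGetDeviceCount(&ngpus);
     if (ngpus < 1) MPI_Abort(MPI_COMM_WORLD, 1);
     cudaSetDevice(rank % ngpus);

     const int n = 1 << 20;
     float *d;
     cudaMalloc(&d, n * sizeof(float));
     fill<<<(n + 255) / 256, 256>>>(d, n, (float)rank);
     cudaDeviceSynchronize();
     cudaFree(d);

     printf("rank %d used device %d of %d\n", rank, rank % ngpus, ngpus);

     MPI_Finalize();
     return 0;
 }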

More examples of specifying different GPU resources are given below:

 #PBS -l nodes=2:ppn=1:gpus_1,walltime=02:00:00,mem=44gb
 #PBS -l nodes=2:ppn=1:gpus_1+2:ppn=2:gpus_2,walltime=02:00:00,mem=88gb