Texas A&M Supercomputing Facility, Texas A&M University

Compiling and Running OpenMP Programs

Last modified: Tuesday September 06, 2011 9:46 AM

In this guide we show how to compile and run code that has been written to use OpenMP's Application Programming Interface (API). Under this API the user inserts compiler directives into the source code, from which the compiler generates parallelized, multi-threaded code. A program begins, and mostly continues, to execute sequentially as a single process (the master thread of execution), except when it encounters parallel regions of code. Such regions are executed in parallel by a team of threads, created by the master thread and mapped to physical cores by the operating system. This multi-threaded model of execution works only within shared-memory processors/machines (SMPs), so the extent of parallelization is limited by the number of cores available on them. On Eos, the SMPs are the 8-core Nehalem nodes and the 12-core Westmere nodes.
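
As a minimal C sketch of this fork-join pattern (illustrative only; the file name omp_hello.c is arbitrary), each thread in the team reports its id before control returns to the master thread:

/* omp_hello.c -- illustrative sketch of the fork-join model              */
/* Build with the OpenMP flag described in the next section, e.g.         */
/*   icc -openmp -o omp_hello.exe omp_hello.c                             */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    printf("Serial part: only the master thread runs here\n");

    /* The master thread forks a team; every thread executes this block.  */
    #pragma omp parallel
    {
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }   /* Implicit barrier: the team joins and the master continues.     */

    printf("Serial part again: back to the master thread only\n");
    return 0;
}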

Compiling OpenMP Code

The commands icc, icpc, and ifort invoke, respectively, the Intel C, C++, and Fortran compilers. All are capable of preprocessing, compiling, assembling, and linking code written for multi-threaded/parallel execution.

Prior to using any Intel compiler, you must "load" the appropriate module:

   module load intel/compilers

The command line form for invoking a compiler is:

   icc       -openmp [-parallel] [-par-report2] [C options]       -o omp_prog.exe file1 file2 ...
   icpc      -openmp [-parallel] [-par-report2] [C++ options]     -o omp_prog.exe file1 file2 ...
   ifort     -openmp [-parallel] [-par-report2] [Fortran options] -o omp_prog.exe file1 file2 ...
where file1, file2, ... are any appropriate source, assembly, object, object-library, or other (linkable) files that are linked to generate the multi-threaded executable, omp_prog.exe. The compilers infer the type of each input file from its extension; see the section on Compiling and Running Serial Code for details. All generated executables are 64-bit by default.
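
For instance (a hypothetical two-file sketch; the names work.c and main.c are not part of the Eos documentation), icc -openmp -o omp_prog.exe main.c work.c would compile and link both sources into one multi-threaded executable:

/* work.c -- hypothetical helper compiled into the same executable        */
void fill(double *a, int n)
{
    int i;
    /* This loop is divided among the threads of the team.                */
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        a[i] = 2.0 * i;
}

/* main.c -- hypothetical driver                                          */
#include <stdio.h>

void fill(double *a, int n);     /* provided by work.c                    */

int main(void)
{
    double a[8];
    fill(a, 8);
    printf("a[7] = %g\n", a[7]);  /* expected: 14                         */
    return 0;
}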

All routines that call OpenMP library functions must also include the appropriate header file or module:

   #include <omp.h>        C/C++ header
   USE OMP_LIB             Fortran 90 module
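
As a minimal C sketch (not Eos-specific), calls to the OpenMP run-time library can also be guarded with the _OPENMP macro, which the compiler defines when -openmp is in effect, so that the same source still builds serially:

#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>                  /* declares omp_get_num_procs(), etc.   */
#endif

int main(void)
{
#ifdef _OPENMP
    printf("OpenMP build: %d cores visible on this node\n",
           omp_get_num_procs());
#else
    printf("Serial build: compiled without -openmp\n");
#endif
    return 0;
}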

Running OpenMP Code

Before execution, certain environment variables should be set. Typical ones are OMP_NUM_THREADS, OMP_SCHEDULE, OMP_DYNAMIC, OMP_NESTED, and, for any significant code, KMP_AFFINITY. For example, to specify the number of parallel threads of execution to deploy in a parallel region, set OMP_NUM_THREADS to an appropriate number. These and other such variables are explained further below.
  export OMP_NUM_THREADS=nthreads; export OMP_SCHEDULE=value; ...
  omp_prog.exe

Example 1

program test_omp             ! Fortran 90
use omp_lib                  ! *** important
integer nthreads, threadid, max_threads, nprocs
!
max_threads = omp_get_max_threads(); nprocs = omp_get_num_procs()
print "(A, I2, A, I2, A)", &
'*** SERIAL REGION: max_threads = ',max_threads,' Current node has ',nprocs,' cores ***'

!$OMP PARALLEL SHARED(nprocs) PRIVATE(nthreads, threadid)

      threadid = omp_get_thread_num(); nthreads = omp_get_num_threads()
      print "(A, I2, A, I2, A, I2)", &
      '-- PARALLEL REGION: threadid = ',threadid,' nthreads = ',nthreads,' nprocs = ',nprocs

!$OMP END PARALLEL
end program test_omp

export OMP_NUM_THREADS=4; export OMP_DYNAMIC=FALSE; export OMP_NESTED=FALSE

omp_test2.exe
*** SERIAL REGION: max_threads =  4 Current node has  8 cores ***
-- PARALLEL REGION: threadid =  0 nthreads =  4 nprocs =  8
-- PARALLEL REGION: threadid =  1 nthreads =  4 nprocs =  8
-- PARALLEL REGION: threadid =  3 nthreads =  4 nprocs =  8
-- PARALLEL REGION: threadid =  2 nthreads =  4 nprocs =  8

Example 2

pi.c
#include <stdio.h>
#include <omp.h>                       /* needed for omp_set_num_threads() */
/** Eos:   icc -openmp -o pi.exe pi.c */

static long num_steps = 100000;        double step;

#define NUM_THREADS 2

int  main ()
{   int i; double x, pi, sum = 0.0;
    step = 1.0/(double) num_steps;

    omp_set_num_threads(NUM_THREADS);

    /* Each thread accumulates its own partial sum (reduction) over a      */
    /* block of the midpoint-rule quadrature of 4/(1+x*x) on [0,1].        */
#pragma omp parallel for reduction(+:sum) private(x) firstprivate(step) \
                                                     lastprivate(step)
    for (i=1; i <= num_steps; i++) {
        x = (i-0.5)*step;
        sum = sum + 4.0/(1.0+x*x);
    }

    pi = sum * step;

    printf("***** PI_Value = %g *****\n", pi);
    return 0;
} /** End of pi.c  **/

OpenMP Environment Variables

The Intel compilers support the standard OpenMP environment variables (with the OMP_ prefix) as well as Intel-specific extensions (with the KMP_ prefix). Setting these variables to particular values allows some control over the run-time behavior of the OpenMP binary. The most commonly used variables are described below.

OMP_NUM_THREADS (default: 1)
   Sets the maximum number of threads to use for OpenMP parallel regions if no other value
   is specified in the application. This variable applies to both -openmp and -parallel.
   The default value is set via the system environment scripts; set a different value to
   override it.

   Example syntax: export OMP_NUM_THREADS=value

OMP_SCHEDULE (default: STATIC)
   Sets the run-time schedule type and an optional chunk size. It affects only those loops
   that specify the schedule(runtime) clause; see the sketch after this table.

   Example syntax: export OMP_SCHEDULE="kind[,chunk_size]"

OMP_DYNAMIC (default: FALSE)
   Enables dynamic adjustment of the number of threads when set to TRUE. Because of
   significant system overhead, use it with care.

OMP_NESTED (default: FALSE)
   Enables nested parallelism, that is, the creation of a parallel region within a parallel
   region. Because of significant system overhead, use it with care.

OMP_STACKSIZE (default: 4M)
   Sets the number of bytes to allocate for the private stack of each OpenMP thread. The
   recommended size is 16M.

   Use the optional suffixes B (bytes), K (kilobytes), M (megabytes), G (gigabytes), or
   T (terabytes) to specify the units. If only a number is specified, the size is assumed
   to be in kilobytes.

   This variable affects neither the native operating-system threads created by the user
   program, nor the thread executing the sequential part of an OpenMP program, nor parallel
   programs created using -parallel.

   The kmp_set_stacksize_s()/kmp_get_stacksize_s() routines set/retrieve the value.
   kmp_set_stacksize_s() must be called from the sequential part, before the first parallel
   region is created; otherwise, calling it has no effect.

   Related environment variable: KMP_STACKSIZE, which overrides OMP_STACKSIZE. See also the
   -heap-arrays compiler option.

   Example syntax: export OMP_STACKSIZE=value

KMP_AFFINITY (default: OS scheduling)
   Maps OpenMP threads to sockets and cores on a node. The appropriate setting depends on
   code behavior and can have a significant impact on performance, so some experimentation
   may be required before a successful selection is made. Only the two most common settings
   are described here; see the Intel documentation for further detail.

   export KMP_AFFINITY="verbose,scatter"
      Assigns consecutive threads (0, 1, ...) to alternating sockets and displays (verbose)
      the specific mapping on standard output. Works best when there is minimal sharing of
      data between threads.

   export KMP_AFFINITY="verbose,compact,1"
      Assigns consecutive threads (0, 1, ...) to different physical cores on the same socket
      and displays (verbose) the specific mapping on standard output. Works best when there
      is significant sharing of data between threads.

References

The man page of each compiler (e.g., man ifort) has almost everything you will need. Fuller documentation can be found on our website at the Intel Software Documentation page. The textbook Using OpenMP (The MIT Press) by Chapman, Jost, and Van der Pas is highly recommended. A good Web reference to start with is www.nic.uoregon.edu/iwomp2005/iwomp2005_tutorial_openmp_rvdp.pdf.