Compiling & Running Programs

Last update on Friday, 01-Feb-2008 16:02:57 CST.

Using the Intel Compilers

By default, both the Intel and the GNU compilers are available to users without any special configuration. The Intel compilers can be invoked from the command line as follows:

Intel 7.1 Compilers

ecc  [compiler options] -o myprog.exe myprog.c     Intel 7.1 C compiler
ecpc [compiler options] -o myprog.exe myprog.cc    Intel 7.1 C++ compiler
efc  [compiler options] -o myprog.exe myprog.f     Intel 7.1 Fortran compiler

Intel 8.x and 9.x Compilers

icc   [compiler options] -o myprog.exe myprog.c     Intel 8.x/9.x C compiler
icpc  [compiler options] -o myprog.exe myprog.cc    Intel 8.x/9.x C++ compiler
ifort [compiler options] -o myprog.exe myprog.f     Intel 8.x/9.x Fortran compiler

Several versions and builds of the Intel compilers are typically installed on cosmos at any given time. While the system sets a single build of each compiler as the default, users may select their own default version and build for development work with the module command. For instance, the following commands remove access to the 20040310 build of the Intel 7.1 compilers and add access to the 20040416 build of the Intel 8.0 compilers:

module unload intel-7.1-20040310
module load intel-8.0-20040416
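
As a quick check after switching builds, the sketch below wraps the two commands above and then reports which modules are loaded and where the icc driver resolves on the PATH. It assumes a Bourne-compatible shell (sh or bash) in which module is defined; the variable names old_build, new_build, and compiler_path are illustrative only.

```shell
# Sketch: switch from the 7.1 build to the 8.0 build and confirm the result.
# The module names are the ones used in the example above; adjust them to
# whatever "module avail" reports on your system.
old_build=intel-7.1-20040310
new_build=intel-8.0-20040416

if command -v module >/dev/null 2>&1; then
    module unload "$old_build"
    module load "$new_build"
    module list            # show the modules that are now loaded
else
    echo "module command not available in this shell" >&2
fi

# Report which C compiler driver is now first on PATH, if any.
compiler_path=$(command -v icc || true)
echo "icc resolves to: ${compiler_path:-<not found>}"
```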

Common Compiler Options

Compiler Option    Explanation
-auto (Fortran) Places variables, except those declared as SAVE, on the run-time stack.
-O[n] Optimization level, n=0-3. n=0 disables optimization; n=1 enables some optimizations for speed; n=2 (or just -O) is the default and enables, among other things, inlining and software pipelining; n=3 enables additional aggressive optimizations.
-parallel Enables automatic parallelization of loops (when possible).
-openmp Enables the parallelizer to generate multithreaded code based on user-inserted OpenMP directives. The -openmp option can be used in conjunction with -O[0-3].
-opt_report Generates an optimization report on stderr.
-openmp_report Displays diagnostics for code areas that were successfully parallelized.
-par_report Displays the auto-parallelizer's diagnostics indicating which loops were successfully auto-parallelized. Issues a "LOOP AUTO-PARALLELIZED" message for parallel loops. See the man page for more details.
-r8 (Fortran) The default size of real numbers is set to 8 bytes.
-r16 (Fortran) The default size of real numbers is set to 16 bytes.
-extend_source (Fortran) Specify 132 column lines for fixed form sources.
-[no]stack_temps Compiling with -nostack_temps instructs the compiler to allocate space in the heap for temporary arrays. -stack_temps tells the compiler to allocate space for temporary arrays on the runtime stack whenever possible. Default is -nostack_temps.
-tpp2 Target optimization to the Itanium2 processor.
-lname Links with the library called libname.so or libname.a
-Ldir Instructs the linker to include directory "dir" in the search path.
-assume buffered_io (Fortran) Sets the default for opening sequential output files to BUFFERED='YES', so that writes to disk are buffered. The default is -assume nobuffered_io, which writes data to disk immediately. Enabling buffered I/O typically improves I/O performance substantially.
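
To illustrate how several of these options combine on one command line, here is a minimal sketch for a fixed form Fortran source that uses 8-byte reals and 132-column lines. It assumes a Bourne-compatible shell; the file names myprog.f and opt_report.txt are placeholders, and the particular choice of flags is only an example, not a recommended setting.

```shell
# Sketch only: file names are placeholders; the flags come from the table above.
fc=efc
fflags="-O3 -r8 -extend_source -opt_report"
cmd="$fc $fflags -o myprog.exe myprog.f"

if command -v "$fc" >/dev/null 2>&1 && [ -f myprog.f ]; then
    # -opt_report writes to stderr, so redirect stderr to keep the report
    # (compiler warnings and errors end up in the same file).
    $cmd 2> opt_report.txt
else
    # Without the compiler or the source file, just show the command.
    echo "would run: $cmd 2> opt_report.txt"
fi
```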

Filename Extensions

Extension    Explanation
.c C source code
.cc C++ source code
.f fixed form Fortran source code
.for fixed form Fortran source code
.ftn fixed form Fortran source code
.fpp fixed form Fortran source code, preprocessed by the Intel Fortran preprocessor (fpp)
.f90 free form Fortran 90/95 source code
.F fixed form Fortran source code, passed through the preprocessor (fpp) before compilation
.s assembly source code
.o compiled object file

Serial Programs

C/C++ Compiler:   ecc

The Intel C++ Compiler for Linux is substantially source and object code compatible with the GNU C compiler. This allows you to recompile your existing software with the Intel C++ Compiler as a simple way to add performance to your application. Alternatively, you can build applications by compiling specific objects with the Intel C++ Compiler and linking them with objects compiled with GNU C. This is especially useful if you want to start using the Intel compiler on a few objects first. Additionally, the Intel C++ Compiler is compliant with the C++ ABI standard, which enables stronger binary compatibility with gcc version 3.2. The Intel C++ Compiler is also substantially compatible with tools you probably already use in developing your Linux applications, such as make, emacs, and gdb.

For example, to compile the C files myprog.c, sub1.c, and sub2.c:

ecc -O2 -o myprog.exe myprog.c sub1.c sub2.c

where -O2 is the optimization flag. If there are no errors, the compiler generates the binary executable "myprog.exe".

For details see the document Intel Compilers for Linux: Compatibility with GNU Compilers.

Fortran Compiler:   efc

The Intel Fortran Compiler for Linux is substantially compatible with widely used Linux application development tools such as make, emacs, and gdb. The compiler also provides a mechanism to read and write data files in big-endian format.

In order to compile the Fortran files myprog.f sub1.f sub2.f, use the following command:

efc -O2 -o myprog.exe myprog.f sub1.f sub2.f

where -O2 is the optimization flag. If there are no errors, the compiler generates the binary executable "myprog.exe".

Object Files & Shared Objects

At times, you may need to compile a source file without a main function into an object file. This is useful for separating different segments of code. To compile object files, invoke the compiler with these options:

ecc -c -fpic myfunc1.c myfunc2.c

This produces one object file per source file (myfunc1.o and myfunc2.o). The -o option cannot be used to name a single output file when -c is given several source files.

Note that multiple .c source files or .o object files can be compiled and linked in a single command.

You may also want to write your own shared library for use with other programs. In this case, you will need to generate a shared object file (.so) so that programs you write later can link to this library in order to call its functions. To do this, invoke the compiler with the following arguments:

ecc -fpic -shared myfunc1.o myfunc2.o -o libmyfuncs.so

By convention, shared objects carry the .so suffix. The .a suffix is normally reserved for static archives, so it is best avoided for a file built with -shared.
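
The whole workflow, from source files to a program that calls the shared library, can be sketched as follows. This is a minimal illustration assuming a Bourne-compatible shell: the functions add_one and twice and all file names are hypothetical, and the compiler is taken from the CC variable (defaulting to ecc) so that the same steps can be tried with another GNU-compatible driver such as gcc.

```shell
# Sketch: build a shared library from two sources and link a program to it.
# All source files and function names below are made up for this example.
CC=${CC:-ecc}
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > myfunc1.c <<'EOF'
int add_one(int x) { return x + 1; }
EOF
cat > myfunc2.c <<'EOF'
int twice(int x) { return 2 * x; }
EOF
cat > myprog.c <<'EOF'
#include <stdio.h>
int add_one(int x);
int twice(int x);
int main(void) { printf("%d\n", twice(add_one(20))); return 0; }
EOF

if command -v "$CC" >/dev/null 2>&1; then
    # One object file per source file; -fpic makes the code position independent.
    "$CC" -c -fpic myfunc1.c myfunc2.c
    # Combine the objects into a shared object.
    "$CC" -shared -o libmyfuncs.so myfunc1.o myfunc2.o
    # Link the program against libmyfuncs.so in the current directory.
    "$CC" -o myprog.exe myprog.c -L. -lmyfuncs
    # The run-time loader must also be able to find the library.
    LD_LIBRARY_PATH=.:${LD_LIBRARY_PATH:-} ./myprog.exe   # prints 42
else
    echo "compiler '$CC' not found; set CC to an installed compiler" >&2
fi
```

Setting LD_LIBRARY_PATH for the single command keeps the change local to that run; a library installed in a standard location would not need it.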

OpenMP Programs

To run your program in parallel in the shared-memory model, compile it with -openmp, -parallel, or both. In addition, the environment variable OMP_NUM_THREADS must be set to the desired number of threads (CPUs) before you execute the program. In batch jobs, the value of OMP_NUM_THREADS must equal that of the PBS resource parameter ncpus (e.g., -l ncpus=4).

For example, to compile and link OpenMP programs:

ecc -openmp -o myprog.exe myprog.c sub1.c sub2.c
efc -openmp -o myprog.exe myprog.f sub1.f sub2.f

ecc -openmp -parallel -o myprog.exe myprog.c sub1.c sub2.c
efc -openmp -parallel -o myprog.exe myprog.f sub1.f sub2.f

To run the OpenMP program compiled above:

export OMP_NUM_THREADS=2; ./myprog.exe

where 2 is the number of threads to create in this instance and "myprog.exe" is the name of the OpenMP executable program.
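
In a batch job, the thread count and the ncpus request have to agree. The following is a minimal sketch of a PBS job script, assuming a Bourne-compatible shell; the job name omp_test is hypothetical, and any queue or resource limits your site requires are omitted. PBS_O_WORKDIR is the directory from which the job was submitted.

```shell
#!/bin/sh
#PBS -N omp_test
#PBS -l ncpus=4

# Keep this value equal to the ncpus request above.
OMP_NUM_THREADS=4
export OMP_NUM_THREADS

# Move to the submission directory (fall back to the current
# directory when the script is run outside PBS).
cd "${PBS_O_WORKDIR:-.}" || exit 1

if [ -x ./myprog.exe ]; then
    ./myprog.exe
else
    echo "myprog.exe not found in $(pwd)" >&2
fi
```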

Note that a number of environment variables affect the performance of your OpenMP codes (the OMP_ variables are part of the OpenMP standard, while the KMP_ variables are Intel-specific):

Environment Variable    Explanation
OMP_SCHEDULE Sets the run-time schedule type and chunk size. The default is static.
OMP_NUM_THREADS Sets the number of threads to use during execution (default is set to 1 on Altix by the system login scripts).
KMP_LIBRARY Selects the execution mode of the OpenMP run-time library. The options are serial, turnaround, and throughput. The default, throughput, is used if this variable is not set.
KMP_STACKSIZE Sets the number of bytes to allocate for each parallel thread to use as its private stack. Use the optional suffix b, k, m, g, or t to specify bytes, kilobytes, megabytes, gigabytes, or terabytes. The default is 4m.

Note that the -stack_temps compiler option is helpful for threaded programs, such as OpenMP programs, that repeatedly allocate heap memory. As the number of threads increases, this type of heap allocation can degrade performance, and allocating temporary arrays on the stack with -stack_temps can eliminate the problem. Threaded programs using auto-parallelization or OpenMP may also need a larger thread stack, set with the KMP_STACKSIZE environment variable, in addition to a larger program stack.
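
As an illustration, the variables from the table can be set together before launching a threaded program. The values below are arbitrary examples chosen for this sketch, not recommended settings, and a Bourne-compatible shell is assumed. Note that OMP_SCHEDULE only affects loops whose OpenMP directive requests schedule(runtime).

```shell
# Example values only; tune them for your own program.
export OMP_NUM_THREADS=4          # number of threads
export OMP_SCHEDULE="dynamic,4"   # schedule type and chunk size
export KMP_LIBRARY=turnaround     # execution mode of the run-time library
export KMP_STACKSIZE=16m          # 16 megabytes of private stack per thread

# Run the program only if it has been built in the current directory.
if [ -x ./myprog.exe ]; then
    ./myprog.exe
fi
```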

MPI Programs

MPT, the SGI Message Passing Toolkit, is an optimized set of the MPI and SHMEM programming libraries. It is the preferred parallel library on the Altix. The MPT version available on cosmos is sgi-mpt-1.10-sgi300r1.

To use MPI, link your program with the MPI library by using the -lmpi switch, and start your program with the mpirun command, using the -np nproc switch to specify the number of processors.

For example, to compile MPI programs:

ecc -o myprog.exe myprog.c sub1.c sub2.c -lmpi
efc -o myprog.exe myprog.f sub1.f sub2.f -lmpi
ecc -o myprog.exe myprog.cc sub1.cc sub2.cc -lmpi++ -lmpi

To run one of the MPI programs compiled above:

mpirun -np 2 ./myprog.exe

where 2 is the number of processors to allocate and "myprog.exe" is the name of the MPI executable program.

Various environment variables can also be set to tune performance of MPI programs. Consult the man page for mpi (man mpi) for detailed information.

Note that due to CPU time or memory limits imposed on interactive processing on the Altix, you may not be able to interactively test MPI programs requiring a greater amount of such resources. In such cases you will need to submit your program for execution as a batch job.

Known Problems

MPI batch jobs are being killed

SGI's MPI requires a very large amount of virtual memory when using the default MPI settings. As a result, an MPI program run from a PBS job script may be killed for excessive virtual memory (vmem) use:

 =>> PBS: job killed: vmem 4219024080kb exceeded limit 536870912kb

There are several environment variables that you can use to lower the virtual memory requirements of your MPI program:

  • Disable memory mapping by using the MPI_MEMMAP_OFF environment variable.
  • Reduce the amount of heap and stack that is memory mapped per MPI process with the MPI_MAPPED_HEAP_SIZE and MPI_MAPPED_STACK_SIZE environment variables respectively.

The effect of these settings on performance will likely vary from one MPI application to another. See the mpi man page for more information about these environment variables.
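
A job script might apply these settings as sketched below. This assumes a Bourne-compatible shell; the values are placeholders chosen for illustration (in particular, the sizes are assumed to be given in bytes, so check the mpi man page before relying on them), and the ncpus request is only an example. The sketch treats disabling memory mapping and shrinking the mapped regions as alternatives.

```shell
#!/bin/sh
#PBS -l ncpus=2

# Option 1: disable memory mapping altogether.
MPI_MEMMAP_OFF=1
export MPI_MEMMAP_OFF

# Option 2 (instead of option 1): keep memory mapping but shrink the
# mapped regions. Sizes are assumed to be in bytes; values are examples.
# MPI_MAPPED_HEAP_SIZE=268435456     # 256 * 1024 * 1024
# MPI_MAPPED_STACK_SIZE=67108864     # 64 * 1024 * 1024
# export MPI_MAPPED_HEAP_SIZE MPI_MAPPED_STACK_SIZE

# Move to the submission directory (current directory outside PBS).
cd "${PBS_O_WORKDIR:-.}" || exit 1

if command -v mpirun >/dev/null 2>&1 && [ -x ./myprog.exe ]; then
    mpirun -np 2 ./myprog.exe
else
    echo "mpirun or myprog.exe not available here" >&2
fi
```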

Known Issues with Intel 7.1 Compilers

  • For the Intel 7.1 compilers only: If linking fails for a Fortran or C program that calls system routines such as sleep(), gethostname(), or etime(), the likely cause is that the libPEPCF90.a library is not being linked. In that case, add the following -l argument to your compiler command:

    efc -o myprog.exe myprog.f -lPEPCF90

  • For the Intel 7.1 compilers only: If you are getting warnings such as the following:

    Warning 4 at (110:file.f90) : Tab characters are an extension to standard Fortran 95

    Use one of the Intel 8.x compilers instead. See the Modules page for information about using modules to control your environment.

GNU Compilers

The GNU compilers can be invoked from the command line as follows:

gcc [compiler options] -o myprog.exe myprog.c     GNU C compiler
g77 [compiler options] -o myprog.exe myprog.f     GNU Fortran compiler
g++ [compiler options] -o myprog.exe myprog.cc    GNU C++ compiler

Being the standard C compiler on the Unix/Linux platform and the most commonly used for open source projects, the GNU C compiler (gcc/g++) has a reputation for being robust.

In general, the performance of the GNU C compiler is excellent, but on the Altix it still trails that of Intel's compilers. This is partly because the Itanium2 platform is relatively new and the GNU compiler has not yet been well optimized for it. We recommend the Intel compilers over the GNU compilers on the Altix.

Intel describes its compilers as GNU compatible. In practice, however, the Intel C++ compiler is not completely compatible with the GNU compilers.

GNU's Fortran compiler (g77) is not as strong as its C compiler, and we do not recommend it for compiling Fortran code on the Altix.

The syntax of the GNU compilers is very close to that of the Intel compilers. Please refer to the GNU compiler website for documentation. The currently installed version of the GNU compilers can be determined using the --version compiler option (gcc --version).

ProPack 3.0 Compatibility Issues

ProPack 3.0 contains support for the Native POSIX Thread Library (NPTL), a new implementation of POSIX threads, as well as a new version of the GNU C Library (glibc-2.3.2-95). There may be some compatibility issues with these libraries.

Applications that use LinuxThreads

While the new NPTL is binary compatible with the old LinuxThreads implementations, applications which depend on behaviors in which the LinuxThreads implementation deviates from the POSIX standard will produce errors and will need to be fixed.

It is possible to force the application to use the old LinuxThreads implementation by setting the environment variable LD_ASSUME_KERNEL.

$ export LD_ASSUME_KERNEL=2.4.19

or (depending on your shell)

% setenv LD_ASSUME_KERNEL 2.4.19

Applications Unable to Resolve the errno Symbol

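Applications that cannot resolve the errno symbol because the errno.h header was not explicitly included (for example, code that declares errno itself with extern int errno; instead of using #include <errno.h>) will see an error similar to the following: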

./myprog
./myprog: relocation error: ./myprog: symbol errno, version GLIBC_2.2 not
defined in file libc.so.6.1 with link time reference

Applications can work around the problem by setting the LD_ASSUME_KERNEL environment variable as noted above.

Applications Using glibc Private Symbols

Applications that use glibc private symbols (e.g. _dl_loaded) should be recompiled at the earliest convenience. This problem has been noticed with applications compiled with the Intel 7.1 compiler build 20030814 or earlier. Applications compiled with Intel compilers with build dates after 20030814 have not exhibited the problem. The following is an example of the type of error message reported by applications using glibc private symbols:

./myprog: relocation error: ./myprog: symbol _dl_loaded, version GLIBC_2.2 not
defined in file ld-linux-ia64.so.2 with link time reference

An unsupported workaround that may or may not work is to use the shared object, /usr/lib/sgi-compat-preload.so, which defines _dl_loaded as NULL. To use this workaround, set the LD_PRELOAD environment variable at runtime as follows:

$ export LD_PRELOAD=/usr/lib/sgi-compat-preload.so

or (depending on your shell)

% setenv LD_PRELOAD /usr/lib/sgi-compat-preload.so

Please note that this workaround is not endorsed by Intel and it may not work for all applications. If the application is open source, the best course of action is to recompile with a newer version of the Intel compilers. If it is a closed source application, contact your application provider and request that they provide new binaries compiled with more recent versions of the Intel compiler.

Statically Linked Applications

Binary compatibility for statically linked applications is not guaranteed. Whenever possible, applications should be linked dynamically. If an application must be linked statically, it should be re-linked with the glibc against which it will be run and the object files for re-linking the application must be made available as noted in the glibc license.