MPI Stack Update on EOS
Dear EOS users,
We have just finished installing the latest MPI stacks, built against the revamped Intel 12.1 s/w suite recently installed on EOS. These latest MPI versions bring performance improvements, increased capabilities, and closer compliance with the MPI 2.2 standard.
Specifically, we have installed the latest versions of the following MPI stacks:
- Intel MPI Library v4.0 (u3)
- mvapich2 1.8 (r5471 with limic2 support), and
- Open MPI 1.6.0.
These MPI stacks have been built against the Intel 12.1 s/w suite and cannot be used with the older Intel 11.1 collection of software. In general, once we build MPI code with a particular MPI stack, we can only execute that code when the same stack is activated. Also, code compiled with an older version of the same MPI stack may not execute with a newer MPI version (and vice versa).
To activate the MPI stack of your choice, use one of the following modules:
- "module load intelXE/mpi" Intel MPI Library v4.0 (u3),
- "module load mvapich2/1.8/intelXE" mvapich2 1.8 (r5471 with limic2 support), and
- "module load openmpi/1.6.0/intelXE" Open MPI 1.6.0.
Use "module help modname" to receive some initial help on a particular module. Some of the MPI stacks also provide man pages for the corresponding MPI compilation and execution commands.
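As a sketch, activating and inspecting one of the new stacks (using Open MPI as the example) looks like this on EOS:

```shell
# Activate one of the new MPI stacks (one stack at a time):
module load openmpi/1.6.0/intelXE

# Initial help for this module:
module help openmpi/1.6.0/intelXE

# Man pages, where the stack provides them:
man mpicc
```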
All MPI stacks share general similarities in compiling and running MPI code, but they also differ in many aspects, from the way parameters are set and communicated to the way they map MPI ranks to processor cores or nodes. Users, and particularly developers of scalable MPI code, should become familiar with these differences among the stacks.
One difference among the stacks is the specific command lines used to compile or run MPI code. Specifically, the compiler commands for each stack are:
- Intel MPI 4.0 library: mpiicc, mpiicpc and mpiifort are, respectively, the C, C++ and Fortran MPI compilers,
- mvapich2 1.8 MPI library: mpicc, mpicxx (mpic++), mpif77 and mpif90 are, respectively, the C, C++ and Fortran 77/90 MPI compilers, and
- Open MPI 1.6.0 library: mpicc, mpicxx (mpiCC, mpic++), mpif77 and mpif90 are, respectively, the C, C++ and Fortran 77/90 MPI compilers.
Each of the above commands properly compiles MPI code into an MPI binary by invoking the appropriate Intel 12.1 compiler. Please note that the Intel MPI wrappers mpicc, mpicxx and mpif77/mpif90 invoke the GCC compilers in the back end and not the Intel ones.
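For illustration, the same MPI source (file and binary names here are hypothetical) can be compiled under each stack after loading the corresponding module; remember to activate only one stack at a time:

```shell
# Intel MPI (Intel compilers in the back end):
module load intelXE/mpi
mpiicc -O2 -o hello_impi hello.c

# mvapich2:
module load mvapich2/1.8/intelXE
mpicc -O2 -o hello_mv2 hello.c

# Open MPI:
module load openmpi/1.6.0/intelXE
mpicc -O2 -o hello_ompi hello.c
```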
NOTE: MPI libraries may or may not choose to "bind" processes to processors, or to other resource sets, by default. "Binding a process to a processor set" means that this process (representing an MPI rank) may only execute on those particular processors. When a node is not "oversubscribed", that is, as long as the number of compute processes it has to execute does not exceed the number of free cores, binding each rank to a distinct core usually provides some performance benefit. However, binding done improperly on under-subscribed nodes may force more than one compute process to execute on the same core. This will definitely decrease performance, at times severely.
If a node ends up having to execute more active compute processes than free cores (i.e., when it is "over-subscribed"), binding processes to specific cores will likely decrease performance even more.
The problem of improper binding is exacerbated when the MPI code is hybrid, i.e., when each rank is itself multi-threaded, as with OpenMP, MKL, or plain POSIX threads code.
Our advice: when in doubt, do not enable rank-to-core binding.
The default binding behavior differs with MPI stack. Specifically,
- Intel MPI 4.0 library
Intel MPI by default binds ("pins") processes to cores, and most of the time the rank-to-core mapping is appropriate. The binding is also usually correct for hybrid code, e.g., MPI ranks running OpenMP code, as the OpenMP library also binds OpenMP threads to available cores. Set the environment variable I_MPI_PIN to "0" to disable binding by MPI.
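A minimal sketch of disabling Intel MPI pinning for a single run (the binary name and rank count are hypothetical):

```shell
# Turn off Intel MPI process pinning, then launch as usual:
export I_MPI_PIN=0
mpirun -np 16 ./hello_impi
```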
- mvapich2 1.8 MPI library
mvapich2 by default binds ranks to cores. This usually improves performance on under-subscribed nodes. However, hybrid code, such as MPI+OpenMP or MPI+MKL, will definitely suffer severe performance degradation with the default binding, and users are strongly advised to disable binding in this case by setting MV2_ENABLE_AFFINITY to 0. Intel MKL and OpenMP, when used with a non-Intel MPI library, will simply bind multiple or all threads onto the same core.
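A hybrid MPI+OpenMP launch under mvapich2 could then look like the following sketch (binary name, host file, and process/thread counts are hypothetical); note that mpirun_rsh takes VAR=value settings on its command line:

```shell
# Disable mvapich2 core affinity for a hybrid run, 8 ranks x 4 threads:
mpirun_rsh -np 8 -hostfile hosts \
    MV2_ENABLE_AFFINITY=0 OMP_NUM_THREADS=4 ./hello_mv2
```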
- Open MPI 1.6.0 library.
Open MPI 1.6.0 does NOT by default bind MPI processes, or threads in the hybrid case, to any resource. This conservative approach usually avoids the at times severe performance degradation of improper binding. Note that Open MPI supports a lengthy list of command line options allowing the user to direct the library to map ranks to nodes and bind them to cores along with their threads (if any). When thread-to-core binding is enabled improperly, OpenMP or MKL hybrid MPI code may suffer the same performance degradation as with mvapich2.
Each MPI stack has its own advantages and disadvantages in terms of performance, programmability and tight (or lack of) integration with the underlying batch scheduler. Please inquire with the SC staff for details and assistance tuning your MPI code.
Posted on: 12:10 AM, July 27, 2012