MPI Stack Update on EOS

Dear EOS users,

We have just finished installing the latest MPI stacks built on the revamped Intel 12.1 s/w suite recently installed on EOS. These latest MPI versions bring performance improvements, increased capabilities, and closer compliance with the MPI 2.2 standard.

Specifically, we have installed the latest versions of the following MPI stacks:

These MPI stacks have been built against the Intel 12.1 s/w suite and cannot be used with the older Intel 11.1 collection of software. In general, code built with a particular MPI stack can only be executed while that same stack is activated. Also, code compiled with an older version of a given MPI stack may not execute with a newer version of that stack (and vice versa).

To activate the MPI stack of your choice, use one of the following modules:

Use "module help modname" to get some initial help on a particular module. Some of the MPI stacks may also provide man pages for the corresponding MPI compilation and execution commands.

All MPI stacks are broadly similar in how MPI code is compiled and run, but they also differ in many respects, from the way parameters are set and communicated to the way they map MPI ranks to processor cores or nodes. Users, and particularly developers of scalable MPI code, should become familiar with these differences among the stacks.

One difference among the stacks is the specific commands used to compile and run MPI code. Specifically, the compiler commands for each stack are:

Each of the above commands properly compiles MPI code into an MPI binary by invoking the appropriate Intel 12.1 compiler. Please note, however, that the Intel MPI wrappers mpicc, mpicxx, and mpif77/mpif90 invoke the GCC compilers in the back end, not the Intel ones.
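
As a quick check that a stack's compiler wrapper and launcher are working together, the minimal MPI program sketched below can be compiled with the wrapper of the currently loaded stack and started with that same stack's launcher. The file name and the commands in the comment are illustrative examples of ours, not EOS-specific settings.

    /*
     * hello_mpi.c -- a minimal sketch; file name and commands below are
     * examples only.
     *
     * Compile with the wrapper of the currently loaded stack, e.g.
     *   mpicc -o hello_mpi hello_mpi.c
     * and launch it with the launcher of that *same* stack, e.g.
     *   mpirun -np 4 ./hello_mpi
     */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size, namelen;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's rank    */
        MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of ranks  */
        MPI_Get_processor_name(name, &namelen);  /* node this rank runs on */

        printf("Hello from rank %d of %d on %s\n", rank, size, name);

        MPI_Finalize();
        return 0;
    }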

NOTE: MPI libraries may or may not, by default, "bind" processes to processors or other resource sets. Binding a process to a processor set means that this process (representing an MPI rank) may only execute on those particular processors. When a node is not "oversubscribed", that is, as long as the number of compute processes it has to execute does not exceed the number of free cores, binding each rank to a distinct core usually provides some performance benefit. However, binding done improperly on undersubscribed nodes may force more than one compute process to execute on the same core, which will decrease performance, at times severely.

If a node ends up having to execute more active compute processes than free cores (i.e., when it is "oversubscribed"), binding processes to specific cores is likely to degrade performance even further.
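
If you are unsure what binding, if any, the active stack applies to your job, one simple way to find out is to have each rank report its own CPU affinity mask. The sketch below is an illustration of ours using the Linux sched_getaffinity() call; it is not a utility provided by any of the MPI stacks. A mask listing all cores of the node normally means no binding is in effect; a single core per rank means the rank is bound to that core, and two ranks reporting the same single core is the improper-binding situation described above.

    /*
     * where_am_i.c -- illustrative sketch: every MPI rank prints the set
     * of cores the kernel allows it to run on.
     *
     * Example build/run with the currently loaded stack:
     *   mpicc -o where_am_i where_am_i.c
     *   mpirun -np 4 ./where_am_i
     */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, namelen, c, pos = 0;
        char name[MPI_MAX_PROCESSOR_NAME];
        char cores[4096] = "";
        cpu_set_t mask;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Get_processor_name(name, &namelen);

        /* Ask the kernel for this process's CPU affinity mask. */
        CPU_ZERO(&mask);
        if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
            for (c = 0; c < CPU_SETSIZE && pos < (int)sizeof(cores) - 16; c++)
                if (CPU_ISSET(c, &mask))
                    pos += sprintf(cores + pos, "%d ", c);
            printf("rank %d on %s may run on cores: %s\n", rank, name, cores);
        }

        MPI_Finalize();
        return 0;
    }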

The problem of improper binding is exacerbated when the MPI code is hybrid, i.e., when each rank is itself multi-threaded, as with OpenMP, MKL, or plain POSIX threads code.
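
For hybrid codes, a quick way to spot this situation is to have every OpenMP thread of every rank report the core it is currently running on, as in the illustrative sketch below (again ours, with generic example flags). If all threads of a rank keep reporting the same single core, that rank has most likely been bound to one core and its threads are time-sharing it.

    /*
     * hybrid_check.c -- illustrative hybrid MPI + OpenMP sketch: every
     * OpenMP thread of every rank reports the core it is running on.
     *
     * Example build/run (flags depend on the back-end compiler):
     *   mpicc -fopenmp -o hybrid_check hybrid_check.c   (GCC back end)
     *   mpicc -openmp  -o hybrid_check hybrid_check.c   (Intel back end)
     *   OMP_NUM_THREADS=4 mpirun -np 2 ./hybrid_check
     */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <omp.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, provided;

        /* MPI_THREAD_FUNNELED is sufficient when only the master thread
           makes MPI calls. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        #pragma omp parallel
        {
            /* sched_getcpu() returns the core the calling thread is on. */
            printf("rank %d, thread %d of %d, running on core %d\n",
                   rank, omp_get_thread_num(), omp_get_num_threads(),
                   sched_getcpu());
        }

        MPI_Finalize();
        return 0;
    }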

Our advice: when in doubt, do not enable rank-to-core binding.

The default binding behavior differs from one MPI stack to another. Specifically:

The discussion of mapping or binding ranks and threads to nodes or cores is quite involved and at times tricky; it will be covered in detail in a separate write-up.

Each MPI stack has its own advantages and disadvantages in terms of performance, programmability, and the tightness (or lack) of its integration with the underlying batch scheduler. Please contact the SC staff for details and for assistance tuning your MPI code.

Posted on: 12:10 AM, July 27, 2012