Summary
- It is important to understand the semantics of MPI
- The send/receive calls provide for data synchronization, not necessarily process synchronization
- A correct MPI program cannot depend on buffering for messages
- For a highly optimized MPI program, it is important to use only a few optimized subroutines from the MPI library, typically the plain send/receive variants
- The SGI implementation of MPI uses N+1 processes in a parallel region, so for scalability it is better to run MPI on fewer processors than are physically available in the machine
- Proprietary message-passing libraries (e.g. SHMEM) perform better than MPI on the Origin, because MPI’s generic interface makes it much harder to optimize