MPI Tips for Performance
- Try direct-copy send/receive, and collective calls, for memory bandwidth improvement
- Use one-sided communication for latency (and memory bandwidth) improvement
- Try setting MPI_DSM_MUSTRUN or SMA_DSM_MUSTRUN to maintain CPU / memory affinity
- Do NOT use buffered or synchronous sends (MPI_Bsend, MPI_Ssend), or wildcards (MPI_ANY_SOURCE, MPI_ANY_TAG) in message headers