Scalability & Data Distribution
The scalability of parallel programs may be limited by:
- Amdahl's law: the serial fraction of the code caps the speedup available from the parallel part
  - plot elapsed run times as a function of the number of processors
- Load balancing and work distribution across the processors
  - profiling the code will reveal significant run times in libmp library routines
  - compare the perfex counts for event counter 21 (fp operations) across threads
- False sharing of data
  - perfex shows a high value for cache invalidation events (event counter 31: store/prefetch exclusive to shared block in scache)
- Excessive synchronization costs
  - perfex shows a high value for store conditionals (event counter 4)
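The first two diagnostics in the list above can be turned into numbers with a few lines of arithmetic. The following is a minimal sketch, not part of any SGI tool: the function names are hypothetical, the timings are invented for illustration, and the serial-fraction estimate uses the Karp-Flatt metric, which is one common way to read Amdahl's law backwards from measured run times. The imbalance ratio assumes you have already collected one count per thread (for example the fp operation counts reported by perfex).

```python
def amdahl_speedup(parallel_fraction, n_procs):
    """Upper bound on speedup when parallel_fraction of the run time
    parallelizes perfectly and the rest stays serial."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_procs)


def serial_fraction_from_timings(t1, tp, n_procs):
    """Karp-Flatt estimate of the serial fraction from the elapsed time
    on one processor (t1) and on n_procs processors (tp)."""
    speedup = t1 / tp
    return (1.0 / speedup - 1.0 / n_procs) / (1.0 - 1.0 / n_procs)


def load_imbalance(per_thread_counts):
    """Busiest thread's count divided by the mean count.
    1.0 means perfectly balanced; larger values mean more imbalance."""
    mean = sum(per_thread_counts) / len(per_thread_counts)
    return max(per_thread_counts) / mean


if __name__ == "__main__":
    # Hypothetical elapsed times in seconds, keyed by processor count.
    # These are illustrative values, not measurements.
    timings = {1: 100.0, 2: 55.0, 4: 32.5, 8: 21.25}
    t1 = timings[1]
    for p in sorted(timings):
        if p == 1:
            continue
        speedup = t1 / timings[p]
        serial = serial_fraction_from_timings(t1, timings[p], p)
        print(f"p={p}: speedup={speedup:.2f}, estimated serial fraction={serial:.3f}")

    # Hypothetical per-thread fp operation counts.
    print("imbalance:", load_imbalance([200, 100, 100, 0]))
```

An estimated serial fraction that stays roughly constant as processors are added points to Amdahl's law as the limit; one that grows with the processor count suggests overhead such as synchronization or memory contention instead.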
When cache misses account for only a small percentage of the run time, the program makes good use of memory, and data placement will not affect its performance.
- diagnose a significant memory load with perfex statistics: a high value for memory traffic
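One rough way to judge whether cache misses are "a small percentage of the run time" is to multiply a miss count by an assumed cost per miss and compare the result with the elapsed time. The sketch below does only that arithmetic. The function name, the miss count, the cost per miss, and the clock rate are all hypothetical placeholders; real miss costs depend on the machine and on where the data resides.

```python
def miss_time_fraction(miss_count, cycles_per_miss, clock_hz, elapsed_seconds):
    """Estimated fraction of elapsed time spent servicing cache misses.

    cycles_per_miss is an assumed average cost, not a measured value.
    """
    miss_seconds = miss_count * cycles_per_miss / clock_hz
    return miss_seconds / elapsed_seconds


if __name__ == "__main__":
    # All four inputs are illustrative assumptions, not measurements.
    fraction = miss_time_fraction(
        miss_count=2_000_000,    # secondary cache misses reported by a counter
        cycles_per_miss=100,     # assumed average miss cost in cycles
        clock_hz=200e6,          # assumed processor clock rate
        elapsed_seconds=10.0,    # measured wall-clock time of the run
    )
    print(f"estimated share of run time in cache misses: {fraction:.1%}")
```

If the estimated share is small, changing data placement is unlikely to pay off; if it is large, the memory traffic statistics are worth a closer look.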
Data placement will improve program scalability if there is:
- memory contention on one node
- excessive cache traffic