Data Distribution
On an SMP machine, all data is visible to all processors. On the Origin, a special optimization applies in order to exploit the multiple paths to memory:
- By default, all pages are allocated with a “first touch” policy (a page is placed in the memory of the node whose processor first accesses it)
- The initialization loop, if executed serially, will grab all pages from a single node
- In the parallel loop, multiple processors will then access that one node's memory
- The program will not scale on the machine if the memory path is not optimized. Performance on the STREAM benchmark:
- ~400 MB/s from a single node, independent of the number of nodes
- ~200*N MB/s from N nodes, i.e. performance scales with the number of nodes if the data is distributed correctly
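The points above can be illustrated with a minimal sketch in C. It assumes OpenMP directives, which the original does not mention, and the function names `alloc_first_touch` and `triad` are hypothetical. The idea is to initialize the arrays in a parallel loop with the same static schedule as the compute loop, so that under a first-touch policy each thread's pages are likely to be placed on its own node. Actual placement also depends on the allocator, page size, and thread-to-processor binding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n doubles and initialize them in parallel.
   Under a first-touch policy, a page is placed when it is first written,
   so a parallel initialization spreads the pages across the nodes that
   run the threads (assuming the allocator hands back untouched pages). */
double *alloc_first_touch(size_t n, double value)
{
    double *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;

    /* Same static schedule as the compute loop below, so each thread
       initializes the part of the array it will later work on. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++)
        a[i] = value;

    return a;
}

/* STREAM-style triad kernel: a[i] = b[i] + s * c[i]. */
void triad(double *a, const double *b, const double *c, double s, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);

    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++)
        a[i] = b[i] + s * c[i];
}
```

If the initialization loop in `alloc_first_touch` ran serially instead, all pages would be touched first by one thread, which corresponds to the single-node case described above. Without OpenMP enabled at compile time, the pragmas are ignored and the code runs serially with the same results.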