Why Distribute?
CC-NUMA architecture optimization. If:
all data is initialized on one node: memory bottleneck
all shared data is on one node: serialization of the CC (cache coherency)
data is used from a different node: heavy interconnect traffic
[Figure labels: 16 proc, memory, Processor(s)]
average net latency on SGI3800: ~(1/2 log2(p/16) + 1) * 50 ns
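A minimal sketch that evaluates the latency estimate above for a few machine sizes. It assumes that p is the total number of processors and that the formula only applies for p >= 16; the slide states neither explicitly. The function and constant names are illustrative, not from the slide.

```python
import math

BASE_LATENCY_NS = 50.0  # constant factor taken from the slide's formula
GROUP_SIZE = 16         # the "16" in p/16 (assumed: processors per group)


def avg_net_latency_ns(p: int) -> float:
    """Approximate average network latency in ns for p processors.

    Evaluates (1/2 * log2(p/16) + 1) * 50 ns as written on the slide.
    Rejecting p < 16 is an assumption of this sketch, since the formula
    would otherwise drop below the 50 ns base value.
    """
    if p < GROUP_SIZE:
        raise ValueError("formula assumed valid only for p >= 16")
    return (0.5 * math.log2(p / GROUP_SIZE) + 1.0) * BASE_LATENCY_NS


if __name__ == "__main__":
    for p in (16, 64, 256, 512):
        print(f"p = {p:3d}: ~{avg_net_latency_ns(p):.0f} ns")
```

Under these assumptions the estimate grows by 25 ns per doubling of p: 50 ns at p = 16, 100 ns at p = 64, 175 ns at p = 512. Data placed on a remote node pays this latency on every access that misses in cache, which is the motivation for distributing data.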