- It takes ~50 µs to enter a parallel region with 64 processors
- At 800 Mflop/s per processor, each processor could do 40K flop in that time
- A parallel loop must contain >2.5 Mflop (64 × 40K flop) to justify a parallel run
- It takes ~500 µs to do a reduction with 64 processors
- OpenMP performance depends on architecture, not on processor speed
- Compare Origin2800 at 300 MHz and 400 MHz with Origin3800 at 400 MHz
- Application speed on a parallel machine is determined by its architecture
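The break-even loop size for entering a parallel region can be worked out as follows. This reads the entry overhead as ~50 µs, the only value consistent with 40K flop at 800 Mflop/s. The symbols $t_{\text{fork}}$ (entry overhead), $R$ (per-processor rate), $P$ (processor count) and $W$ (work) are introduced here for illustration:

$$
\begin{aligned}
W_{\text{proc}} &= R \cdot t_{\text{fork}} = (800 \times 10^{6}\ \text{flop/s}) \times (50 \times 10^{-6}\ \text{s}) = 4 \times 10^{4}\ \text{flop} \\
W_{\text{loop}} &= P \cdot W_{\text{proc}} = 64 \times 4 \times 10^{4}\ \text{flop} = 2.56 \times 10^{6}\ \text{flop} \approx 2.5\ \text{Mflop}
\end{aligned}
$$

One reading of the criterion is that each processor's share of the loop should at least match the work it loses to the entry overhead. At exactly $W_{\text{loop}}$, each processor spends about as long entering the region as computing, so parallel efficiency is roughly 50%. Larger loops amortize the overhead further.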