NUMAlink Implementation
Used by MPI_Barrier, MPI_Win_fence, and shmem_barrier_all
Fetch-op variables on the Hub provide fast synchronization for both flat and tree barrier methods
The fetch-op AMO (atomic memory operation) helped reduce MPI send/recv latency from 12 to 8 µs
[Figure: block diagram of CPU, Hub, router, and CPU, with the fetch-op variable on the Hub]
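As a rough illustration of the flat barrier method, the sketch below builds a sense-reversing barrier on a single fetch-and-increment counter. It is a software stand-in only: it assumes C11 atomics and POSIX threads in place of the Hub's fetch-op variables, and it is not the NUMAlink or MPI library implementation. Every name in it (`flat_barrier_t`, `flat_barrier_wait`, `run_barrier_demo`) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical software stand-in for one Hub fetch-op variable. */
typedef struct {
    atomic_int  count;    /* arrivals in the current barrier episode */
    atomic_bool sense;    /* flips each time the barrier opens */
    int         nthreads; /* number of participants */
} flat_barrier_t;

static void flat_barrier_init(flat_barrier_t *b, int nthreads) {
    atomic_init(&b->count, 0);
    atomic_init(&b->sense, false);
    b->nthreads = nthreads;
}

/* Each thread owns a local_sense flag, initially false. */
static void flat_barrier_wait(flat_barrier_t *b, bool *local_sense) {
    *local_sense = !*local_sense;
    /* Fetch-and-increment returns the value before the increment. */
    if (atomic_fetch_add(&b->count, 1) == b->nthreads - 1) {
        atomic_store(&b->count, 0);            /* last arriver resets the counter */
        atomic_store(&b->sense, *local_sense); /* then releases the waiters */
    } else {
        while (atomic_load(&b->sense) != *local_sense)
            sched_yield();                     /* spin until released */
    }
}

typedef struct {
    flat_barrier_t *barrier;
    atomic_int     *arrivals;
    atomic_int     *violations;
    int             rounds;
} worker_arg_t;

static void *worker(void *p) {
    worker_arg_t *a = p;
    bool local_sense = false;
    for (int r = 0; r < a->rounds; r++) {
        atomic_fetch_add(a->arrivals, 1);
        flat_barrier_wait(a->barrier, &local_sense);
        /* After round r, every thread must have arrived r + 1 times. */
        if (atomic_load(a->arrivals) < a->barrier->nthreads * (r + 1))
            atomic_fetch_add(a->violations, 1);
    }
    return NULL;
}

/* Runs the demo and returns how many barrier violations were observed. */
int run_barrier_demo(int nthreads, int rounds) {
    assert(nthreads > 0 && rounds >= 0);
    flat_barrier_t barrier;
    atomic_int arrivals, violations;
    flat_barrier_init(&barrier, nthreads);
    atomic_init(&arrivals, 0);
    atomic_init(&violations, 0);
    worker_arg_t arg = { &barrier, &arrivals, &violations, rounds };
    pthread_t *tids = malloc(sizeof *tids * (size_t)nthreads);
    if (tids == NULL)
        abort();
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, worker, &arg) != 0)
            abort();
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    free(tids);
    return atomic_load(&violations);
}
```

A call such as `run_barrier_demo(4, 200)` should return 0 if no thread ever leaves the barrier early. In this sketch every participant hits the same counter; a tree barrier would presumably spread arrivals over several such counters to reduce contention on any one of them.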