CC-NUMA Architecture: Programming
- All data is shared
- Additional optimization: place data close to the processor that does most of the computation on that data
- Automatic (compiler) optimizations for single processor and parallel performance
- Data access (data exchange) is implicit in the algorithm; there are no explicit send/receive calls
- Except for the additional data placement directives, the source is the same as for single-processor programming (SMP principle)
C     every processor holds a contiguous block of columns of each matrix:
C$distribute A(*,block),B(*,block),C(*,block)
      C(i,j) = C(i,j) + A(i,k)*B(k,j)
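
To make the effect of the `(*,block)` distribution concrete, here is a minimal sketch in C (not the slide's Fortran, so that it does not depend on a directive-aware compiler). It assumes a block distribution hands each processor a contiguous chunk of ceil(n/nprocs) columns, and it emulates the processors with an ordinary loop. The names `block_first`, `block_end`, `matmul_block`, and `matmul` are hypothetical and introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major indexing, mirroring Fortran's A(i,j) with 0-based i and j. */
#define IDX(i, j, n) ((size_t)(i) + (size_t)(j) * (size_t)(n))

/* Hypothetical helper: first column (0-based) owned by processor p when
   n columns are split into nprocs contiguous blocks of ceil(n/nprocs)
   columns each (assumed block-distribution rule). */
int block_first(int p, int n, int nprocs)
{
    int chunk = (n + nprocs - 1) / nprocs;
    int first = p * chunk;
    return first < n ? first : n;
}

/* One past the last column owned by processor p. */
int block_end(int p, int n, int nprocs)
{
    return block_first(p + 1, n, nprocs);
}

/* C += A*B restricted to the columns of C owned by processor p.
   Writes go only to p's own columns of C; reads of A touch columns
   owned by other processors, which on CC-NUMA hardware would be
   ordinary loads with no explicit data exchange in the source. */
void matmul_block(int p, int nprocs, int n,
                  const double *A, const double *B, double *C)
{
    assert(nprocs > 0 && p >= 0 && p < nprocs);
    for (int j = block_first(p, n, nprocs); j < block_end(p, n, nprocs); ++j)
        for (int k = 0; k < n; ++k)
            for (int i = 0; i < n; ++i)
                C[IDX(i, j, n)] += A[IDX(i, k, n)] * B[IDX(k, j, n)];
}

/* Sequential emulation: on a real machine each value of p would run
   concurrently on its own processor. */
void matmul(int nprocs, int n,
            const double *A, const double *B, double *C)
{
    for (int p = 0; p < nprocs; ++p)
        matmul_block(p, nprocs, n, A, B, C);
}
```

Calling `matmul(nprocs, n, A, B, C)` on column-major arrays gives the same result for any `nprocs`, which reflects the point of the bullets above: the distribution changes where the data lives and who updates it, not the algorithm.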