Problems of Scalar Optimization
- each C(i,j) value is accumulated in a register over the products A(i,k)*B(k,j)
- B is traversed sequentially, cache line by cache line (spatial locality)
- A is accessed using only 1 word from each cache line (no spatial locality)
- cache lines of A and B are not reused (if n is large)
This is a problem only if A, B and C do not fit into the cache
C(i,j)=C(i,j) + A(i,k)*B(k,j)
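
The access pattern described above can be made concrete with a minimal sketch in C. It assumes column-major (Fortran-style) storage and the loop order i, j, k with k innermost, which is what the bullet points imply but the slide does not state. The names matmul_scalar and IDX are hypothetical, and indices are 0-based rather than 1-based.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: column-major layout as in Fortran, 0-based indices.
 * Element (i,j) of an n x n matrix lives at offset j*n + i. */
#define IDX(i, j, n) ((size_t)(j) * (n) + (i))

/* Hypothetical helper: C = C + A*B for n x n column-major matrices,
 * with the scalar optimization of keeping C(i,j) in a local variable. */
void matmul_scalar(size_t n, const double *A, const double *B, double *C)
{
    assert(A != NULL && B != NULL && C != NULL);

    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            /* C(i,j) is loaded once and accumulated in a register. */
            double c = C[IDX(i, j, n)];

            for (size_t k = 0; k < n; ++k) {
                /* B(k,j): consecutive k are adjacent in memory (stride 1).
                 * A(i,k): consecutive k are n elements apart (stride n),
                 * so each access touches a different cache line if n is large. */
                c += A[IDX(i, k, n)] * B[IDX(k, j, n)];
            }

            /* C(i,j) is stored once after the k loop. */
            C[IDX(i, j, n)] = c;
        }
    }
}
```

With this loop order the same column of B and the same row of A are needed again for later (i,j) pairs, but for large n their cache lines will usually have been evicted by then, which is the missing reuse noted in the list.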