Blocking Example: Transpose
- either A or B is traversed in non-1 stride: bad spatial reuse of data
-
-
-
-
-
-
- blocking the loops for cache will do the transpose block per block, reusing the block elements:
for(j=tj; j<min(tj+s-1,n); j++)
for(i=ti; i<min(ti+s-1,n); i++)