Vector Update: 2D Form
With a DAXPY operation in the inner loop, we should consider further optimization through outer-loop unrolling and blocking.
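As a rough illustration of what outer-loop unrolling around a DAXPY-style inner loop can look like, here is a minimal sketch. It is not the code from these slides: the arrays x, a, y, yref and the sizes n, m are hypothetical, and m is assumed to be even so that no cleanup iteration is needed.

```fortran
program unroll_demo
  implicit none
  ! Hypothetical sizes; m is assumed even so the unroll-by-2 loop needs no remainder.
  integer, parameter :: n = 1000, m = 8
  real*8 :: x(n,m), a(m), y(n), yref(n)
  integer :: i, j

  ! Fill the inputs with simple test data.
  do j = 1, m
     a(j) = 0.5d0 * j
     do i = 1, n
        x(i,j) = dble(i + j)
     end do
  end do

  ! Reference version: one DAXPY (y = y + a(j)*x(:,j)) per outer iteration.
  yref = 0.0d0
  do j = 1, m
     do i = 1, n
        yref(i) = yref(i) + a(j) * x(i,j)
     end do
  end do

  ! Outer loop unrolled by 2: each y(i) is loaded and stored once
  ! for every two columns of x instead of once per column.
  y = 0.0d0
  do j = 1, m, 2
     do i = 1, n
        y(i) = y(i) + a(j) * x(i,j) + a(j+1) * x(i,j+1)
     end do
  end do

  print *, 'max abs difference:', maxval(abs(y - yref))
end program unroll_demo
```

The inner loop keeps unit stride over i, so it stays vectorizable, while the unrolled outer loop cuts the memory traffic on y roughly in half. Blocking would additionally split the i range into chunks small enough to stay in cache across several j iterations.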
- Hand tuning was necessary.
- The compiler would not perform loop interchange because the loops in the original code are not properly nested.
- With the DAXPY formulation, we can consider a 2-dimensional implementation of that code:
real*8 dp(ni,nj), p(ni,nib)