Vector Architecture
- Vectors are loaded from memory with the loadv instruction
- Performance is determined by memory bandwidth
- Optimization takes the vector length (64 words) into account
loadf f2,(r3)       load scalar A(i,k)
loadv v3,(r3)       load vector B(k,1:n)
mpyvs v3,v3,f2      calculate A(i,k)*B(k,1:n)
addvv v4,v4,v3      update C(i,1:n)
C(i,1:n) = C(i,1:n) + A(i,k)*B(k,1:n)
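The row update above can also be written in C as a rough sketch that makes the 64-word vector length explicit. When n exceeds 64, the row is processed in strips of at most 64 elements, each strip standing in for one loadv / multiply / add sequence. The names MVL, axpy_strip and row_update are hypothetical and chosen only for illustration; double-precision words and contiguous row storage are assumptions, not something the listing states.

```c
#include <assert.h>
#include <stddef.h>

/* Vector length in words, taken from the text (64). The name MVL is an
   assumption made for this sketch. */
#define MVL 64

/* One strip: c[0:len] += a * b[0:len], with len <= MVL.
   Models a single loadv / vector-scalar multiply / vector add sequence. */
static void axpy_strip(double a, const double *b, double *c, size_t len)
{
    assert(len <= MVL);
    for (size_t j = 0; j < len; ++j)
        c[j] += a * b[j];
}

/* C(i,1:n) = C(i,1:n) + A(i,k)*B(k,1:n), split into strips of at most MVL
   elements so that each strip fits the vector length. */
void row_update(double a_ik, const double *b_row, double *c_row, size_t n)
{
    for (size_t start = 0; start < n; start += MVL) {
        size_t remaining = n - start;
        size_t len = remaining < MVL ? remaining : MVL;
        axpy_strip(a_ik, b_row + start, c_row + start, len);
    }
}
```

Each strip reads len elements of B and performs 2*len floating-point operations, which is why the rate at which B (and C, if it is not held in a register) can be streamed from memory bounds the achievable speed.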