Maxwell Code Example
Compiled with: -mips4 -O3 -LNO:opt=0 -OPT:reorg_common=off
(flags chosen to show the effect of the compiler not performing the necessary optimizations)
Measured performance on this code: 4.6 Mflop/s
REAL EX(NX,NY,NZ),EY(NX,NY,NZ),EZ(NX,NY,NZ) !Electric field
REAL HX(NX,NY,NZ),HY(NX,NY,NZ),HZ(NX,NY,NZ) !Magnetic field
HX(I,J,K)=HX(I,J,K)-(EZ(I,J,K)-EZ(I,J-1,K))*CHDY &
                   +(EY(I,J,K)-EY(I,J,K-1))*CHDZ
HY(I,J,K)=HY(I,J,K)-(EX(I,J,K)-EX(I,J,K-1))*CHDZ &
                   +(EZ(I,J,K)-EZ(I-1,J,K))*CHDX
HZ(I,J,K)=HZ(I,J,K)-(EY(I,J,K)-EY(I-1,J,K))*CHDX &
                   +(EX(I,J,K)-EX(I,J-1,K))*CHDY
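For context, the update above presumably sits inside a triple loop with I innermost (the slide speaks of the "previous iteration (I-1)"). The following is a minimal free-form sketch under that assumption. The loop bounds (2..N), the program name, the grid size, the coefficient values and the test data are all hypothetical and are not taken from the original code.

```fortran
! Sketch only: loop bounds, coefficient values and initial data are assumptions.
PROGRAM MAXWELL_SKETCH
  IMPLICIT NONE
  INTEGER, PARAMETER :: NX = 32, NY = 32, NZ = 32   ! smallest size on the slide
  REAL :: EX(NX,NY,NZ), EY(NX,NY,NZ), EZ(NX,NY,NZ)  ! Electric field
  REAL :: HX(NX,NY,NZ), HY(NX,NY,NZ), HZ(NX,NY,NZ)  ! Magnetic field
  REAL, PARAMETER :: CHDX = 0.5, CHDY = 0.5, CHDZ = 0.5  ! placeholder values
  INTEGER :: I, J, K

  ! Simple test data: EX grows linearly in J, everything else is zero
  EY = 0.0; EZ = 0.0
  HX = 0.0; HY = 0.0; HZ = 0.0
  DO K = 1, NZ
    DO J = 1, NY
      DO I = 1, NX
        EX(I,J,K) = REAL(J)
      END DO
    END DO
  END DO

  ! H-field update; I is innermost so accesses are stride-1 in memory
  ! (Fortran arrays are column-major). Bounds start at 2 because of I-1, J-1, K-1.
  DO K = 2, NZ
    DO J = 2, NY
      DO I = 2, NX
        HX(I,J,K) = HX(I,J,K) - (EZ(I,J,K)-EZ(I,J-1,K))*CHDY &
                              + (EY(I,J,K)-EY(I,J,K-1))*CHDZ
        HY(I,J,K) = HY(I,J,K) - (EX(I,J,K)-EX(I,J,K-1))*CHDZ &
                              + (EZ(I,J,K)-EZ(I-1,J,K))*CHDX
        HZ(I,J,K) = HZ(I,J,K) - (EY(I,J,K)-EY(I-1,J,K))*CHDX &
                              + (EX(I,J,K)-EX(I,J-1,K))*CHDY
      END DO
    END DO
  END DO

  ! With this test data only HZ changes: HX = HY = 0 and HZ = CHDY
  PRINT *, HX(2,2,2), HY(2,2,2), HZ(2,2,2)
END PROGRAM MAXWELL_SKETCH
```

With I innermost, EZ(I-1,J,K) and EY(I-1,J,K) are the values that were loaded as EZ(I,J,K) and EY(I,J,K) one iteration earlier, which is what makes the load reuse possible.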
Here NX=NY=NZ = 32, 64, 128, 256, i.e. with REAL*4 elements the six arrays occupy 0.8 MB, 6.3 MB, 50 MB, 403 MB in total
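The quoted sizes can be reproduced as 6 arrays x N^3 elements x 4 bytes, with 1 MB taken as 10^6 bytes. A small sketch of that arithmetic follows; the program and variable names are illustrative only.

```fortran
! Sketch: total memory footprint of the six field arrays for each grid size.
PROGRAM FOOTPRINT
  IMPLICIT NONE
  INTEGER, PARAMETER :: SIZES(4) = (/ 32, 64, 128, 256 /)
  INTEGER, PARAMETER :: NARRAYS = 6          ! EX, EY, EZ, HX, HY, HZ
  INTEGER, PARAMETER :: BYTES_PER_REAL = 4   ! REAL*4
  INTEGER :: M
  REAL :: MB

  DO M = 1, 4
    ! Computed in REAL to avoid any integer overflow concerns
    MB = REAL(NARRAYS) * REAL(SIZES(M))**3 * REAL(BYTES_PER_REAL) / 1.0E6
    PRINT '(A,I4,A,F7.1,A)', 'N =', SIZES(M), ':', MB, ' MB'
  END DO
END PROGRAM FOOTPRINT
```

This gives 0.8, 6.3, 50.3 and 402.7 MB, matching the rounded figures on the slide.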
Reusing the loads from the previous iteration (I-1) gives, per iteration:
13 memory operations (6 on H + 7 on E) -> minimum 13 cycles/iteration at one memory operation per cycle
18 floating-point operations per iteration
18/(13*2) = 69% of peak; with a peak of 2 flops/cycle = 800 Mflop/s on the R10000@400MHz processor, that is about 554 Mflop/s
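The operation counts and the resulting bound can be checked with a few lines of arithmetic. The sketch below assumes, as the 18/(13*2) estimate does, one memory operation and two floating-point operations per cycle; the program and constant names are illustrative only.

```fortran
! Sketch: memory-bound performance estimate for the H-field update.
PROGRAM BOUND
  IMPLICIT NONE
  INTEGER, PARAMETER :: H_OPS   = 6   ! 3 loads + 3 stores of HX, HY, HZ
  INTEGER, PARAMETER :: E_LOADS = 7   ! 9 distinct E elements, 2 (at I-1) reused
  INTEGER, PARAMETER :: FLOPS   = 18  ! 6 per statement x 3 statements
  REAL, PARAMETER :: FLOPS_PER_CYCLE = 2.0   ! assumed peak rate
  REAL, PARAMETER :: CLOCK_MHZ = 400.0
  REAL, PARAMETER :: PEAK_MFLOPS = FLOPS_PER_CYCLE * CLOCK_MHZ
  INTEGER :: CYCLES
  REAL :: FRACTION

  CYCLES = H_OPS + E_LOADS   ! one memory operation per cycle
  FRACTION = REAL(FLOPS) / (REAL(CYCLES) * FLOPS_PER_CYCLE)

  PRINT '(A,I3)',      'Cycles per iteration (memory bound): ', CYCLES
  PRINT '(A,F6.3)',    'Fraction of peak: ', FRACTION
  PRINT '(A,F6.1,A)',  'Peak:  ', PEAK_MFLOPS, ' Mflop/s'
  PRINT '(A,F6.1,A)',  'Bound: ', FRACTION * PEAK_MFLOPS, ' Mflop/s'
END PROGRAM BOUND
```

The result is 13 cycles per iteration, a fraction of 0.692 of the 800 Mflop/s peak, and a bound of roughly 554 Mflop/s, more than a hundred times the measured 4.6 Mflop/s.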