Case Study: Maxwell Code
- For scalar optimization, NX, NY, NZ have been set to odd values and the H and E arrays have been padded to avoid cache thrashing
- Next, parallelization can be attempted to increase performance further on a parallel machine:
  - automatic parallelization with the -apo compiler option
  - a manual parallelization approach
- Study the performance implications of the different parallelization approaches
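Why power-of-two extents cause cache thrashing can be illustrated with a small address model. The sketch below assumes a hypothetical direct-mapped cache (line size and set count are parameters, not the real machine's values) and assumes the six field arrays are stored back to back, with the first one starting on a cache-line boundary. It counts how many arrays map their element (1,1,1) to the same cache set as the first array; `cache_set` and `same_set_count` are illustrative names, not part of the original code.

```fortran
module cache_conflict
  implicit none
contains
  ! Cache set that a byte offset maps to, for a hypothetical cache with
  ! line_bytes-byte lines and nsets sets (direct-mapped model).
  integer function cache_set(byte_offset, line_bytes, nsets)
    integer(8), intent(in) :: byte_offset
    integer, intent(in) :: line_bytes, nsets
    cache_set = int(mod(byte_offset / line_bytes, int(nsets, 8)))
  end function cache_set

  ! Number of the six field arrays whose element (1,1,1) falls into the
  ! same cache set as the first array. Assumes the arrays lie back to back,
  ! each holding nx*ny*nz REAL*4 elements plus pad extra elements.
  integer function same_set_count(nx, ny, nz, pad, line_bytes, nsets)
    integer, intent(in) :: nx, ny, nz, pad, line_bytes, nsets
    integer(8) :: array_bytes
    integer :: a, set0
    array_bytes = (int(nx, 8)*ny*nz + pad) * 4_8
    set0 = cache_set(0_8, line_bytes, nsets)
    same_set_count = 0
    do a = 0, 5
      if (cache_set(a*array_bytes, line_bytes, nsets) == set0) &
        same_set_count = same_set_count + 1
    end do
  end function same_set_count
end module cache_conflict
```

With a 32 KiB cache of 64-byte lines (512 sets), extents of 64 put all six arrays into one set, so the E and H values needed for the same (I,J,K) keep evicting each other. Switching to odd extents (65), or keeping 64 and padding each array by 16 elements, spreads them over different sets.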
REAL EX(NX,NY,NZ),EY(NX,NY,NZ),EZ(NX,NY,NZ) ! electric field
REAL HX(NX,NY,NZ),HY(NX,NY,NZ),HZ(NX,NY,NZ) ! magnetic field
HX(I,J,K)=HX(I,J,K)-(EZ(I,J,K)-EZ(I,J-1,K))*CHDY &
                   +(EY(I,J,K)-EY(I,J,K-1))*CHDZ
HY(I,J,K)=HY(I,J,K)-(EX(I,J,K)-EX(I,J,K-1))*CHDZ &
                   +(EZ(I,J,K)-EZ(I-1,J,K))*CHDX
HZ(I,J,K)=HZ(I,J,K)-(EY(I,J,K)-EY(I-1,J,K))*CHDX &
                   +(EX(I,J,K)-EX(I,J-1,K))*CHDY
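One common manual approach is to wrap the magnetic-field update in a loop nest and distribute the outermost K loop across threads with an OpenMP PARALLEL DO directive. This is a sketch, not necessarily the manual scheme used in this study: the subroutine name `update_h` and the loop bounds 2..N (chosen so that the I-1, J-1, K-1 neighbours stay inside the arrays) are assumptions. Each iteration writes only the H components at (I,J,K) and reads only E, so the iterations are independent and need no synchronization.

```fortran
module maxwell_h_update
  implicit none
contains
  ! Magnetic-field update with the K loop split across OpenMP threads.
  ! Loop bounds 2..n are an assumption that keeps I-1, J-1, K-1 in range.
  subroutine update_h(nx, ny, nz, ex, ey, ez, hx, hy, hz, chdx, chdy, chdz)
    integer, intent(in) :: nx, ny, nz
    real, intent(in) :: ex(nx,ny,nz), ey(nx,ny,nz), ez(nx,ny,nz)
    real, intent(inout) :: hx(nx,ny,nz), hy(nx,ny,nz), hz(nx,ny,nz)
    real, intent(in) :: chdx, chdy, chdz
    integer :: i, j, k
    ! Each thread gets a block of K planes; H is only written at (i,j,k)
    ! and E is only read, so there are no loop-carried dependencies.
    !$omp parallel do private(i, j)
    do k = 2, nz
      do j = 2, ny
        do i = 2, nx   ! innermost loop runs over the contiguous index
          hx(i,j,k) = hx(i,j,k) - (ez(i,j,k) - ez(i,j-1,k))*chdy &
                                + (ey(i,j,k) - ey(i,j,k-1))*chdz
          hy(i,j,k) = hy(i,j,k) - (ex(i,j,k) - ex(i,j,k-1))*chdz &
                                + (ez(i,j,k) - ez(i-1,j,k))*chdx
          hz(i,j,k) = hz(i,j,k) - (ey(i,j,k) - ey(i-1,j,k))*chdx &
                                + (ex(i,j,k) - ex(i,j-1,k))*chdy
        end do
      end do
    end do
    !$omp end parallel do
  end subroutine update_h
end module maxwell_h_update
```

Compiled without OpenMP support the directives are treated as comments and the code runs serially; with gfortran, for example, `-fopenmp` enables them. Splitting over K gives each thread contiguous slabs of the column-major arrays.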
Problem sizes: NX = NY = NZ = 32, 64, 128, 256, i.e. with REAL*4 elements the six field arrays occupy 0.8 MB, 6.3 MB, 50 MB and 403 MB in total
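The quoted footprints follow from six arrays (EX, EY, EZ, HX, HY, HZ) of N^3 four-byte elements each, i.e. 24 N^3 bytes, divided by 10^6. The helper below is a hypothetical illustration that ignores the padding mentioned above.

```fortran
module field_memory
  implicit none
contains
  ! Total bytes of the six field arrays, each n*n*n REAL*4 elements.
  ! Padding is ignored in this sketch.
  integer(8) function field_bytes(n)
    integer, intent(in) :: n
    field_bytes = 6_8 * 4_8 * int(n, 8)**3
  end function field_bytes
end module field_memory
```

For N = 32, 64, 128 and 256 this gives 786,432, 6,291,456, 50,331,648 and 402,653,184 bytes, which round to the listed 0.8, 6.3, 50 and 403 MB.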