MPI on Rohan 10Nov03
© O'Reilly Publishers (Used with permission)
    PROGRAM HEATROD
      PARAMETER (MAXTIME=200)
      INTEGER TICKS, I, MAXTIME
      REAL*4 ROD(10)
      ROD(1) = 100.0
      DO I=2,9
        ROD(I) = 0.0
      ENDDO
      ROD(10)=0.0
      DO TICKS=1,MAXTIME
        IF (MOD(TICKS,20) .EQ. 1) PRINT 100, TICKS, (ROD(I), I=1,10)
        DO I=2,9
          ROD(I) = (ROD(I-1) + ROD(I+1))/2
        ENDDO
      ENDDO
    100 FORMAT (I4,10F7.2)
    END
We compile this code on rohan with the command f90 p274rod.f90 and execute it to obtain the output:
       1 100.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
      21 100.00  87.04  74.52  62.54  51.15  40.30  29.91  19.83   9.92   0.00
      41 100.00  88.74  77.51  66.32  55.19  44.10  33.05  22.02  11.01   0.00
      61 100.00  88.88  77.76  66.64  55.53  44.42  33.31  22.21  11.10   0.00
      81 100.00  88.89  77.78  66.66  55.55  44.44  33.33  22.22  11.11   0.00
     101 100.00  88.89  77.78  66.67  55.56  44.44  33.33  22.22  11.11   0.00
     121 100.00  88.89  77.78  66.67  55.56  44.44  33.33  22.22  11.11   0.00
     141 100.00  88.89  77.78  66.67  55.56  44.44  33.33  22.22  11.11   0.00
     161 100.00  88.89  77.78  66.67  55.56  44.44  33.33  22.22  11.11   0.00
     181 100.00  88.89  77.78  66.67  55.56  44.44  33.33  22.22  11.11   0.00

We have formatted the output for only two digits past the decimal and clearly see steady state at step 101.
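We obtain a listing and ask for some parallel optimization with the
command f90 -Xlist -parallel p274rod.f90 and see in the listing
p274rod.lst that the main loop was not vectorized: each ROD(I) is
computed from ROD(I-1), which the previous iteration has just updated,
so the iterations cannot run independently.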
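The dependence that blocks vectorization can be made visible with a single sweep. The following is a minimal sketch, not part of the lecture: the program name DEPDEMO and the arrays INPLACE, OLD and NEW are hypothetical names introduced for illustration. It applies one sweep in place and one sweep that reads only from a copy, starting from the same initial rod.

```fortran
PROGRAM DEPDEMO
  ! Hypothetical demo (not from the lecture): one sweep, done two ways.
  INTEGER I
  REAL*4 INPLACE(10), OLD(10), NEW(10)

  ! Same initial condition as HEATROD: hot left end, rest cold.
  INPLACE = 0.0
  INPLACE(1) = 100.0
  OLD = INPLACE
  NEW = INPLACE

  ! In-place sweep: iteration I reads INPLACE(I-1), which iteration
  ! I-1 has just overwritten. This is the loop-carried dependence.
  DO I=2,9
    INPLACE(I) = (INPLACE(I-1) + INPLACE(I+1))/2
  ENDDO

  ! Two-array sweep: every NEW(I) reads only OLD, so the iterations
  ! are independent and could run in any order.
  DO I=2,9
    NEW(I) = (OLD(I-1) + OLD(I+1))/2
  ENDDO

  PRINT 100, 'in-place ', (INPLACE(I), I=1,10)
  PRINT 100, 'two-array', (NEW(I), I=1,10)

  ! Sanity check at point 3, where the two sweeps first differ.
  IF (INPLACE(3) .NE. 25.0 .OR. NEW(3) .NE. 0.0) STOP 1
100 FORMAT (A9,10F7.2)
END
```

After one pass the in-place sweep has carried heat all the way down the rod (50 at point 2, 25 at point 3, halving at each further point), while the two-array sweep has warmed only point 2. The in-place result depends on the order of the iterations, which is why a compiler cannot safely run them side by side.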
Red-black ordering removes the dependency of each ROD(I) on the freshly updated ROD(I-1) by keeping two arrays, RED and BLACK, and updating each one only from the other:

    PROGRAM HEATRED
      PARAMETER (MAXTIME=200)
      INTEGER TICKS, I, MAXTIME
      REAL*4 RED(10), BLACK(10)
      RED(1) = 100.0
      BLACK(1) = 100.0
      DO I=2,9
        RED(I) = 0.0
      ENDDO
      RED(10) = 0.0
      BLACK(10) = 0.0
      DO TICKS=1,MAXTIME,2
        IF (MOD(TICKS,20) .EQ. 1) PRINT 100, TICKS, (RED(I), I=1,10)
        DO I=2,9
          BLACK(I) = (RED(I-1) + RED(I+1))/2
        ENDDO
        DO I=2,9
          RED(I) = (BLACK(I-1) + BLACK(I+1))/2
        ENDDO
      ENDDO
    100 FORMAT (I4,10F7.2)
    END
    rohan.stewart:~/cs575/spr01/diffusion> f90 -Xlist -parallel p275red.f90
    f90: Warning: Optimizer level changed from 0 to 3 to support parallelized code
    rohan.stewart:~/cs575/spr01/diffusion> a.out
       1 100.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
      21 100.00  82.38  66.34  50.30  38.18  26.06  18.20  10.35   5.18   0.00
      41 100.00  87.04  74.52  61.99  50.56  39.13  28.94  18.75   9.38   0.00
      61 100.00  88.36  76.84  65.32  54.12  42.91  32.07  21.22  10.61   0.00
      81 100.00  88.74  77.51  66.28  55.14  44.00  32.97  21.93  10.97   0.00
     101 100.00  88.84  77.70  66.55  55.44  44.32  33.23  22.14  11.07   0.00
     121 100.00  88.88  77.76  66.63  55.52  44.41  33.30  22.20  11.10   0.00
     141 100.00  88.89  77.77  66.66  55.55  44.43  33.32  22.22  11.11   0.00
     161 100.00  88.89  77.78  66.66  55.55  44.44  33.33  22.22  11.11   0.00
     181 100.00  88.89  77.78  66.67  55.55  44.44  33.33  22.22  11.11   0.00
Convergence to two digits past the decimal now takes until about step 181,
somewhat slower than before, but we have eliminated the dependency in the
loop, so it vectorizes and therefore parallelizes, as we see in the listing
p275red.lst.
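Rather than reading the convergence step off the printed tables, the two orderings can be compared by counting sweeps. The sketch below is an illustration and not part of the lecture: the program name SWEEPCOUNT, the tolerance TOL, the cap MAXSWEEP and the array EXACT are all assumptions. It uses the fact that the steady state of this rod is a straight line from 100 at point 1 to 0 at point 10, and it counts each array update as one sweep.

```fortran
PROGRAM SWEEPCOUNT
  ! Hypothetical helper (not from the lecture): count sweeps until
  ! every point is within TOL of the exact linear steady state.
  INTEGER I, NROD, NRED, MAXSWEEP
  PARAMETER (MAXSWEEP=10000)
  REAL*4 TOL
  PARAMETER (TOL=0.005)
  REAL*4 ROD(10), RED(10), BLACK(10), EXACT(10)

  ! Steady state: a straight line from 100 (point 1) to 0 (point 10).
  DO I=1,10
    EXACT(I) = 100.0*(10-I)/9.0
  ENDDO

  ! In-place ordering, as in HEATROD.
  ROD = 0.0
  ROD(1) = 100.0
  NROD = 0
  DO WHILE (MAXVAL(ABS(ROD-EXACT)) .GE. TOL .AND. NROD .LT. MAXSWEEP)
    DO I=2,9
      ROD(I) = (ROD(I-1) + ROD(I+1))/2
    ENDDO
    NROD = NROD + 1
  ENDDO

  ! Two-array ordering, as in HEATRED: two sweeps per pass.
  RED = 0.0
  RED(1) = 100.0
  BLACK = RED
  NRED = 0
  DO WHILE (MAXVAL(ABS(RED-EXACT)) .GE. TOL .AND. NRED .LT. MAXSWEEP)
    DO I=2,9
      BLACK(I) = (RED(I-1) + RED(I+1))/2
    ENDDO
    DO I=2,9
      RED(I) = (BLACK(I-1) + BLACK(I+1))/2
    ENDDO
    NRED = NRED + 2
  ENDDO

  PRINT *, 'in-place sweeps: ', NROD
  PRINT *, 'two-array sweeps:', NRED

  ! The in-place ordering should need fewer sweeps.
  IF (NROD .GE. NRED) STOP 1
END
```

The tolerance of 0.005 is chosen to match the two printed decimals. The counts will not line up exactly with the tables, which sample only every 20th step, but the two-array ordering should need noticeably more sweeps, which is the price paid for independent iterations.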
The times to compile and run these examples are all contained in the
following script file:
Script for Lecture
We want to think in SIMD mode, focussing more on the data and less on the control to see how to best use Fortran 90.
Fig. 13-4: Data alignment and computations
© O'Reilly Publishers (Used with permission)
    PROGRAM HEATROD
      PARAMETER (MAXTIME=200)
      INTEGER TICKS, I, MAXTIME
      REAL*4 ROD(10)
      ROD(1) = 100.0
      DO I=2,9
        ROD(I) = 0.0
      ENDDO
      ROD(10)=0.0
      DO TICKS=1,MAXTIME
        IF (MOD(TICKS,20) .EQ. 1) PRINT 100, TICKS, (ROD(I), I=1,10)
        ROD(2:9) = (ROD(1:8) + ROD(3:10))/2
      ENDDO
    100 FORMAT (I4,10F7.2)
    END

Here we have replaced the inner loop with the single statement

    ROD(2:9) = (ROD(1:8) + ROD(3:10))/2

Compiling and executing, we obtain output identical to that of the red-black ordering example, because a Fortran 90 array assignment evaluates its whole right-hand side before storing any element:
    f90 -Xlist -parallel p283heatrod.f90 -o p283heatrod
    f90: Warning: Optimizer level changed from 0 to 3 to support parallelized code
    rohan.stewart:~/cs575/spr01/diffusion> p283heatrod
       1 100.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
      21 100.00  82.38  66.34  50.30  38.18  26.06  18.20  10.35   5.18   0.00
      41 100.00  87.04  74.52  61.99  50.56  39.13  28.94  18.75   9.38   0.00
      61 100.00  88.36  76.84  65.32  54.12  42.91  32.07  21.22  10.61   0.00
      81 100.00  88.74  77.51  66.28  55.14  44.00  32.97  21.93  10.97   0.00
     101 100.00  88.84  77.70  66.55  55.44  44.32  33.23  22.14  11.07   0.00
     121 100.00  88.88  77.76  66.63  55.52  44.41  33.30  22.20  11.10   0.00
     141 100.00  88.89  77.77  66.66  55.55  44.43  33.32  22.22  11.11   0.00
     161 100.00  88.89  77.78  66.66  55.55  44.44  33.33  22.22  11.11   0.00
     181 100.00  88.89  77.78  66.67  55.55  44.44  33.33  22.22  11.11   0.00
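The agreement between the array-section statement and an explicit old/new update can also be checked in a single program. This is a minimal sketch under stated assumptions, not code from the lecture: the program name SAMEOUT, the arrays OLD and NEW, and the tolerance 1.0E-4 are hypothetical choices made for illustration.

```fortran
PROGRAM SAMEOUT
  ! Hypothetical check (not from the lecture): array-section update
  ! versus an explicit old/new copy, advanced tick by tick.
  INTEGER TICKS, I, MAXTIME
  PARAMETER (MAXTIME=200)
  REAL*4 ROD(10), OLD(10), NEW(10)

  ! Same initial condition for both versions.
  ROD = 0.0
  ROD(1) = 100.0
  NEW = ROD

  DO TICKS=1,MAXTIME
    ! Array syntax: the right-hand side is evaluated in full first.
    ROD(2:9) = (ROD(1:8) + ROD(3:10))/2

    ! Explicit equivalent: read only OLD, write only NEW.
    OLD = NEW
    DO I=2,9
      NEW(I) = (OLD(I-1) + OLD(I+1))/2
    ENDDO
  ENDDO

  PRINT *, 'largest difference:', MAXVAL(ABS(ROD-NEW))

  ! The two versions should agree to within rounding.
  IF (MAXVAL(ABS(ROD-NEW)) .GT. 1.0E-4) STOP 1
END
```

The array statement reads as if it updated the rod in place, but its semantics are those of the two-array version: no element of ROD(2:9) is stored until every right-hand-side value has been computed. That is why it reproduces the red-black output and not the faster-converging output of the original loop.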
"The good news is that both FORTRAN 90 and HPF provide one road map to portable scalable computing that doesn't require explicit message passing. The only question is which road we users will choose."