      CALL STORE(BLACK,ROWS,COLS,S,E,2*ROWS/3,2*TOTCOLS/3,20.0,INUM)
All of the processes set these values independently depending on which process has which strip of the overall array.
Now we exchange the data with our neighbors as determined by the Cartesian communicator. Note that we don’t need an IF test to determine if we are the far-left or far-right process. If we are at the edge, our neighbor setting is MPI_PROC_NULL and the MPI_SEND and MPI_RECV calls do nothing when given this as a source or destination value, thus saving us an IF test.
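As a reminder of where those neighbor values come from, the sketch below shows the usual MPI_CART_SHIFT idiom on a non-periodic one-dimensional Cartesian communicator. The names COMM1D, LEFTPROC, and RIGHTPROC match the listing, but this fragment is illustrative rather than a verbatim excerpt of the program:

* Illustrative sketch: on a non-periodic Cartesian dimension,
* MPI_CART_SHIFT fills in MPI_PROC_NULL for a neighbor that does not
* exist, so the edge processes need no special-case IF test.
      INCLUDE 'mpif.h'
      INTEGER COMM1D,LEFTPROC,RIGHTPROC,IERR

      CALL MPI_CART_SHIFT(COMM1D,0,1,LEFTPROC,RIGHTPROC,IERR)
* On the far-left process LEFTPROC is now MPI_PROC_NULL, and on the
* far-right process RIGHTPROC is MPI_PROC_NULL; sends and receives
* addressed to MPI_PROC_NULL return immediately without moving data.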
Note that we specify the communicator COMM1D because the rank values we are using in these calls are relative to that communicator:
* Send left and receive right
      CALL MPI_SEND(BLACK(1,1),ROWS,MPI_DOUBLE_PRECISION,
     +        LEFTPROC,1,COMM1D,IERR)
      CALL MPI_RECV(BLACK(1,MYLEN+1),ROWS,MPI_DOUBLE_PRECISION,
     +        RIGHTPROC,1,COMM1D,STATUS,IERR)

* Send Right and Receive left in a single statement
      CALL MPI_SENDRECV(
     +        BLACK(1,MYLEN),ROWS,MPI_DOUBLE_PRECISION,RIGHTPROC,2,
     +        BLACK(1,0),ROWS,MPI_DOUBLE_PRECISION,LEFTPROC,2,
     +        COMM1D,STATUS,IERR)
Just to show off, we use both the separate send and receive and the combined send and receive. When given a choice, it’s probably a good idea to use the combined operation to give the runtime environment more flexibility in terms of buffering. One downside on a network of workstations (or any other high-latency interconnect) is that the combined operation doesn’t let you post both send operations first and then do both receive operations to overlap some of the communication delay.
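If overlapping those transfers matters on such a network, one option (not used in this listing) is to post all four operations with the nonblocking calls and wait on them together. A minimal sketch, reusing the buffer and neighbor names from the code above:

* Sketch: post both receives and both sends, then wait for all four.
* BLACK, MYLEN, ROWS, COMM1D, LEFTPROC, and RIGHTPROC are assumed to
* be set up exactly as in the listing above.
      INCLUDE 'mpif.h'
      INTEGER REQS(4),STATS(MPI_STATUS_SIZE,4),IERR

      CALL MPI_IRECV(BLACK(1,MYLEN+1),ROWS,MPI_DOUBLE_PRECISION,
     +        RIGHTPROC,1,COMM1D,REQS(1),IERR)
      CALL MPI_IRECV(BLACK(1,0),ROWS,MPI_DOUBLE_PRECISION,
     +        LEFTPROC,2,COMM1D,REQS(2),IERR)
      CALL MPI_ISEND(BLACK(1,1),ROWS,MPI_DOUBLE_PRECISION,
     +        LEFTPROC,1,COMM1D,REQS(3),IERR)
      CALL MPI_ISEND(BLACK(1,MYLEN),ROWS,MPI_DOUBLE_PRECISION,
     +        RIGHTPROC,2,COMM1D,REQS(4),IERR)
      CALL MPI_WAITALL(4,REQS,STATS,IERR)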
Once we have all of our ghost points from our neighbors, we can perform the algorithm on our subset of the space:
* Perform the flow
        DO C=1,MYLEN
          DO R=1,ROWS
            RED(R,C) = ( BLACK(R,C) +
     +                   BLACK(R,C-1) + BLACK(R-1,C) +
     +                   BLACK(R+1,C) + BLACK(R,C+1) ) / 5.0
          ENDDO
        ENDDO

* Copy back - Normally we would do a red and black version of the loop
        DO C=1,MYLEN
          DO R=1,ROWS
            BLACK(R,C) = RED(R,C)
          ENDDO
        ENDDO

* End of the main iteration loop
      ENDDO
Again, for simplicity, we don’t do the complete red-black computation.[77] We have no synchronization at the bottom of the loop because the messages implicitly synchronize the processes at the top of the next loop.
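For reference, a full red-black sweep over the local strip might look like the sketch below. It uses a hypothetical single array PLATE rather than the RED/BLACK pair above, and it is only an illustration of the technique, not code from this chapter:

* Sketch only: one in-place red-black sweep over the local strip.
* PLATE is a hypothetical array standing in for the two copies used
* in the listing; ROWS and MYLEN follow the listing above.  A real
* version would re-exchange the ghost columns between the two passes,
* since the second pass reads freshly updated boundary cells.
      INTEGER IPASS,R,C

      DO IPASS=0,1
        DO C=1,MYLEN
          DO R=1,ROWS
            IF ( MOD(R+C,2) .EQ. IPASS ) THEN
              PLATE(R,C) = ( PLATE(R,C) +
     +                       PLATE(R,C-1) + PLATE(R-1,C) +
     +                       PLATE(R+1,C) + PLATE(R,C+1) ) / 5.0
            ENDIF
          ENDDO
        ENDDO
      ENDDO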
Again, we dump out the data for verification. As in the PVM example, one good test of basic correctness is to make sure you get exactly the same results for varying numbers of processes:
* Dump out data for verification
      IF ( ROWS .LE. 20 ) THEN
        FNAME = '/tmp/mheatcout.' // CHAR(ICHAR('0')+INUM)
        OPEN(UNIT=9,NAME=FNAME,FORM='formatted')
        DO C=1,MYLEN
          WRITE(9,100)(BLACK(R,C),R=1,ROWS)
100       FORMAT(20F12.6)
        ENDDO
        CLOSE(UNIT=9)
      ENDIF
To terminate the program, we call MPI_FINALIZE:
* Lets all go together
      CALL MPI_FINALIZE(IERR)
      END
As in the PVM example, we need a routine to store a value into the proper strip of the global array. This routine checks whether a particular global element falls within this process’s strip and, if so, computes the proper local location for the value. For example, if a process owns global columns 51 through 100 (S=51, E=100), global column 60 maps to local column 10. If the global element does not belong to this process, the routine simply returns without doing anything:
      SUBROUTINE STORE(RED,ROWS,COLS,S,E,R,C,VALUE,INUM)
      REAL*8 RED(0:ROWS+1,0:COLS+1)
      REAL VALUE
      INTEGER ROWS,COLS,S,E,R,C,I,INUM

      IF ( C .LT. S .OR. C .GT. E ) RETURN
      I = ( C - S ) + 1
*     PRINT *,'STORE, INUM,R,C,S,E,R,I',INUM,R,C,S,E,R,I,VALUE
      RED(R,I) = VALUE
      RETURN
      END
When this program is executed, it has the following output:
% mpif77 -c mheatc.f
mheatc.f:
MAIN mheatc:
store:
% mpif77 -o mheatc mheatc.o -lmpe
% mheatc -np 4
Calling MPI_INIT
Back from MPI_INIT
Back from MPI_INIT
Back from MPI_INIT
Back from MPI_INIT
0 4 0 -1 1 1 50
2 4 2 1 3 101 150
3 4 3 2 -1 151 200
1 4 1 0 2 51 100
%
As you can see, we call MPI_INIT to activate the four processes. The PRINT statement immediately after the MPI_INIT call appears four times, once for each of the activated processes. Then each process prints out the strip of the array it will process. We can also see the neighbors of each process including -1 when a process has no neighbor to the left or