      CALL STORE(BLACK,ROWS,COLS,S,E,2*ROWS/3,2*TOTCOLS/3,20.0,INUM)
All of the processes set these values independently depending on which process has which strip of the overall array.
Now we exchange the data with our neighbors as determined by the Cartesian communicator. Note that we don’t need an IF test to determine if we are the far-left or far-right process. If we are at the edge, our neighbor setting is MPI_PROC_NULL and the MPI_SEND and MPI_RECV calls do nothing when given this as a source or destination value, thus saving us an IF test.
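As a reminder of where those neighbor values come from, the sketch below shows the usual MPI_CART_SHIFT idiom on a non-periodic one-dimensional Cartesian communicator. The names COMM1D, LEFTPROC, and RIGHTPROC match the listing, but this fragment is illustrative rather than a verbatim excerpt of the program:

* Illustrative sketch: on a non-periodic Cartesian dimension,
* MPI_CART_SHIFT fills in MPI_PROC_NULL for a neighbor that does not
* exist, so the edge processes need no special-case IF test.
      INCLUDE 'mpif.h'
      INTEGER COMM1D,LEFTPROC,RIGHTPROC,IERR

      CALL MPI_CART_SHIFT(COMM1D,0,1,LEFTPROC,RIGHTPROC,IERR)
* On the far-left process LEFTPROC is now MPI_PROC_NULL, and on the
* far-right process RIGHTPROC is MPI_PROC_NULL; sends and receives
* addressed to MPI_PROC_NULL return immediately without moving data.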
Note that we specify the communicator COMM1D because the rank values we are using in these calls are relative to that communicator:
* Send left and receive right
      CALL MPI_SEND(BLACK(1,1),ROWS,MPI_DOUBLE_PRECISION,
     +        LEFTPROC,1,COMM1D,IERR)
      CALL MPI_RECV(BLACK(1,MYLEN+1),ROWS,MPI_DOUBLE_PRECISION,
     +        RIGHTPROC,1,COMM1D,STATUS,IERR)

* Send Right and Receive left in a single statement
      CALL MPI_SENDRECV(
     +        BLACK(1,MYLEN),ROWS,MPI_DOUBLE_PRECISION,RIGHTPROC,2,
     +        BLACK(1,0),ROWS,MPI_DOUBLE_PRECISION,LEFTPROC,2,
     +        COMM1D,STATUS,IERR)
Just to show off, we use both the separate send and receive and the combined send and receive. When given a choice, it’s probably a good idea to use the combined operation to give the runtime environment more flexibility in terms of buffering. One downside on a network of workstations (or any other high-latency interconnect) is that the combined operation doesn’t let you post both send operations first and then do both receive operations to overlap some of the communication delay.
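If overlapping those transfers matters on such a network, one option (not used in this listing) is to post all four operations with the nonblocking calls and wait on them together. A minimal sketch, reusing the buffer and neighbor names from the code above:

* Sketch: post both receives and both sends, then wait for all four.
* BLACK, MYLEN, ROWS, COMM1D, LEFTPROC, and RIGHTPROC are assumed to
* be set up exactly as in the listing above.
      INCLUDE 'mpif.h'
      INTEGER REQS(4),STATS(MPI_STATUS_SIZE,4),IERR

      CALL MPI_IRECV(BLACK(1,MYLEN+1),ROWS,MPI_DOUBLE_PRECISION,
     +        RIGHTPROC,1,COMM1D,REQS(1),IERR)
      CALL MPI_IRECV(BLACK(1,0),ROWS,MPI_DOUBLE_PRECISION,
     +        LEFTPROC,2,COMM1D,REQS(2),IERR)
      CALL MPI_ISEND(BLACK(1,1),ROWS,MPI_DOUBLE_PRECISION,
     +        LEFTPROC,1,COMM1D,REQS(3),IERR)
      CALL MPI_ISEND(BLACK(1,MYLEN),ROWS,MPI_DOUBLE_PRECISION,
     +        RIGHTPROC,2,COMM1D,REQS(4),IERR)
      CALL MPI_WAITALL(4,REQS,STATS,IERR)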
Once we have all of our ghost points from our neighbors, we can perform the algorithm on our subset of the space:
* Perform the flow
        DO C=1,MYLEN
          DO R=1,ROWS
            RED(R,C) = ( BLACK(R,C) +
     +                   BLACK(R,C-1) + BLACK(R-1,C) +
     +                   BLACK(R+1,C) + BLACK(R,C+1) ) / 5.0
          ENDDO
        ENDDO

* Copy back - Normally we would do a red and black version of the loop
        DO C=1,MYLEN
          DO R=1,ROWS
            BLACK(R,C) = RED(R,C)
          ENDDO
        ENDDO

* End of the main iteration loop
      ENDDO
Again, for simplicity, we don’t do the complete red-black computation.[77] We have no synchronization at the bottom of the loop because the messages implicitly synchronize the processes at the top of the next loop.
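For reference, a full red-black sweep over the local strip might look like the sketch below. It uses a hypothetical single array PLATE rather than the RED/BLACK pair above, and it is only an illustration of the technique, not code from this chapter:

* Sketch only: one in-place red-black sweep over the local strip.
* PLATE is a hypothetical array standing in for the two copies used
* in the listing; ROWS and MYLEN follow the listing above.  A real
* version would re-exchange the ghost columns between the two passes,
* since the second pass reads freshly updated boundary cells.
      INTEGER IPASS,R,C

      DO IPASS=0,1
        DO C=1,MYLEN
          DO R=1,ROWS
            IF ( MOD(R+C,2) .EQ. IPASS ) THEN
              PLATE(R,C) = ( PLATE(R,C) +
     +                       PLATE(R,C-1) + PLATE(R-1,C) +
     +                       PLATE(R+1,C) + PLATE(R,C+1) ) / 5.0
            ENDIF
          ENDDO
        ENDDO
      ENDDO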
Again, we dump out the data for verification. As in the PVM example, one good test of basic correctness is to make sure you get exactly the same results for varying numbers of processes:
* Dump out data for verification
      IF ( ROWS .LE. 20 ) THEN
        FNAME = '/tmp/mheatcout.' // CHAR(ICHAR('0')+INUM)
        OPEN(UNIT=9,NAME=FNAME,FORM='formatted')
        DO C=1,MYLEN
          WRITE(9,100)(BLACK(R,C),R=1,ROWS)
100       FORMAT(20F12.6)
        ENDDO
        CLOSE(UNIT=9)
      ENDIF
To terminate the program, we call MPI_FINALIZE:
* Lets all go together
      CALL MPI_FINALIZE(IERR)
      END
As in the PVM example, we need a routine to store a value into the proper strip of the global array. This routine checks whether a particular global element falls within this process’s strip and, if so, computes the proper local location for the value. For example, if a process owns global columns 51 through 100 (S=51, E=100), global column 60 maps to local column 10. If the global element does not belong to this process, the routine simply returns without doing anything:
      SUBROUTINE STORE(RED,ROWS,COLS,S,E,R,C,VALUE,INUM)
      REAL*8 RED(0:ROWS+1,0:COLS+1)
      REAL VALUE
      INTEGER ROWS,COLS,S,E,R,C,I,INUM

      IF ( C .LT. S .OR. C .GT. E ) RETURN
      I = ( C - S ) + 1
*     PRINT *,'STORE, INUM,R,C,S,E,R,I',INUM,R,C,S,E,R,I,VALUE
      RED(R,I) = VALUE
      RETURN
      END
When this program is executed, it has the following output:
% mpif77 -c mheatc.f
mheatc.f:
MAIN mheatc:
store:
% mpif77 -o mheatc mheatc.o -lmpe
% mheatc -np 4
Calling MPI_INIT
Back from MPI_INIT
Back from MPI_INIT
Back from MPI_INIT
Back from MPI_INIT
0 4 0 -1 1 1 50
2 4 2 1 3 101 150
3 4 3 2 -1 151 200
1 4 1 0 2 51 100
%
As you can see, we call MPI_INIT to activate the four processes. The PRINT statement immediately after the MPI_INIT call appears four times, once for each of the activated processes. Then each process prints out the strip of the array it will process. We can also see the neighbors of each process including -1 when a process has no neighbor to the left or