* Broadcast the master copy of BLACK to all of the processes
      CALL MPI_BCAST(BLACK,(ROWS+2)*(COLS+2),MPI_DOUBLE_PRECISION,
     +    0,MPI_COMM_WORLD,IERR)
Now we perform the subset computation on each process. Note that we are using global coordinates because the array has the same shape on each of the processes. All we need to do is make sure we set up our particular strip of columns according to S and E:
* Perform the flow on our subset
      DO C=S,E
        DO R=1,ROWS
          RED(R,C) = ( BLACK(R,C) +
     +      BLACK(R,C-1) + BLACK(R-1,C) +
     +      BLACK(R+1,C) + BLACK(R,C+1) ) / 5.0
        ENDDO
      ENDDO
Now we need to gather each process's strip back into the appropriate strip of the master array for rebroadcast in the next time step. We could also change the loop in the master to receive the messages in any order and check the STATUS variable to see which strip it received; a sketch of that variant appears after the code:
* Gather back up into the BLACK array in master (INUM = 0)
      IF ( INUM .EQ. 0 ) THEN
        DO C=S,E
          DO R=1,ROWS
            BLACK(R,C) = RED(R,C)
          ENDDO
        ENDDO
        DO I=1,NPROC-1
          CALL MPE_DECOMP1D(COLS, NPROC, I, LS, LE, IERR)
          MYLEN = ( LE - LS ) + 1
          SRC = I
          TAG = 0
          CALL MPI_RECV(BLACK(0,LS),MYLEN*(ROWS+2),
     +      MPI_DOUBLE_PRECISION, SRC, TAG,
     +      MPI_COMM_WORLD, STATUS, IERR)
*         Print *,'Recv',I,MYLEN
        ENDDO
      ELSE
        MYLEN = ( E - S ) + 1
        DEST = 0
        TAG = 0
        CALL MPI_SEND(RED(0,S),MYLEN*(ROWS+2),MPI_DOUBLE_PRECISION,
     +    DEST, TAG, MPI_COMM_WORLD, IERR)
*       Print *,'Send',INUM,MYLEN
      ENDIF
      ENDDO
We use MPE_DECOMP1D to determine which strip we’re receiving from each process.
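Here is a minimal sketch of the out-of-order variant mentioned above (it is not part of the original program): the master probes for whichever strip arrives first, learns the sender from STATUS, and only then receives the data directly into the correct columns:
* Wait for any strip to arrive; MPI_PROBE fills in STATUS
* without consuming the message, so we can identify the
* sender before posting the matching receive
      DO I=1,NPROC-1
        CALL MPI_PROBE(MPI_ANY_SOURCE, 0, MPI_COMM_WORLD,
     +    STATUS, IERR)
        SRC = STATUS(MPI_SOURCE)
* Recompute the sender's strip and receive it into place
        CALL MPE_DECOMP1D(COLS, NPROC, SRC, LS, LE, IERR)
        MYLEN = ( LE - LS ) + 1
        CALL MPI_RECV(BLACK(0,LS),MYLEN*(ROWS+2),
     +    MPI_DOUBLE_PRECISION, SRC, 0,
     +    MPI_COMM_WORLD, STATUS, IERR)
      ENDDO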
In some applications, the value that must be gathered is a sum or another single value. To accomplish this, you can use one of the MPI reduction routines that coalesce a set of distributed values into a single value using a single call.
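For example, here is a minimal sketch (not part of the original program; LOCSUM and GLOBSUM are hypothetical variables) that sums one DOUBLE PRECISION value from every process into GLOBSUM on the master:
* Hypothetical example: combine one local value from each
* process into a single global sum on rank 0
      DOUBLE PRECISION LOCSUM, GLOBSUM
      CALL MPI_REDUCE(LOCSUM, GLOBSUM, 1, MPI_DOUBLE_PRECISION,
     +    MPI_SUM, 0, MPI_COMM_WORLD, IERR)
Every process must make the call, but only the root (process 0 here) receives the combined result.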
Again at the end, we dump out the data for testing. However, since it has all been gathered back onto the master process, we only need to dump it on one process:
* Dump out data for verification
      IF ( INUM .EQ. 0 .AND. ROWS .LE. 20 ) THEN
        FNAME = '/tmp/mheatout'
        OPEN(UNIT=9,NAME=FNAME,FORM='formatted')
        DO C=1,COLS
          WRITE(9,100)(BLACK(R,C),R=1,ROWS)
  100     FORMAT(20F12.6)
        ENDDO
        CLOSE(UNIT=9)
      ENDIF
      CALL MPI_FINALIZE(IERR)
      END
When this program executes with four processes, it produces the following output:
% mpif77 -c mheat.f
mheat.f:
MAIN mheat:
% mpif77 -o mheat mheat.o -lmpe
% mheat -np 4
Calling MPI_INIT
My Share 1 4 51 100
My Share 0 4 1 50
My Share 3 4 151 200
My Share 2 4 101 150
%
Each "My Share" line shows a process's rank, the total number of processes, and the first and last columns of the strip assigned to that process.
So that is a somewhat contrived example of the broadcast/gather approach to parallelizing an application. If the data structures are the right size and the amount of computation relative to communication is appropriate, this can be a very effective approach that may require the smallest number of code modifications compared to a single-processor version of the code.
MPI Summary
Whether you choose PVM or MPI depends on which library the vendor of your system prefers. Sometimes MPI is the better choice because it contains the newest features, such as support for hardware-supported multicast or broadcast, that can significantly improve the overall performance of a scatter-gather application.
A good text on MPI is Using MPI — Portable Parallel Programming with the Message-Passing Interface, by William Gropp, Ewing Lusk, and Anthony Skjellum (MIT Press). You may also want to retrieve and print the MPI specification from http://www.netlib.org/mpi/.
Closing Notes
In this chapter we have looked at the “assembly language” of parallel programming. While it can seem daunting to rethink your application, there are often some simple changes you can make to port your code to message passing. Depending on the application, a master-slave, broadcast-gather, or decomposed data approach might be most appropriate.
It’s important to realize that some applications just don’t decompose into message passing very well. You may be working with just such an application. Once you have some experience with message passing, it becomes easier to identify the critical points where data must be communicated.