High Performance Computing - Charles Severance [95]

calls are independent:

C$ASSERT NO_SIDE_EFFECTS

DO I=1,N

CALL BIGSTUFF (A,B,C,I,J,K)

END DO

Even if the compiler has all the source code, use of common variables or equivalences may mask call independence.
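As a minimal sketch of why the compiler has to be conservative, the following program hides a dependence in a COMMON block. The names HIDDEN, STEP, TOTAL, and /ACC/ are hypothetical and chosen only for this illustration. The argument list of STEP shows nothing but K, yet every call reads and updates TOTAL, so the calls are not independent and an assertion like the one above would be wrong for this loop.

```fortran
      PROGRAM HIDDEN
C     Sketch: STEP looks independent from its argument list, but it
C     updates the COMMON variable TOTAL on every call.
      INTEGER TOTAL, I
      COMMON /ACC/ TOTAL
      TOTAL = 0
      DO I = 1, 4
        CALL STEP (I)
      END DO
      PRINT *, 'TOTAL =', TOTAL
      END

      SUBROUTINE STEP (K)
      INTEGER K, TOTAL
      COMMON /ACC/ TOTAL
C     The dependence between calls travels through /ACC/, not K.
      TOTAL = TOTAL * 2 + K
      RETURN
      END
```

Run serially, the loop leaves TOTAL at 26. Executing the four calls in any other order generally gives a different value, which is exactly the kind of side effect the compiler cannot rule out on its own.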

Manual Parallelism

At some point, you get tired of giving the compiler advice and hoping that it will reach the conclusion to parallelize your loop. At that point you move into the realm of manual parallelism. Luckily, the programming model provided in FORTRAN insulates you from many of the details of exactly how multiple threads are managed at runtime. You generally control explicit parallelism by adding specially formatted comment lines to your source code. There is a wide variety of formats for these directives. In this section, we use the syntax that is part of the OpenMP (see http://cnx.org/content/m32814/1.3/www.openmp.org) standard. You generally find similar capabilities in each of the vendor compilers, although the precise syntax varies slightly from vendor to vendor. (That alone is a good reason to have a standard.)

The basic programming model is that you are executing a section of code with either a single thread or multiple threads. The programmer adds a directive to summon additional threads at various points in the code. The most basic construct is called the parallel region.

Parallel regions

In a parallel region, the threads simply appear between two statements of straight-line code. A trivial example, using the OpenMP directive syntax, might look like this:

PROGRAM ONE

EXTERNAL OMP_GET_THREAD_NUM, OMP_GET_MAX_THREADS

INTEGER OMP_GET_THREAD_NUM, OMP_GET_MAX_THREADS

IGLOB = OMP_GET_MAX_THREADS()

PRINT *,'Hello There'

C$OMP PARALLEL PRIVATE(IAM), SHARED(IGLOB)

IAM = OMP_GET_THREAD_NUM()

PRINT *, 'I am ', IAM, ' of ', IGLOB

C$OMP END PARALLEL

PRINT *,'All Done'

END

The C$OMP is the sentinel that indicates that this is a directive and not just another comment. The output of the program when run looks as follows:

% setenv OMP_NUM_THREADS 4

% a.out

Hello There

I am 0 of 4

I am 3 of 4

I am 1 of 4

I am 2 of 4

All Done

Execution begins with a single thread. As the program encounters the PARALLEL directive, the other threads are activated to join the computation. So in a sense, as execution passes the first directive, one thread becomes four. Four threads execute the two statements between the directives. As the threads are executing independently, the order in which the print statements are displayed is somewhat random. The threads wait at the END PARALLEL directive until all threads have arrived. Once all threads have completed the parallel region, a single thread continues executing the remainder of the program.
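The transition from one thread to many and back again can be observed directly. The following is a minimal sketch, assuming a compiler with OpenMP enabled (for example, gfortran with the -fopenmp option) and fixed-form source. The program name TEAMSZ and the variables NBEFOR, NINSID, and NAFTER are illustrative. It uses the standard OpenMP routine OMP_GET_NUM_THREADS, which returns the size of the current team, and the MASTER directive so that only one thread writes the shared variable.

```fortran
      PROGRAM TEAMSZ
C     Sketch: record the team size before, inside, and after a
C     parallel region.
      INTEGER OMP_GET_NUM_THREADS
      EXTERNAL OMP_GET_NUM_THREADS
      INTEGER NBEFOR, NINSID, NAFTER
C     Outside a parallel region the team consists of one thread.
      NBEFOR = OMP_GET_NUM_THREADS()
C$OMP PARALLEL SHARED(NINSID)
C$OMP MASTER
C     Only the master thread stores the team size.
      NINSID = OMP_GET_NUM_THREADS()
C$OMP END MASTER
C$OMP END PARALLEL
C     After END PARALLEL a single thread continues.
      NAFTER = OMP_GET_NUM_THREADS()
      PRINT *, 'Before:', NBEFOR, ' inside:', NINSID,
     +         ' after:', NAFTER
      END
```

With OMP_NUM_THREADS set to 4 as in the run above, you would expect the values before and after the region to be 1 and the value inside to be 4, although the runtime is allowed to provide fewer threads than requested.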

In Figure 3.22, the PRIVATE(IAM) clause indicates that the IAM variable is not shared across the threads; instead, each thread has its own private copy of the variable. The IGLOB variable is shared across all the threads. Any modification of IGLOB appears in all the other threads instantly, within the limitations of cache coherency.

Figure 3.22. Data interactions during a parallel region
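The difference between the two kinds of variables can be made visible with a small sketch. The names PRVSHR, IDS, MAXT, and NT are hypothetical and not part of the original example; MAXT is an assumed upper bound on the number of threads. Each thread keeps its identifier in its own private IAM and then copies it into its own slot of the shared array IDS, which the single remaining thread prints after the region ends.

```fortran
      PROGRAM PRVSHR
C     Sketch: IAM is private (one copy per thread); IDS is shared.
      INTEGER MAXT
      PARAMETER (MAXT = 64)
      INTEGER OMP_GET_THREAD_NUM, OMP_GET_NUM_THREADS
      EXTERNAL OMP_GET_THREAD_NUM, OMP_GET_NUM_THREADS
      INTEGER IDS(MAXT), IAM, NT, I
      DO I = 1, MAXT
        IDS(I) = -1
      END DO
      NT = 1
C$OMP PARALLEL PRIVATE(IAM), SHARED(IDS, NT)
      IAM = OMP_GET_THREAD_NUM()
C     Each thread writes only its own slot, so there is no conflict.
      IF (IAM .LT. MAXT) IDS(IAM+1) = IAM
C$OMP MASTER
      NT = OMP_GET_NUM_THREADS()
C$OMP END MASTER
C$OMP END PARALLEL
      PRINT *, 'Thread ids recorded:',
     +         (IDS(I), I = 1, MIN(NT, MAXT))
      END
```

If IAM were shared instead of private, the threads would overwrite one another's value between the assignment and the store into IDS, and the recorded identifiers could be wrong.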

During the parallel region, the programmer typically divides the work among the threads. This pattern of going from single-threaded to multithreaded execution may be repeated many times throughout the execution of an application.
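One common way to divide the work is to give each thread a contiguous block of loop iterations, computed from its thread number. The following is a sketch under stated assumptions: the program name SPLIT, the array A, and the bounds ISTART and IEND are illustrative, and the block formula is just one reasonable choice.

```fortran
      PROGRAM SPLIT
C     Sketch: divide N iterations among the threads by hand.
      INTEGER N
      PARAMETER (N = 1000)
      INTEGER OMP_GET_THREAD_NUM, OMP_GET_NUM_THREADS
      EXTERNAL OMP_GET_THREAD_NUM, OMP_GET_NUM_THREADS
      INTEGER IAM, NT, ISTART, IEND, I
      REAL A(N)
C$OMP PARALLEL PRIVATE(IAM, NT, ISTART, IEND, I), SHARED(A)
      IAM = OMP_GET_THREAD_NUM()
      NT = OMP_GET_NUM_THREADS()
C     Thread IAM handles the contiguous block ISTART..IEND.
      ISTART = (IAM * N) / NT + 1
      IEND = ((IAM + 1) * N) / NT
      DO I = ISTART, IEND
        A(I) = 2.0 * REAL(I)
      END DO
C$OMP END PARALLEL
      PRINT *, 'A(1) =', A(1), ' A(N) =', A(N)
      END
```

The integer arithmetic guarantees that the blocks do not overlap and together cover iterations 1 through N, even when N is not a multiple of the number of threads. All of the bookkeeping variables are private, so each thread works only on its own range of the shared array.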

Because input and output are generally not thread-safe, to be completely correct, we should indicate that the print statement in the parallel region is to be executed by only one processor at a time. We use a directive to indicate that this section of code is a critical section. A lock or other synchronization mechanism ensures that no more than one processor is executing the statements in the critical section at any one time:

C$OMP CRITICAL

PRINT *, 'I am ', IAM, ' of ', IGLOB

C$OMP END CRITICAL
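The same directive protects any update to shared data, not just output. As a small sketch (the program name CRIT and the variables NDONE and NT are hypothetical), each thread below increments a shared counter inside a critical section, so the final count should match the size of the team:

```fortran
      PROGRAM CRIT
C     Sketch: a critical section serializes updates to shared NDONE.
      INTEGER OMP_GET_NUM_THREADS
      EXTERNAL OMP_GET_NUM_THREADS
      INTEGER NDONE, NT
      NDONE = 0
      NT = 1
C$OMP PARALLEL SHARED(NDONE, NT)
C$OMP CRITICAL
C     Only one thread at a time executes these two statements.
      NDONE = NDONE + 1
      NT = OMP_GET_NUM_THREADS()
C$OMP END CRITICAL
C$OMP END PARALLEL
      PRINT *, 'Threads counted:', NDONE, ' team size:', NT
      END
```

Without the critical section, two threads could read the same old value of NDONE and both store the same new value, losing one of the increments.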

Parallel loops

Quite often the areas of the code that are most valuable to execute in parallel are loops. Consider the following loop:

DO I=1,1000000

TMP1 = ( A(I) ** 2 ) + ( B(I) ** 2 )

TMP2 = SQRT(TMP1)

B(I)
