Online Book Reader

Home Category

High Performance Computing - Charles Severance [77]

By Root 1354 0
Practice by Thomas Pittman and James Peters (Prentice-Hall).


Exercises*

Exercise 3.7.1.

Identify the dependencies (if there are any) in the following loops. Can you think of ways to organize each loop for more parallelism?

DO I=1,N-2

A(I+2) = A(I) + 1.

ENDDO

DO I=1,N-1,2

A(I+1) = A(I) + 1.

ENDDO

DO I=2,N

A(I) = A(I-1) * 2.

B = A(I-1)

ENDDO

DO I=1,N

IF(N .GT. M)

A(I) = 1.

ENDDO

DO I=1,N

A(I,J) = A(I,K) + B

ENDDO

DO I=1,N-1

A(I+1,J) = A(I,K) + B

ENDDO

for (i=0; ia[i] = b[i];

Exercise 3.7.2.

Imagine that you are a parallelizing compiler, trying to generate code for the loop below. Why are references to A a challenge? Why would it help to know that K is equal to zero? Explain how you could partially vectorize the statements involving A if you knew that K had an absolute value of at least 8.

DO I=1,N

E(I,M) = E(I-1,M+1) - 1.0

B(I) = A(I+K) * C

A(I) = D(I) * 2.0

ENDDO

Exercise 3.7.3.

The following three statements contain a flow dependency, an antidependency and an output dependency. Can you identify each? Given that you are allowed to reorder the statements, can you find a permutation that produces the same values for the variables C and B? Show how you can reduce the dependencies by combining or rearranging calculations and using temporary variables.

B = A + C

B = C + D

C = B + D


3.2. Shared-Memory Multiprocessors


Introduction*

In the mid-1980s, shared-memory multiprocessors were pretty expensive and pretty rare. Now, as hardware costs are dropping, they are becoming commonplace. Many home computer systems in the under-$3000 range have a socket for a second CPU. Home computer operating systems are providing the capability to use more than one processor to improve system performance. Rather than specialized resources locked away in a central computing facility, these shared-memory processors are often viewed as a logical extension of the desktop. These systems run the same operating system (UNIX or NT) as the desktop and many of the same applications from a workstation will execute on these multiprocessor servers.

Typically a workstation will have from 1 to 4 processors and a server system will have 4 to 64 processors. Shared-memory multiprocessors have a significant advantage over other multiprocessors because all the processors share the same view of the memory, as shown in Figure 3.11.

These processors are also described as uniform memory access (also known as UMA) systems. This designation indicates that memory is equally accessible to all processors with the same performance.

The popularity of these systems is not due simply to the demand for high performance computing. These systems are excellent at providing high throughput for a multiprocessing load, and function effectively as high-performance database servers, network servers, and Internet servers. Within limits, their throughput is increased linearly as more processors are added.

In this book we are not so interested in the performance of database or Internet servers. That is too passé; buy more processors, get better throughput. We are interested in pure, raw, unadulterated compute speed for our high performance application. Instead of running hundreds of small jobs, we want to utilize all $750,000 worth of hardware for our single job.

The challenge is to find techniques that make a program that takes an hour to complete using one processor, complete in less than a minute using 64 processors. This is not trivial. Throughout this book so far, we have been on an endless quest for parallelism. In this and the remaining chapters, we will begin to see the payoff for all of your hard work and dedication!

The cost of a shared-memory multiprocessor can range from $4000 to $30 million. Some example systems include multiple-processor Intel systems from a wide range of vendors, SGI Power Challenge Series, HP/Convex C-Series, DEC AlphaServers, Cray vector/parallel processors, and Sun Enterprise systems. The SGI Origin 2000, HP/Convex Exemplar, Data General AV-20000,

Return Main Page Previous Page Next Page

®Online Book Reader