High Performance Computing - Charles Severance
compatibility with workstations, large memories, high throughput, large shared memories, fast I/O, and many others. While these systems are strong in multiprogrammed server roles, they are also an affordable high performance computing resource for many organizations. Their cache-coherent shared-memory model allows multithreaded applications to be easily developed.

We have also examined some of the software paradigms that must be used to develop multithreaded applications. While you will hopefully never have to write C code with explicit threads like the examples in this chapter, it is good to understand the fundamental operations at work on these multiprocessor systems. When you use FORTRAN with an automatic parallelizing compiler, these and many more details are left to the FORTRAN compiler and runtime library. At some point, especially on the most advanced architectures, you may have to explicitly write a multithreaded program using the types of techniques shown in this chapter.

One trend that has been predicted for some time is that we will begin to see multiple cache-coherent CPUs on a single chip once the ability to increase the clock rate on a single chip slows down. Imagine that your new $2000 workstation has four 1-GHz processors on a single chip. Sounds like a good time to learn how to write multithreaded programs!


Exercises

Exercise 3.14.1.

Experiment with the fork code in this chapter. Run the program multiple times and see how the order of the messages changes. Explain the results.

Exercise 3.14.2.

Experiment with the create1 and create3 codes in this chapter. Remove all of the sleep( ) calls. Execute the programs several times on single and multiprocessor systems. Can you explain why the output changes from run to run in some situations and doesn’t change in others?

Exercise 3.14.3.

Experiment with the parallel sum code in this chapter. In the SumFunc( ) routine, change the for-loop to:

for(i=start;i<end;i++) GlobSum += array[i];

Remove the three lines at the end that get the mutex and update the GlobSum. Execute the code. Explain the difference in values that you see for GlobSum. Are the patterns different on a single processor and a multiprocessor? Explain the performance impact on a single processor and a multiprocessor.

Exercise 3.14.4.

Explain how the following code segment could cause deadlock — two or more processes waiting for a resource that can’t be relinquished:

...
call lock (lword1)
call lock (lword2)
...
call unlock (lword1)
call unlock (lword2)
.
.
.
call lock (lword2)
call lock (lword1)
...
call unlock (lword2)
call unlock (lword1)
...

Exercise 3.14.5.

If you were to code the functionality of a spin-lock in C, it might look like this:

while (!lockword);
lockword = !lockword;

As you know from the first sections of the book, the same statements would be compiled into explicit loads and stores, a comparison, and a branch. There’s a danger that two processes could each load lockword, find it unset, and continue on as if they owned the lock (we have a race condition). This suggests that spin-locks are implemented differently — that they’re not merely the two lines of C above. How do you suppose they are implemented?


3.3. Programming Shared-Memory Multiprocessors


Introduction

In the section called "Introduction," we examined the hardware used to implement shared-memory parallel processors and the software environment for a programmer who is using threads explicitly. In this chapter, we view these processors from a simpler vantage point. When programming these systems in FORTRAN, you have the advantage of the compiler's support of these systems. At the top end of ease of use, we can simply add a flag or two to the compilation of our well-written code, set an environment variable, and voilà, we are executing in parallel. If you want more control, you can add directives to particular loops where you know better than the compiler how the loop should be executed.[60] First we examine
