High Performance Computing - Charles Severance
This chapter discusses the programming languages that are used on the largest parallel processing systems. Usually, when you are faced with porting and tuning your code on a new scalable architecture, you have to sit back and think about your application for a moment. Sometimes fundamental changes to your algorithm are needed before you can begin to work on the new architecture. Don't be surprised if you need to rewrite all or portions of the application in one of these languages. Modifications on one system may not give a performance benefit on another system. But if the application is important enough, it's worth the effort to improve its performance.
In this chapter, we cover:
FORTRAN 90
HPF: High Performance FORTRAN
These languages are designed for use on high-end computing systems. We will follow one small program through each of these languages: a finite-difference computation that roughly models heat flow. It's a classic problem that contains a great deal of parallelism and is easily solved on a wide variety of parallel architectures.
We introduce and discuss the concept of single program multiple data (SPMD), in which we treat MIMD computers as if they were SIMD computers. We write our applications as if a large SIMD system were going to solve the problem, but the resulting application is compiled for a MIMD system. The implicit synchronization of the SIMD system is replaced by explicit synchronization at runtime on the MIMD system.
Data-Parallel Problem: Heat Flow
A classic problem that explores scalable parallel processing is the heat flow problem. The physics behind this problem lie in partial differential equations.
We will start with a one-dimensional metal plate (also known as a rod) and move to a two-dimensional plate in later examples. The rod starts at zero degrees Celsius. We then place one end in 100-degree steam and the other end in zero-degree ice. We want to simulate how the heat flows from one end to the other, and to find the resulting temperatures at points along the rod after the temperature has stabilized.
To do this we break the rod into 10 segments and track the temperature over time for each segment. Intuitively, within a time step, the next temperature of a portion of the plate is an average of the surrounding temperatures. Given fixed temperatures at some points in the rod, the temperatures eventually converge to a steady state after sufficient time steps. Figure 4.1 shows the setup at the beginning of the simulation.
Figure 4.1. Heat flow in a rod
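The averaging rule for the rod can be sketched as a formula. The notation here is an assumption introduced for illustration and does not appear in the program: let $T_i^{(t)}$ be the temperature of segment $i$ at time step $t$, with the two end segments held fixed by the steam and the ice.

```latex
T_i^{(t+1)} = \frac{T_{i-1}^{(t)} + T_{i+1}^{(t)}}{2}, \qquad i = 2, \ldots, 9,
\qquad T_1^{(t)} = 100, \quad T_{10}^{(t)} = 0 .
```

Only the eight interior segments are updated; the boundary values never change, which is what drives the rod toward a steady state.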
A simplistic implementation of this is as follows:
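      PROGRAM HEATROD
      PARAMETER(MAXTIME=200)
      INTEGER TICKS,I,MAXTIME
      REAL*4 ROD(10)
C     Hot end in steam, interior and cold end start at zero
      ROD(1) = 100.0
      DO I=2,9
        ROD(I) = 0.0
      ENDDO
      ROD(10) = 0.0
C     Time-step loop; print the rod every 20 ticks
      DO TICKS=1,MAXTIME
        IF ( MOD(TICKS,20) .EQ. 1 ) PRINT 100,TICKS,(ROD(I),I=1,10)
C       Replace each interior segment by the average of its neighbors
        DO I=2,9
          ROD(I) = (ROD(I-1) + ROD(I+1) ) / 2
        ENDDO
      ENDDO
100   FORMAT(I4,10F7.2)
      END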
The output of this program is as follows:
% f77 heatrod.f
heatrod.f:
MAIN heatrod:
% a.out
   1 100.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
  21 100.00  87.04  74.52  62.54  51.15  40.30  29.91  19.83   9.92   0.00
  41 100.00  88.74  77.51  66.32  55.19  44.10  33.05  22.02  11.01   0.00
  61 100.00  88.88  77.76  66.64  55.53  44.42  33.31  22.21  11.10   0.00
  81 100.00  88.89  77.78  66.66  55.55  44.44  33.33  22.22  11.11   0.00
 101 100.00  88.89  77.78  66.67  55.56  44.44  33.33  22.22  11.11   0.00
 121 100.00  88.89  77.78  66.67  55.56  44.44  33.33  22.22  11.11   0.00
 141 100.00  88.89  77.78  66.67  55.56  44.44  33.33  22.22  11.11   0.00
 161 100.00  88.89  77.78  66.67  55.56  44.44  33.33  22.22  11.11   0.00
 181 100.00  88.89  77.78  66.67  55.56  44.44  33.33  22.22  11.11   0.00
%
Clearly, by time step 101, the simulation has converged to two decimal places of accuracy: the printed values have stopped changing. These values should be the steady-state approximation of the temperature at the center of each segment of the rod.
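One way to sanity-check the converged values is to ask what a steady state must look like. If nothing changes from one step to the next, each interior temperature equals the average of its two neighbors, so the difference between adjacent segments is constant and the profile is a straight line between the fixed ends. The symbol $T_i$ for the temperature of segment $i$ is notation assumed here for illustration:

```latex
T_i = \frac{T_{i-1} + T_{i+1}}{2}
\;\Longrightarrow\;
T_{i+1} - T_i = T_i - T_{i-1},
\qquad
T_i = 100 \cdot \frac{10 - i}{9}, \quad i = 1, \ldots, 10 .

T_2 = \tfrac{800}{9} \approx 88.89, \qquad
T_5 = \tfrac{500}{9} \approx 55.56, \qquad
T_9 = \tfrac{100}{9} \approx 11.11 .
```

These agree with the second, fifth, and ninth temperature columns printed from time step 101 onward.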
Now, at this point, astute readers are saying to themselves, "Um, don't look now, but that loop has a flow dependency." You would also claim