Online Book Reader

Home Category

Choose a category
All
Classic-Fiction

High Performance Computing - Charles Severance [70]

By Root 1346 0

program. Flow of control enters at the top and goes through two branch decisions. Furthermore, say that there is a square root operation at the entry point, and that the flow of control almost always goes from the top, down to the leg containing the statement A=0.0. This means that the results of the calculation A=SQRT(B) are almost always discarded because A gets a new value of 0.0 each time through. A square root operation is always “expensive” because it takes a lot of time to execute. The trouble is that you can’t just get rid of it; occasionally it’s needed. However, you could move it out of the way and continue to observe the control dependencies by making two copies of the square root operation along the less traveled branches, as shown in Figure 3.3. This way the SQRT would execute only along those paths where it was actually needed.

Figure 3.1. Control dependency

Figure 3.2. A little section of your program

This kind of instruction scheduling will be appearing in compilers (and even hardware) more and more as time goes on. A variation on this technique is to calculate results that might be needed at times when there is a gap in the instruction stream (because of dependencies), thus using some spare cycles that might otherwise be wasted.

Figure 3.3. Expensive operation moved so that it’s rarely executed

Data Dependencies

A calculation that is in some way bound to a previous calculation is said to be data dependent upon that calculation. In the code below, the value of B is data dependent on the value of A. That’s because you can’t calculate B until the value of A is available:

A = X + Y + COS(Z)

B = A * C

This dependency is easy to recognize, but others are not so simple. At other times, you must be careful not to rewrite a variable with a new value before every other computation has finished using the old value. We can group all data dependencies into three categories: (1) flow dependencies, (2) antidependencies, and (3) output dependencies. Figure 3.4 contains some simple examples to demonstrate each type of dependency. In each example, we use an arrow that starts at the source of the dependency and ends at the statement that must be delayed by the dependency. The key problem in each of these dependencies is that the second statement can’t execute until the first has completed. Obviously in the particular output dependency example, the first computation is dead code and can be eliminated unless there is some intervening code that needs the values. There are other techniques to eliminate either output or antidependencies. The following example contains a flow dependency followed by an output dependency.

Figure 3.4. Types of data dependencies

X = A / B

Y = X + 2.0

X = D - E

While we can’t eliminate the flow dependency, the output dependency can be eliminated by using a scratch variable:

Xtemp = A/B

Y = Xtemp + 2.0

X = D - E

As the number of statements and the interactions between those statements increase, we need a better way to identify and process these dependencies. Figure 3.5 shows four statements with four dependencies.

Figure 3.5. Multiple dependencies

None of the second through fourth instructions can be started before the first instruction completes.

Forming a DAG

One method for analyzing a sequence of instructions is to organize it into a directed acyclic graph (DAG).[50] Like the instructions it represents, a DAG describes all of the calculations and relationships between variables. The data flow within a DAG proceeds in one direction; most often a DAG is constructed from top to bottom. Identifiers and constants are placed at the “leaf ” nodes — the ones on the top. Operations, possibly with variable names attached, make up the internal nodes. Variables appear in their final states at the bottom. The DAG’s edges order the relationships between the variables and operations within it. All data flow proceeds from top to bottom.

To construct a DAG, the compiler takes each intermediate language tuple and maps it onto one or more nodes. For

Online Book Reader

High Performance Computing - Charles Severance [70]

®Online Book Reader