High Performance Computing - Charles Severance [31]
All memory references are explicit loads from or stores to “temporaries” tn.
Logical values used in branches are calculated separately from the actual branch.
Jumps go to absolute addresses.
If we were building a compiler, we’d need to be a little more specific. For our purposes, this will do. Consider the following bit of C code:
while (j < n) {
    k = k + j * 2;
    m = j * 2;
    j++;
}
This loop translates into the intermediate language representation shown here:
A:: t1 := j
    t2 := n
    t3 := t1 < t2
    jmp (B) t3
    jmp (C) TRUE
B:: t4 := k
    t5 := j
    t6 := t5 * 2
    t7 := t4 + t6
    k := t7
    t8 := j
    t9 := t8 * 2
    m := t9
    t10 := j
    t11 := t10 + 1
    j := t11
    jmp (A) TRUE
C::
Each C source line is represented by several IL statements. On many RISC processors, our IL code is so close to machine language that we could turn it directly into object code.[19] Often the lowest optimization level does a literal translation from the intermediate language to machine code. When this is done, the code generally is very large and performs very poorly. Looking at it, you can see places to save a few instructions. For instance, j gets loaded into temporaries in four places; surely we can reduce that. We have to do some analysis and make some optimizations.
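To give a feel for the kind of savings available, a compiler might notice that within block B the variable j is loaded three times and j * 2 is computed twice. Keeping one copy of each, block B could shrink to something like the following — an illustration of the idea, not any particular compiler's output:

```
B:: t4 := k
    t5 := j
    t6 := t5 * 2
    t7 := t4 + t6
    k := t7
    m := t6
    t11 := t5 + 1
    j := t11
    jmp (A) TRUE
```

The duplicate loads of j and the repeated multiplication have been replaced by reuses of t5 and t6, cutting the block from nine statements to eight and, more importantly, from three memory loads of j to one.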
Basic Blocks
After generating our intermediate language, we want to cut it into basic blocks. These are code sequences that start with an instruction that either follows a branch or is itself a target for a branch. Put another way, each basic block has one entrance (at the top) and one exit (at the bottom). Figure 2.2 represents our IL code as a group of three basic blocks. Basic blocks make code easier to analyze. By restricting flow of control within a basic block from top to bottom and eliminating all the branches, we can be sure that if the first statement gets executed, the second one does too, and so on. Of course, the branches haven’t disappeared, but we have forced them outside the blocks in the form of the connecting arrows — the flow graph.
Figure 2.2. Intermediate language divided into basic blocks
We are now free to extract information from the blocks themselves. For instance, we can say with certainty which variables a given block uses and which variables it defines (sets the value of). We might not be able to do that if the block contained a branch. We can also gather the same kind of information about the calculations it performs. After we have analyzed the blocks so that we know what goes in and what comes out, we can modify them to improve performance and worry only about the interactions between blocks.
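Applying this analysis to the three blocks of our example loop (ignoring the temporaries and looking only at the program variables), the use and define sets come out roughly as follows:

```
Block A: uses { j, n }    defines { }
Block B: uses { j, k }    defines { j, k, m }
Block C: uses { }         defines { }
```

Already this tells us something useful: block A reads j and n but changes nothing, so nothing computed from j or n before block A is invalidated by it, while block B rewrites j, k, and m on every trip through the loop.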
Optimization Levels*
There are a wide variety of optimization techniques, and they are not all applicable in all situations. So the user is typically given some choice as to whether or not particular optimizations are performed. Often this is expressed in the form of an optimization level that is specified to the compiler as a command-line option, such as -O3.
The different levels of optimization controlled by a compiler flag may include the following:
No optimization: Generates machine code directly from the intermediate language, producing code that can be very large and slow. The primary uses of no optimization are debugging and establishing correct program output: because every operation is done precisely as the user specified, it must be right.
Basic optimizations: Similar to those described in this chapter. They generally work to minimize the intermediate language and generate fast, compact code.
Interprocedural analysis: Looks beyond the boundaries of a single routine for optimization opportunities. This optimization level might extend a basic optimization, such as copy propagation, across multiple routines. Another result of this technique is procedure inlining where it will improve performance.
Runtime profile analysis: Uses runtime profiles gathered from earlier executions to help the compiler generate improved code based on its knowledge of the actual patterns of runtime execution.
Floating-point