Induction variable simplification probably wouldn’t be a very important optimization, except that array address calculations look very much like the calculation for K in the example above. For instance, the address calculation for A(I) within a loop iterating on the variable I looks like this:
address = base_address(A) + (I-1) * sizeof_datatype(A)
Performing all that math is unnecessary. The compiler can create a new induction variable for references to A and simplify the address calculations:
outside the loop...
address = base_address(A) - (1 * sizeof_datatype(A))
inside the loop...
address = address + sizeof_datatype(A)
Induction variable simplification is especially useful on processors that can automatically increment a register each time it is used as a pointer for a memory reference. While stepping through a loop, the memory reference and the address arithmetic can both be squeezed into a single instruction—a great savings.
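The same transformation can be written out at the source level. Here is a minimal C sketch (the array, loop bounds, and variable names are invented for illustration); the compiler performs this rewriting internally, but spelling it out makes the saved multiply visible:

    #include <stdio.h>

    #define N 8

    int main(void) {
        double a[N], sum = 0.0;

        /* Original form: each reference to a[i] implies
           address = base_address(a) + i * sizeof(double),
           a multiply and an add on every iteration. */
        for (int i = 0; i < N; i++)
            a[i] = i + 1.0;

        /* After induction variable simplification: a pointer is set up
           once outside the loop, then bumped by sizeof(double) each
           trip, so the multiply disappears from the loop body. */
        double *p = a;
        for (int i = 0; i < N; i++) {
            sum += *p;        /* memory reference through the pointer */
            p = p + 1;        /* address = address + sizeof_datatype */
        }

        printf("sum = %g\n", sum);
        return 0;
    }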
Object Code Generation
Precompilation, lexical analysis, parsing, and many optimization techniques are somewhat portable, but code generation is very specific to the target processor. In some ways this phase is where compilers earn their keep on single-processor RISC systems.
Anything that isn’t handled in hardware has to be addressed in software. That means if the processor can’t resolve resource conflicts, such as overuse of a register or pipeline, then the compiler is going to have to take care of it. Allowing the compiler to take care of it isn’t necessarily a bad thing — it’s a design decision. A complicated compiler and simple, fast hardware might be cost effective for certain applications. Two processors at opposite ends of this spectrum are the MIPS R2000 and the HP PA-8000. The first depends heavily on the compiler to schedule instructions and fairly distribute resources. The second manages both things at runtime, though both depend on the compiler to provide a balanced instruction mix.
In all computers, register selection is a challenge because, in spite of their numbers, registers are precious. You want to be sure that the most active variables become register resident at the expense of others. On machines without register renaming (see the earlier section called “Introduction”), you have to be sure that the compiler doesn’t recycle registers too quickly; otherwise the processor has to delay computations as it waits for one to be freed.
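As a sketch of why this matters, consider the following C loop (function and variable names are invented). The running sum is the most active value in the loop; a register allocator that keeps it register resident touches memory only for the array elements, while a spilled sum would cost an extra load and store on every iteration:

    /* 'sum' is read and written every trip through the loop, so it is
       the best candidate for a register, even at the expense of other
       values. */
    double dot(const double *x, const double *y, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += x[i] * y[i];
        return sum;
    }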
Some instructions in the repertoire also save your compiler from having to issue others. Examples are auto-increment for registers being used as array indices or conditional assignments in lieu of branches. These both save the processor from extra calculations and make the instruction stream more compact.
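Both idioms can be expressed directly in C. A small sketch (the function and names are invented) that gives the compiler the opportunity to use auto-increment addressing and a conditional move:

    #include <stddef.h>

    /* Sums only the positive elements of a. The pointer bump maps onto
       auto-increment addressing where the hardware supports it, and the
       conditional expression can become a conditional-move instruction
       rather than a branch. */
    double sum_positive(const double *a, size_t n) {
        double sum = 0.0;
        const double *p = a;
        for (size_t i = 0; i < n; i++) {
            double v = *p++;              /* load, then bump the pointer */
            sum += (v > 0.0) ? v : 0.0;   /* branchless conditional assignment */
        }
        return sum;
    }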
Lastly, there are opportunities for increased parallelism. Programmers generally think serially, specifying steps in logical succession. Unfortunately, serial source code makes serial object code. A compiler that hopes to efficiently use the parallelism of the processor will have to be able to move instructions around and find operations that can be issued side by side. This is one of the biggest challenges for compiler writers today. As superscalar and very long instruction word (VLIW) designs become capable of executing more instructions per clock cycle, the compiler will have to dig deeper for operations that can execute at the same time.
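To make the idea concrete, here is a sketch (names invented) of a reduction rewritten so that a superscalar processor has independent operations to issue side by side. A compiler that performs this transformation is doing at the instruction level what the rewrite does at the source level:

    /* The first loop has a serial dependence through 'sum': each add
       must wait for the previous one. The second uses two independent
       accumulators that the hardware can run in the same cycle. */
    double reduce_serial(const double *a, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }

    double reduce_parallel(const double *a, int n) {
        double s0 = 0.0, s1 = 0.0;
        int i;
        for (i = 0; i + 1 < n; i += 2) {
            s0 += a[i];         /* these two adds are independent */
            s1 += a[i + 1];     /* and can issue side by side */
        }
        if (i < n)
            s0 += a[i];         /* leftover element when n is odd */
        return s0 + s1;
    }

Because floating-point addition is not associative, a compiler will typically perform this kind of rewrite only when it is explicitly permitted to reorder floating-point operations.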
Closing Notes
This chapter has been a basic introduction to how an optimizing compiler operates. However, this is not the last we will talk about compilers. To perform automatic vectorization, parallelization, and data decomposition, compilers must analyze the source code further. As we encounter these topics, we will discuss their impact on the compiler and how programmers can best interact with it.
For single-processor modern RISC architectures, the compiler does most of the heavy lifting, and understanding what it can and cannot do is the first step toward getting the best performance from it.