Online Book Reader

Home Category

Choose a category
All
Classic-Fiction

High Performance Computing - Charles Severance [131]

By Root 1364 0

We could eliminate the problem by requiring that all instructions be the same length, and that there be a limited number of instruction formats as shown in Figure 5.5. This way, every instruction entering the pipeline is known a priori to be complete — not needing another memory access. It would also be easier for the processor to locate the instruction fields that specify registers or constants. Altogether because RISC can assume a fixed instruction length, the pipeline flows much more smoothly.

Figure 5.5. Variable-length CISC versus fixed-length RISC instructions

Delayed Branches

As described earlier, branches are a significant problem in a pipelined architecture. Rather than take a penalty for cleaning out the pipeline after a misguessed branch, many RISC designs require an instruction after the branch. This instruction, in what is called the branch delay slot, is executed no matter what way the branch goes. An instruction in this position should be useful, or at least harmless, whichever way the branch proceeds. That is, you expect the processor to execute the instruction following the branch in either case, and plan for it. In a pinch, a no-op can be used. A slight variation would be to give the processor the ability to annul (or squash) the instruction appearing in the branch delay slot if it turns out that it shouldn’t have been issued after all:

ADD R1,R2,R1 add r1 to r2 and store in r1

SUB R3,R1,R3 subtract r1 from r3, store in r3

BRA SOMEWHERE branch somewhere else

LABEL1 ZERO R3 instruction in branch delay slot

...

While branch delay slots appeared to be a very clever solution to eliminating pipeline stalls associated with branch operations, as processors moved toward exe- cuting two and four instructions simultaneously, another approach was needed.[84]

A more robust way of eliminating pipeline stalls was to “predict” the direction of the branch using a table stored in the decode unit. As part of the decode stage, the CPU would notice that the instruction was a branch and consult a table that kept the recent behavior of the branch; it would then make a guess. Based on the guess, the CPU would immediately begin fetching at the predicted location. As long as the guesses were correct, branches cost exactly the same as any other instruction.

If the prediction was wrong, the instructions that were in process had to be can- celled, resulting in wasted time and effort. A simple branch prediction scheme is typically correct well over 90% of the time, significantly reducing the overall negative performance impact of pipeline stalls due to branches. All recent RISC designs incorporate some type of branch prediction, making branch delay slots effectively unnecessary.

Another mechanism for reducing branch penalties is conditional execution. These are instructions that look like branches in source code, but turn out to be a special type of instruction in the object code. They are very useful because they replace test and branch sequences altogether. The following lines of code capture the sense of a conditional branch:

IF ( B < C ) THEN

A = D

ELSE

A = E

ENDIF

Using branches, this would require at least two branches to ensure that the proper value ended up in A. Using conditional execution, one might generate code that looks as follows:

COMPARE B < C

IF TRUE A = D conditional instruction

IF FALSE A = E conditional instruction

This is a sequence of three instructions with no branches. One of the two assignments executes, and the other acts as a no-op. No branch prediction is needed, and the pipeline operates perfectly. There is a cost to taking this approach when there are a large number of instructions in one or the other branch paths that would seldom get executed using the traditional branch instruction model.

Load/Store Architecture

In a load/store instruction set architecture, memory references are limited to explicit load and store instructions. Each instruction may not make more than one memory reference per instruction. In a CISC processor, arithmetic and logical instructions

Online Book Reader

High Performance Computing - Charles Severance [131]

®Online Book Reader