High Performance Computing - Charles Severance [52]
Branches
People sometimes take a week to make a decision, so we can’t fault a computer if it takes a few tens of nanoseconds. However, if an if-statement appears in some heavily traveled section of the code, you might get tired of the delay. There are two basic approaches to reducing the impact of branches:
Streamline them.
Move them out to the computational suburbs. Particularly, get them out of loops.
In the section “Branches With Loops,” we show you some easy ways to reorganize conditionals so they execute more quickly.
Branches With Loops
Numerical codes usually spend most of their time in loops, so you don’t want anything inside a loop that doesn’t have to be there, especially an if-statement. Not only do if-statements gum up the works with extra instructions, they can force a strict order on the iterations of a loop. Of course, you can’t always avoid conditionals. Sometimes, though, people place them in loops to process events that could have been handled outside, or even ignored.
To take you back a few years, the following code shows a loop with a test for a value close to zero:
      PARAMETER (SMALL = 1.E-20)
      DO I=1,N
        IF (ABS(A(I)) .GE. SMALL) THEN
          B(I) = B(I) + A(I) * C
        ENDIF
      ENDDO
The idea was that if the multiplier, A(I), were reasonably small, there would be no reason to perform the math in the center of the loop. Because floating-point operations weren’t pipelined on many machines, a comparison and a branch were cheaper; the test would save time. On an older CISC or early RISC processor, a comparison and branch is probably still a savings. But on other architectures, it costs a lot less to just perform the math and skip the test. Eliminating the branch eliminates a control dependency and allows the compiler to pipeline more arithmetic operations. Of course, the answer could change slightly if the test is eliminated. It then becomes a question of whether the difference is significant.
Here’s another example where a branch isn’t necessary. The loop finds the absolute value of each element in an array:
      DO I=1,N
        IF (A(I) .LT. 0.) A(I) = -A(I)
      ENDDO
But why perform the test at all? On most machines, it’s quicker to perform the abs() operation on every element of the array.
We do have to give you a warning, though: if you are coding in C, the absolute value, fabs(), is a subroutine call. In this particular case, you are better off leaving the conditional in the loop.[39]
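To make the two Fortran rewrites discussed above concrete, here is a minimal sketch that puts the tested loop next to its branch-free counterparts. The subroutine names UPDTST, UPDALL, and ABSALL are hypothetical, chosen only for this illustration, and the array arguments are assumed to be single-precision REAL. Whether the branch-free forms actually run faster depends on your compiler and processor, so treat this as something to time on your own machine rather than a guaranteed win.

```fortran
      ! Hypothetical names (UPDTST, UPDALL, ABSALL) for illustration.

      ! Original form: test each multiplier before doing the math.
      SUBROUTINE UPDTST(N, A, B, C)
      INTEGER N, I
      REAL A(N), B(N), C, SMALL
      PARAMETER (SMALL = 1.E-20)
      DO I=1,N
        IF (ABS(A(I)) .GE. SMALL) THEN
          B(I) = B(I) + A(I) * C
        ENDIF
      ENDDO
      END

      ! Branch-free form: always perform the multiply-add.
      SUBROUTINE UPDALL(N, A, B, C)
      INTEGER N, I
      REAL A(N), B(N), C
      DO I=1,N
        B(I) = B(I) + A(I) * C
      ENDDO
      END

      ! Branch-free absolute value using the ABS intrinsic.
      SUBROUTINE ABSALL(N, A)
      INTEGER N, I
      REAL A(N)
      DO I=1,N
        A(I) = ABS(A(I))
      ENDDO
      END
```

UPDTST and UPDALL give identical results whenever the skipped term A(I) * C is too small to change B(I) in the working precision. They can differ when B(I) is itself tiny, which is the "slight change in the answer" you have to judge for your own data.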
When you can’t throw out the conditional, there are still things you can do to minimize its performance penalty. First, we have to learn to recognize which conditionals within loops can be restructured and which cannot. Conditionals in loops fall into several categories:
Loop invariant conditionals
Loop index dependent conditionals
Independent loop conditionals
Dependent loop conditionals
Reductions
Conditionals that transfer control
Let’s look at these types in turn.
Loop Invariant Conditionals
The following loop contains an invariant test:
      DO I=1,K
        IF (N .EQ. 0) THEN
          A(I) = A(I) + B(I) * C
        ELSE
          A(I) = 0.
        ENDIF
      ENDDO
“Invariant” means that the outcome is always the same. Regardless of what happens to the variables A, B, C, and I, the value of N won’t change, so neither will the outcome of the test.
You can recast the loop by performing the test once, outside the loop, and replicating the loop body twice: once for when the test is true, and once for when it is false, as in the following example:
      IF (N .EQ. 0) THEN
        DO I=1,K
          A(I) =