Online Book Reader


High Performance Computing - Charles Severance [52]


of caution with regard to procedure inlining. You can easily do too much of it. If everything and anything is ingested into the body of its parents, the resulting executable may be so large that it repeatedly spills out of the instruction cache and becomes a net performance loss. Our advice is that you use the caller/callee information profilers give you and make some intelligent decisions about inlining, rather than trying to inline every subroutine available. Again, small routines that are called often are generally the best candidates for inlining.
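Here is a minimal sketch in Fortran 90 that makes the inlining trade-off concrete. The names SQUARE, SUM_CALLED, and SUM_INLINED are hypothetical and not from any library. The first version calls a tiny function once per iteration. The second copies the function's body into the loop by hand, which removes the call overhead at the cost of a slightly larger loop. With a routine this small the growth is negligible, which is why small, frequently called routines are the usual candidates. Many optimizing compilers will inline a routine like this on their own, so check what your compiler already does before inlining by hand.

```fortran
! Sketch: a small, frequently called routine and a hand-inlined
! version of the loop that calls it. All names are hypothetical.
MODULE INLINE_DEMO
  IMPLICIT NONE
CONTAINS

  ! Tiny routine: the call overhead is large relative to its work.
  REAL FUNCTION SQUARE(X)
    REAL, INTENT(IN) :: X
    SQUARE = X * X
  END FUNCTION SQUARE

  ! Sum of squares with one call to SQUARE per iteration.
  SUBROUTINE SUM_CALLED(A, N, S)
    INTEGER, INTENT(IN) :: N
    REAL, INTENT(IN) :: A(N)
    REAL, INTENT(OUT) :: S
    INTEGER :: I
    S = 0.0
    DO I = 1, N
      S = S + SQUARE(A(I))
    END DO
  END SUBROUTINE SUM_CALLED

  ! Same computation with the body of SQUARE copied into the loop.
  SUBROUTINE SUM_INLINED(A, N, S)
    INTEGER, INTENT(IN) :: N
    REAL, INTENT(IN) :: A(N)
    REAL, INTENT(OUT) :: S
    INTEGER :: I
    S = 0.0
    DO I = 1, N
      S = S + A(I) * A(I)
    END DO
  END SUBROUTINE SUM_INLINED

END MODULE INLINE_DEMO
```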

Branches

People sometimes take a week to make a decision, so we can’t fault a computer if it takes a few tens of nanoseconds. However, if an if-statement appears in some heavily traveled section of the code, you might get tired of the delay. There are two basic approaches to reducing the impact of branches:

Streamline them.

Move them out to the computational suburbs. Particularly, get them out of loops.

In the next section, “Branches With Loops,” we show you some easy ways to reorganize conditionals so they execute more quickly.

Branches With Loops

Numerical codes usually spend most of their time in loops, so you don’t want anything inside a loop that doesn’t have to be there, especially an if-statement. Not only do if-statements gum up the works with extra instructions, they can force a strict order on the iterations of a loop. Of course, you can’t always avoid conditionals. Sometimes, though, people place them in loops to process events that could have been handled outside, or even ignored.

To take you back a few years, the following code shows a loop with a test for a value close to zero:

      PARAMETER (SMALL = 1.E-20)
      DO I=1,N
        IF (ABS(A(I)) .GE. SMALL) THEN
          B(I) = B(I) + A(I) * C
        ENDIF
      ENDDO
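The two versions of this loop can be placed side by side in a short sketch, written as a Fortran 90 module with hypothetical routine names. UPDATE_GUARDED keeps the test. UPDATE_ALWAYS drops it and performs the multiply-add on every element, which leaves the compiler free to pipeline the arithmetic. Which version runs faster depends on the machine.

```fortran
! Sketch: the guarded loop from the text and a branch-free version.
! SMALL is the PARAMETER from the text; routine names are hypothetical.
MODULE SMALL_TEST_DEMO
  IMPLICIT NONE
  REAL, PARAMETER :: SMALL = 1.E-20
CONTAINS

  ! Original form: skip the update when A(I) is tiny.
  SUBROUTINE UPDATE_GUARDED(A, B, C, N)
    INTEGER, INTENT(IN) :: N
    REAL, INTENT(IN) :: A(N), C
    REAL, INTENT(INOUT) :: B(N)
    INTEGER :: I
    DO I = 1, N
      IF (ABS(A(I)) .GE. SMALL) THEN
        B(I) = B(I) + A(I) * C
      END IF
    END DO
  END SUBROUTINE UPDATE_GUARDED

  ! Branch-free form: always do the multiply-add, so there is no
  ! control dependency inside the loop.
  SUBROUTINE UPDATE_ALWAYS(A, B, C, N)
    INTEGER, INTENT(IN) :: N
    REAL, INTENT(IN) :: A(N), C
    REAL, INTENT(INOUT) :: B(N)
    INTEGER :: I
    DO I = 1, N
      B(I) = B(I) + A(I) * C
    END DO
  END SUBROUTINE UPDATE_ALWAYS

END MODULE SMALL_TEST_DEMO
```

Suppose an element of A lies below SMALL and the matching element of B starts at zero. The guarded version leaves B(I) at zero, while the branch-free version adds a tiny positive product. This is the kind of small change in the answer that dropping the test can introduce.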

The idea was that if the multiplier, A(I), were very small, its contribution would be negligible, so there would be no reason to perform the math in the center of the loop. Because floating-point operations weren’t pipelined on many machines, a comparison and a branch were cheaper than the arithmetic they skipped, so the test would save time. On an older CISC or early RISC processor, a comparison and branch is probably still a savings. But on other architectures, it costs a lot less to just perform the math and skip the test. Eliminating the branch eliminates a control dependency and allows the compiler to pipeline more arithmetic operations. Of course, the answer could change slightly if the test is eliminated, because the tiny products that used to be skipped are now added into B. It then becomes a question of whether the difference is significant.

Here’s another example where a branch isn’t necessary. The loop finds the absolute value of each element in an array:

      DO I=1,N
        IF (A(I) .LT. 0.) A(I) = -A(I)
      ENDDO

But why perform the test at all? On most machines, it’s quicker to perform the abs() operation on every element of the array.
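Here is a minimal sketch of both forms of the absolute-value loop, assuming a Fortran 90 compiler. The routine names are hypothetical. ABS is a Fortran intrinsic, and for REAL arguments compilers usually generate it inline rather than as a subroutine call. That is what makes the unconditional version attractive.

```fortran
! Sketch: absolute value of each element, with and without a branch.
! Routine names are hypothetical.
MODULE ABS_DEMO
  IMPLICIT NONE
CONTAINS

  ! Form from the text: test each element, negate only if negative.
  SUBROUTINE ABS_BRANCH(A, N)
    INTEGER, INTENT(IN) :: N
    REAL, INTENT(INOUT) :: A(N)
    INTEGER :: I
    DO I = 1, N
      IF (A(I) .LT. 0.) A(I) = -A(I)
    END DO
  END SUBROUTINE ABS_BRANCH

  ! Branch-free form: apply the ABS intrinsic to every element.
  ! (In Fortran 90 this loop can also be written as A = ABS(A).)
  SUBROUTINE ABS_ALWAYS(A, N)
    INTEGER, INTENT(IN) :: N
    REAL, INTENT(INOUT) :: A(N)
    INTEGER :: I
    DO I = 1, N
      A(I) = ABS(A(I))
    END DO
  END SUBROUTINE ABS_ALWAYS

END MODULE ABS_DEMO
```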

We do have to give you a warning, though: if you are coding in C, the absolute-value function, fabs(), is a subroutine call. In this particular case, you are better off leaving the conditional in the loop.[39]

When you can’t throw out the conditional, there are still things you can do to minimize its performance cost. First, we have to learn to recognize which conditionals within loops can be restructured and which cannot. Conditionals in loops fall into several categories:

Loop invariant conditionals

Loop index dependent conditionals

Independent loop conditionals

Dependent loop conditionals

Reductions

Conditionals that transfer control

Let’s look at these types in turn.

Loop Invariant Conditionals

The following loop contains an invariant test:

      DO I=1,K
        IF (N .EQ. 0) THEN
          A(I) = A(I) + B(I) * C
        ELSE
          A(I) = 0.
        ENDIF
      ENDDO

“Invariant” means that the outcome is always the same. Regardless of what happens to the variables A, B, C, and I, the value of N won’t change, so neither will the outcome of the test.
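One way to convince yourself that the test really is invariant is to run both forms and compare the results. The sketch below uses Fortran 90 and hypothetical routine names. One routine keeps the test inside the loop. The other makes the test once, before the loop, with a separate copy of the loop for each outcome. For any value of N the two leave A identical, because nothing in the loop body can change N.

```fortran
! Sketch: an invariant test made inside the loop versus once outside it.
! Routine names are hypothetical.
MODULE INVARIANT_DEMO
  IMPLICIT NONE
CONTAINS

  ! Test inside the loop, as in the text: as written, it runs every iteration.
  SUBROUTINE INVARIANT_INSIDE(A, B, C, N, K)
    INTEGER, INTENT(IN) :: N, K
    REAL, INTENT(INOUT) :: A(K)
    REAL, INTENT(IN) :: B(K), C
    INTEGER :: I
    DO I = 1, K
      IF (N .EQ. 0) THEN
        A(I) = A(I) + B(I) * C
      ELSE
        A(I) = 0.
      END IF
    END DO
  END SUBROUTINE INVARIANT_INSIDE

  ! Test made once; each outcome gets its own copy of the loop.
  SUBROUTINE INVARIANT_HOISTED(A, B, C, N, K)
    INTEGER, INTENT(IN) :: N, K
    REAL, INTENT(INOUT) :: A(K)
    REAL, INTENT(IN) :: B(K), C
    INTEGER :: I
    IF (N .EQ. 0) THEN
      DO I = 1, K
        A(I) = A(I) + B(I) * C
      END DO
    ELSE
      DO I = 1, K
        A(I) = 0.
      END DO
    END IF
  END SUBROUTINE INVARIANT_HOISTED

END MODULE INVARIANT_DEMO
```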

You can recast the loop by making the test outside and replicating the loop: one copy for when the test is true, and one for when it is false, as in the following example:

      IF (N .EQ. 0) THEN
        DO I=1,K
          A(I) =
