High Performance Computing - Charles Severance [53]
ENDDO
ELSE
DO I=1,K
A(I) = 0
ENDDO
ENDIF
The effect on the runtime is dramatic. Not only have we eliminated K-1 copies of the test, we have also assured that the computations in the middle of the loop are not control-dependent on the if-statement, and are therefore much easier for the compiler to pipeline.
We remember helping someone optimize a program with loops containing similar conditionals. They were checking to see whether debug output should be printed each iteration inside an otherwise highly optimizable loop. We can’t fault the person for not realizing how much this slowed the program down. Performance wasn’t important at the time. The programmer was just trying to get the code to produce good answers. But later on, when performance mattered, by cleaning up invariant conditionals, we were able to speed up the program by a factor of 100.
Loop Index Dependent Conditionals
For loop index dependent conditionals, the test is true for certain ranges of the loop index variables. It isn’t always true or always false, like the conditional we just looked at, but it does change with a predictable pattern, and one that we can use to our advantage. The following loop has two index variables, I and J.
DO I=1,N
DO J=1,N
IF (J .LT. I)
A(J,I) = A(J,I) + B(J,I) * C
ELSE
A(J,I) = 0.0
ENDIF
ENDDO
ENDDO
Notice how the if-statement partitions the iterations into distinct sets: those for which it is true and those for which it is false. You can take advantage of the predictability of the test to restructure the loop into several loops — each custom-made for a different partition:
DO I=1,N
DO J=1,I-1
A(J,I) = A(J,I) + B(J,I) * C
ENDDO
DO J=I,N
A(J,I) = 0.0
ENDDO
ENDDO
The new version will almost always be faster. A possible exception is when N is a small value, like 3, in which case we have created more clutter. But then, the loop probably has such a small impact on the total runtime that it won’t matter which way it’s coded.
Independent Loop Conditionals
It would be nice if you could optimize every loop by partitioning it. But more often than not, the conditional doesn’t directly depend on the value of the index variables. Although an index variable may be involved in addressing an array, it doesn’t create a recognizable pattern in advance — at least not one you can see when you are writing the program. Here’s such a loop:
DO I=1,N
DO J=1,N
IF (B(J,I) .GT. 1.0) A(J,I) = A(J,I) + B(J,I) * C
ENDDO
ENDDO
There is not much you can do about this type of conditional. But because every iteration is independent, the loop can be unrolled or can be performed in parallel.
Dependent Loop Conditionals
When the conditional is based on a value that changes with each iteration of the loop, the compiler has no choice but to execute the code exactly as written. For instance, the following loop has an if-statement with built-in scalar recursion:
DO I=1,N
IF (X .LT. A(I)) X = X + B(I)*2.
ENDDO
You can’t know which way the branch will go for the next iteration until you are done with the current iteration. To recognize the dependency, try to unroll the loop slightly by hand. If you can’t start the second test until the first has finished, you have a dependent loop conditional. You may want to look at these types of loops to see if you can eliminate the iteration-to-iteration value.
Reductions
Keep an eye out for loops in which the if-statement is performing a max or min function on a array. This is a reduction, so called because it reduces a array to a scalar result (the previous example was a reduction too, by the way). Again, we are getting a little bit ahead of ourselves, but since we are talking about if-statements in loops, I want to introduce a trick for restructuring reductions max and min to expose more parallelism. The following loop searches for the maximum value, z, in the array a by going through the elements one at a time:
for (i=0; i As written, it’s recursive like the loop from the previous section. You need the