High Performance Computing - Charles Severance [26]
Use REAL*8 for computations unless you are sure REAL*4 has sufficient precision. REAL*4 has roughly 7 digits of precision, so if the low-order digits become meaningless through rounding and accumulated computation, you are in some danger of seeing the effect of the errors in your results. REAL*8, with roughly 15 to 16 digits, makes this much less likely to happen.
Be aware of the relative magnitude of numbers when you are performing additions.
When summing up numbers, if there is a wide range, sum from smallest to largest.
Perform multiplications before divisions whenever possible.
When performing a comparison with a computed value, check to see if the values are “close” rather than identical.
Make sure that you are not performing any unnecessary type conversions during the critical portions of your code.
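Two of these guidelines, watching relative magnitudes and comparing for closeness rather than equality, can be illustrated with a short program. This is only a sketch under stated assumptions: the program name FPDEMO, the variable names, the test values, and the tolerance TOL = 1.0E-6 are all chosen here for illustration and do not come from the text. A suitable tolerance depends on the problem and on the magnitude of the values being compared.

```fortran
      PROGRAM FPDEMO
      ! Illustrative sketch: the values and the tolerance below are
      ! assumptions picked for the demonstration.
      REAL*4 BIG, SMALL, S, T, TOL
      INTEGER K

      ! Relative magnitude: REAL*4 carries roughly 7 digits, so
      ! adding 1.0 to 1.0E8 is expected to leave the sum unchanged.
      BIG = 1.0E8
      SMALL = 1.0
      S = BIG + SMALL
      PRINT *, 'BIG + SMALL still equals BIG? ', (S .EQ. BIG)

      ! Comparison: 0.1 has no exact binary representation, so ten
      ! additions of 0.1 may not produce exactly 1.0.
      TOL = 1.0E-6
      T = 0.0
      DO K = 1, 10
        T = T + 0.1
      ENDDO
      PRINT *, 'Sum of ten 0.1 values:        ', T
      PRINT *, 'Identical to 1.0?             ', (T .EQ. 1.0)
      PRINT *, 'Close to 1.0 (within TOL)?    ', (ABS(T-1.0) .LE. TOL)
      END
```

As with the exercises below, compile at a low optimization level so the compiler does not evaluate the expressions at compile time or in higher precision. The exact values printed may vary with the compiler and hardware.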
An excellent reference on floating-point issues and the IEEE format is “What Every Computer Scientist Should Know About Floating-Point Arithmetic,” written by David Goldberg, in ACM Computing Surveys (March 1991). This article gives examples of the most common problems with floating-point and outlines the solutions. It also covers the IEEE floating-point format very thoroughly. I also recommend you consult Dr. William Kahan’s home page (http://www.cs.berkeley.edu/~wkahan/) for some excellent materials on the IEEE format and the challenges of using floating-point arithmetic. Dr. Kahan was one of the original designers of the Intel i8087 and the IEEE 754 floating-point format.
Exercises
Exercise 1.22.1.
Run the following code to count the number of inverses that are not perfectly accurate:
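      REAL*4 X,Y,Z
      INTEGER I,N
      ! I counts the inverses that do not multiply back to exactly 1.0
      I = 0
      ! An integer loop index is used because REAL loop variables
      ! are no longer part of standard Fortran
      DO N=1,1000
        X = REAL(N)
        Y = 1.0 / X
        Z = Y * X
        IF ( Z .NE. 1.0 ) THEN
          I = I + 1
        ENDIF
      ENDDO
      PRINT *,'Found ',I
      END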
Exercise 1.22.2.
Change the type of the variables to REAL*8 and repeat. Make sure to keep the optimization at a sufficiently low level (-O0) to keep the compiler from eliminating the computations.
Exercise 1.22.3.
Write a program to determine the number of digits of precision for REAL*4 and REAL*8.
Exercise 1.22.4.
Write a program to demonstrate how summing an array forward to backward and backward to forward can yield a different result.
Exercise 1.22.5.
Assuming your compiler supports varying levels of IEEE compliance, take a significant computational code and test its overall performance under the various IEEE compliance options. Do the results of the program change?
[1] See The Section Called “Introduction” (Chapter 15, Using Published Benchmarks) for details on the Linpack benchmark.
[2] Magnetic core memory is still used in applications where radiation “hardness” — resistance to changes caused by ionizing radiation — is important.
[3] The Section Called “Introduction” describes cache coherency in more detail.
[4] The term for this is demand paging.
[5] Text pages are identified by the disk device and block number from which they came.
[6] See the STREAM section in The Section Called “Improving Memory Performance” (Chapter 15) for measures of memory bandwidth.
[7] By the way, most machines have uncached memory spaces for process synchronization and I/O device registers. However, memory references to these locations bypass the cache because of the address chosen, not necessarily because of the instruction chosen.
[8] In high performance computing we often simulate the real world, so it is somewhat ironic that we use simulated real numbers (floating-point) in those simulations of the real world.
[9] Interestingly, analog computers have an easier time representing real numbers. Imagine a “water-adding” analog computer which consists of two glasses of water and an empty glass. The amounts of water in the two glasses are perfectly represented real numbers. By pouring the two glasses into a third, we are adding the two real numbers perfectly (unless we spill some), and we wind up with a real number amount of water in the third glass. The problem with analog computers