High Performance Computing - Charles Severance [17]
As CPU speed increases faster than memory speed, you will need the techniques in this book. Also, as you move into multiple processors, memory problems don’t get better; usually they get worse. With many hungry processors always ready for more data, a memory subsystem can become extremely strained.
With just a little skill, we can often restructure memory accesses so that they play to the memory system’s strengths instead of its weaknesses.
Exercises
Exercise 1.9.1.
The following code segment traverses a pointer chain:
while ((p = *(char **) p) != NULL);   /* p is a char *; each link holds the address of the next link */
How will such code interact with the cache if all the references fall within a small portion of memory? How will it interact with the cache if the references are spread across many megabytes?
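To try the pointer-chain loop, the chain first has to exist. The following is a minimal sketch, assuming pointer-sized links placed a fixed number of slots apart inside one buffer; `build_chain` and `walk_chain` are hypothetical helpers, not taken from the text. A small `stride` keeps every link within a few cache lines, while a large one spreads the same number of links across megabytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: store a chain of `links` pointers in `buf`, one every
   `stride` slots, so each link holds the address of the next link and the
   last one holds NULL. buf needs at least (links - 1) * stride + 1 slots. */
static void build_chain(void **buf, size_t links, size_t stride)
{
    assert(links > 0 && stride > 0);
    for (size_t i = 0; i + 1 < links; i++)
        buf[i * stride] = &buf[(i + 1) * stride];
    buf[(links - 1) * stride] = NULL;
}

/* Hypothetical helper: the exercise's loop, counting hops. Each load needs
   the address produced by the previous load before it can start. */
static size_t walk_chain(void **start)
{
    size_t hops = 0;
    void **p = start;
    while ((p = (void **) *p) != NULL)
        hops++;
    return hops;
}
```

For example, with 8-byte pointers, `build_chain(buf, 1000, 1)` packs 1,000 links into about 8 KB, while `build_chain(buf, 1000, 4096)` on a buffer (for instance from `calloc`) of at least 999 × 4096 + 1 slots spreads them across roughly 32 MB. Timing `walk_chain` on each covers the two cases the exercise asks about.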
Exercise 1.9.2.
How would the code in Exercise 1.9.1 behave on a multibanked memory system that has no cache?
Exercise 1.9.3.
A long time ago, people regularly wrote self-modifying code — programs that wrote into instruction memory and changed their own behavior. What would be the implications of self-modifying code on a machine with a Harvard memory architecture?
Exercise 1.9.4.
Assume a memory architecture with an L1 cache speed of 10 ns, an L2 cache speed of 30 ns, and a main memory speed of 200 ns. Compare the average memory access time when (1) 80% of references are satisfied by L1, 10% by L2, and 10% by memory; and (2) 85% are satisfied by L1 and 15% by memory.
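The averaging in this exercise can be sketched as a weighted sum. This assumes the simplest model: each reference costs exactly the latency of the level that satisfies it, and a miss in a faster level adds nothing. `avg_access_ns` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper. Simplified model (an assumption): a reference served
   by level i costs latency_ns[i], with no extra cost for the misses in the
   faster levels checked first. fraction[i] is the share of references
   served by level i; the shares must add up to 1. */
static double avg_access_ns(const double *fraction, const double *latency_ns,
                            size_t levels)
{
    double weighted = 0.0, share = 0.0;
    for (size_t i = 0; i < levels; i++) {
        weighted += fraction[i] * latency_ns[i];
        share += fraction[i];
    }
    assert(share > 0.999999 && share < 1.000001);
    return weighted;
}
```

For case (1), pass `{0.80, 0.10, 0.10}` with `{10, 30, 200}`; for case (2), pass `{0.85, 0.15}` with `{10, 200}`. A model that also charges each miss for the faster levels it passed through would add L1 time (and L2 time) to the slower references, raising both averages.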
Exercise 1.9.5.
On a computer system, run loops that process arrays of varying lengths, from 16 to 16 million elements:
ARRAY(I) = ARRAY(I) + 3
How does the number of additions per second change as the array length changes? Experiment with REAL*4, REAL*8, INTEGER*4, and INTEGER*8.
Which has a more significant impact on performance: larger array elements, or integer versus floating-point arithmetic? Try this on a range of different computers.
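One way to run this experiment in C is sketched below, assuming a POSIX system with `clock_gettime`. The element type is a typedef so the same loop can be rebuilt for each case; the mapping REAL*4 to `float`, REAL*8 to `double`, INTEGER*4 to `int32_t`, and INTEGER*8 to `int64_t` assumes the usual sizes. `adds_per_second` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Rebuild with double, int32_t, or int64_t to compare element types. */
typedef float elem_t;

/* Runs ARRAY(I) = ARRAY(I) + 3 over n elements, `reps` times, and returns
   additions per second (0 if the clock did not advance). */
static double adds_per_second(elem_t *array, size_t n, long reps)
{
    struct timespec t0, t1;
    assert(n > 0 && reps > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long r = 0; r < reps; r++)
        for (size_t i = 0; i < n; i++)
            array[i] = array[i] + 3;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (double) (t1.tv_sec - t0.tv_sec)
                + (double) (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    return secs > 0.0 ? (double) n * (double) reps / secs : 0.0;
}
```

A driver could call it for n = 16, 64, 256, and so on up to 16,777,216, choosing `reps` as roughly 2^26 / n so that every size does the same total work, then plot the rate against n. An optimizing compiler may vectorize the inner loop, which is part of what the measurement reflects.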
Exercise 1.9.6.
Create a 1024×1024 two-dimensional array. Loop through the array with rows as the inner loop, and then again with columns as the inner loop, performing a simple operation on each element. Do the loops perform differently? Why? Experiment with different array dimensions and observe the performance impact.
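A C sketch of the two traversals follows; the function names and the choice of `double` elements are assumptions for illustration. Time each call separately (for example with `clock_gettime`) to compare them.

```c
#include <assert.h>
#include <stddef.h>

#define N 1024  /* change to experiment with other dimensions */

/* C stores a[i][j] in row-major order, so with j in the inner loop each step
   moves to the adjacent element (Fortran is column-major, so the roles
   reverse there). */
static void add_along_rows(double a[N][N], double x)
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            a[i][j] += x;
}

/* With i in the inner loop each step jumps a whole row, N * sizeof(double)
   bytes, so consecutive accesses rarely share a cache line. */
static void add_down_columns(double a[N][N], double x)
{
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            a[i][j] += x;
}
```

Power-of-two sizes such as 1024 are worth comparing with nearby sizes such as 1000 or 1031, because a stride that is a large power of two can map many column elements to the same cache sets.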
Exercise 1.9.7.
Write a program that repeatedly executes timed loops of different sizes to determine the cache size for your system.
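One common approach is sketched below, assuming a POSIX clock and a 64-byte cache line. It chases pointers through a randomly ordered cycle that touches one word per line, so hardware prefetching helps little, and reports the average time per load. The names `ns_per_load` and `LINE_BYTES` are hypothetical. Plotting the result for buffer sizes from a few KB to tens of MB should show steps where the buffer stops fitting in each cache level.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

#define LINE_BYTES 64  /* assumed cache-line size */

/* Average nanoseconds per load while chasing `steps` links through a buffer
   of `bytes` bytes. The last loaded value goes to *sink so the compiler
   cannot drop the loop. Returns -1.0 if allocation fails. */
static double ns_per_load(size_t bytes, size_t steps, size_t *sink)
{
    size_t per_line = LINE_BYTES / sizeof(size_t);
    size_t lines = bytes / LINE_BYTES;
    assert(lines >= 2 && steps > 0);

    size_t *buf = malloc(lines * per_line * sizeof *buf);
    size_t *order = malloc(lines * sizeof *order);
    if (buf == NULL || order == NULL) {
        free(buf);
        free(order);
        return -1.0;
    }

    /* Fisher-Yates shuffle of the line indices (rand() is crude but enough). */
    for (size_t i = 0; i < lines; i++)
        order[i] = i;
    for (size_t i = lines - 1; i > 0; i--) {
        size_t j = (size_t) rand() % (i + 1);
        size_t t = order[i];
        order[i] = order[j];
        order[j] = t;
    }
    /* Link the lines in shuffled order into one cycle: each line's first
       word holds the index of the next line's first word. */
    for (size_t i = 0; i < lines; i++)
        buf[order[i] * per_line] = order[(i + 1) % lines] * per_line;

    struct timespec t0, t1;
    size_t p = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t s = 0; s < steps; s++)
        p = buf[p];  /* each load depends on the one before it */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *sink = p;

    free(buf);
    free(order);
    double ns = (double) (t1.tv_sec - t0.tv_sec) * 1e9
              + (double) (t1.tv_nsec - t0.tv_nsec);
    return ns / (double) steps;
}
```

For example, call it for 4 KB, 8 KB, and so on up to 64 MB with a few million steps each, and look for the sizes where the time per load jumps.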
1.2. Floating-Point Numbers
Introduction
Often when we want to make a point that nothing is sacred, we say, “one plus one does not equal two.” This is designed to shock us and attack our fundamental assumptions about the nature of the universe. Well, in this chapter on floating-point numbers, we will learn that “0.1 + 0.1 does not always equal 0.2” when we use floating-point numbers for computations.
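A small C illustration, assuming IEEE 754 `float` and `double` as on most current hardware. In double precision, `0.1 + 0.1` does compare equal to `0.2`, because doubling a binary value is exact; the surprises appear once precisions mix or rounding errors accumulate, as in these two hypothetical checks.

```c
#include <assert.h>
#include <stdbool.h>

/* 0.1 has no exact binary representation, so each literal below is the
   nearest representable value, not 0.1 itself. */

/* Single-precision 0.1f + 0.1f compared against the double constant 0.2:
   the float rounding error is much larger than the double one. */
static bool float_sum_equals_double_point_two(void)
{
    float a = 0.1f;
    float sum = a + a;
    return sum == 0.2;  /* sum is converted to double for the comparison */
}

/* Adding 0.1 ten times accumulates rounding error along the way. */
static bool ten_tenths_equal_one(void)
{
    double total = 0.0;
    for (int i = 0; i < 10; i++)
        total += 0.1;
    return total == 1.0;
}
```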
In this chapter we explore the limitations of floating-point numbers and how you as a programmer can write code to minimize the effect of these limitations. This chapter is just a brief introduction to a significant field of mathematics called numerical analysis.
Reality
The real world is full of real numbers. Quantities such as distances, velocities, masses, and angles are all real numbers.[8] A wonderful property of real numbers is that they have unlimited accuracy. For example, the ratio of the circumference of a circle to its diameter is 3.141592..., and the decimal value of pi never terminates. Even though we can’t write it down in full, pi is still a real number. Real numbers that can be represented as the ratio of two integers, such as 1/3, are called rational numbers. Not all real numbers are rational; pi, for example, is not.
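A worked illustration of why a non-terminating expansion cannot be stored exactly in a fixed number of digits: cut 1/3 off at six decimal digits and multiplying back by 3 no longer gives 1. The same thing happens in base 2 with 1/10, whose binary expansion repeats forever.

$$\frac{1}{3} = 0.333333\ldots = 0.\overline{3}, \qquad 3 \times 0.333333 = 0.999999 \neq 1$$

$$\frac{1}{10} = 0.0\overline{0011}_2 = 0.000110011001100\ldots_2$$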