High Performance Computing - Charles Severance [48]
 procs     memory              page               disk        faults      cpu
 r b w   avm   fre  re at pi po fr de sr d0 d1 d2 d3   in   sy  cs us sy id
 0 0 0   840 21508   0  0  0  0  0  0  0  1  0  0  0  251  186 156  0 10 90
 0 0 0   846 21460   0  0  0  0  0  0  0  2  0  0  0  248  149 152  1  9 89
 0 0 0   918 21444   0  0  0  0  0  0  0  4  0  0  0  258  143 152  2 10 89
Lots of valuable information is produced. For our purposes, the important fields are avm or active virtual memory, the fre or free real memory, and the pi and po numbers showing paging activity. When the fre figure drops to near zero, and the po field shows a lot of activity, it’s an indication that the memory system is overworked.
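If you want to keep an eye on just the interesting fields, you can filter the vmstat output from the shell. This is a minimal sketch, not a portable recipe: the field positions ($5 for fre, $9 for po) and the number of header lines to skip are assumptions that vary from one vmstat version to another, so check your own vmstat header before trusting them.

```shell
# Print free memory (fre) and page-outs (po) every 5 seconds, 5 samples.
# $5 and $9 are assumptions about this vmstat's column layout; verify
# them against the header your vmstat actually prints.
vmstat 5 5 | awk 'NR > 2 { print $5, $9 }'
```

When the first number heads toward zero and the second starts climbing, the memory system is being overworked.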
On a SysV machine, paging activity can be seen with the sar command:
% sar -r 5 5
This command shows you the amount of free memory and swap space presently available. If the free memory figure is low, you can assume that your program is paging:
Sat Apr 18 20:42:19
[r]   freemem  freeswap
         4032     82144
As we mentioned earlier, if you must run a job larger than the size of the memory on your machine, the same sort of advice that applied to conserving cache activity applies to paging activity.[36] Try to minimize the stride in your code, and where you can’t, blocking memory references helps a whole lot.
A note on memory performance monitoring tools: you should check with your workstation vendor to see what they have available beyond vmstat or sar. There may be much more sophisticated (and often graphical) tools that can help you understand how your program is using memory.
Closing Notes
We have seen some of the tools for timing and profiling. Even though it seems like we covered a lot, there are other kinds of profiles we would like to be able to cover — cache miss measurements, runtime dependency analysis, flop measurements, and so on. These profiles are good when you are looking for particular anomalies, such as excessive cache misses or poor floating-point pipeline utilization. Profilers for these quantities exist for some machines, but they aren’t widely distributed.
One thing to keep in mind: when you profile code you sometimes get a very limited view of the way a program is used. This is especially true if the program can perform many types of analyses for many different sets of input data. Working with just one or two profiles can give you a distorted picture of how the code operates overall. Imagine the following scenario: someone invites you to take your very first ride in an automobile. You get in the passenger’s seat with a sketch pad and a pen, and record everything that happens. Your observations include some of the following:
The radio is always on.
The windshield wipers are never used.
The car moves only in a forward direction.
The danger is that, given this limited view of the way a car is operated, you might want to disconnect the radio’s on/off knob, remove the windshield wipers, and eliminate the reverse gear. This would come as a real surprise to the next person who tries to back the car out on a rainy day! The point is that unless you are careful to gather data for all kinds of uses, you may not really have a picture of how the program operates. A single profile is fine for tuning a benchmark, but you may miss some important details on a multipurpose application. Worse yet, if you optimize it for one case and cripple it for another, you may do far more harm than good.
Profiling, as we saw in this chapter, is pretty mechanical. Tuning requires insight. It’s only fair to warn you that it isn’t always rewarding. Sometimes you pour your soul into a clever modification that actually increases the runtime. Argh! What went wrong? You’ll need to depend on your profiling tools to answer that.
Exercises
Exercise 2.15.1.
Profile the following program using gprof. Is there any way to tell how much of the time spent in routine c was due to recursive calls?
void a(int n);
void c(int n);

int main(void)
{
    int i, n = 10;

    for (i = 0; i < 1000; i++) {
        c(n);
        a(n);
    }
    return 0;
}

void c(int n)
{
    if (n > 0) {
        a(n - 1);
        c(n - 1);
    }
}

void a(int n)
{
    c(n);
}
Exercise 2.15.2.
Profile an engineering code (floating-point intensive) with full optimization on and off. How does the profile change? Can you explain