High Performance Computing - Charles Severance [42]
main () {
int l;
for (l=0;l<1000;l++) {
if (l == 2*(l/2)) foo ();
bar();
baz();
}
}
foo (){
int j;
for (j=0;j<200;j++)
}
bar () {
int i;
for (i=0;i<200;i++);
}
baz () {
int k;
for (k=0;k<300;k++);
}
Again, you need to compile and link loops with the –p flag, run the program, and then run the prof utility to extract a profile, as follows:
% cc loops.c -p -o loops
% ./loops
% prof loops > loops.prof
The following example shows what a loops.prof should look like. There are six columns.
%Time Seconds Cumsecs #Calls msec/call Name
56.8 0.50 0.50 1000 0.500 _baz
27.3 0.24 0.74 1000 0.240 _bar
15.9 0.14 0.88 500 0.28 _foo
0.0 0.00 0.88 1 0. _creat
0.0 0.00 0.88 2 0. _profil
0.0 0.00 0.88 1 0. _main
0.0 0.00 0.88 3 0. _getenv
0.0 0.00 0.88 1 0. _strcpy
0.0 0.00 0.88 1 0. _write
The columns can be described as follows:
%Time Percentage of CPU time consumed by this routine
Seconds CPU time consumed by this routine
Cumsecs A running total of time consumed by this and all preceding routines in the list
Calls The number of times this particular routine was called
msec/call Seconds divided by number of calls giving the average length of time taken by each invocation of the routine
Name The name of this routine
The top three routines listed are from loops.c itself. You can see an entry for the “main” routine more than halfway down the list. Depending on the vendor, the names of the routines may contain leading or trailing underscores, and there will always be some routines listed you don’t recognize. These are contributions from the C library and possibly the FORTRAN libraries, if you are using FORTRAN. Profiling also introduces some overhead into the run, and often shows up as one or two subroutines in the prof output. In this case, the entry for _profil represents code inserted by the linker for collecting runtime profiling data.
If it was our intention to tune loops, we would consider a profile like the one in the figure above to be a fairly good sign. The lead routine takes 50% of the runtime, so at least there is a chance we could do something with it that would have a significant impact on the overall runtime. (Of course with a program as trivial as loops, there is plenty we can do, since loops does nothing.)
gprof
Just as it’s important to know how time is distributed when your program runs, it’s also valuable to be able to tell who called who in the list of routines. Imagine, for instance, if something labeled _exp showed up high in the list in the prof output. You might say: “Hmmm, I don’t remember calling anything named exp(). I wonder where that came from.” A call tree helps you find it.
Subroutines and functions can be thought of as members of a family tree. The top of the tree, or root, is actually a routine that precedes the main routine you coded for the application. It calls your main routine, which in turn calls others, and so on, all the way down to the leaf nodes of the tree. This tree is properly known as a call graph.[27] The relationship between routines and nodes in the graph is one of parents and children. Nodes separated by more than one hop are referred to as ancestors and descendants.
Figure 6-4 graphically depicts the kind of call graph you might see in a small application. main is the parent or ancestor of most of the rest of the routines. G has two parents, E and C. Another routine, A, doesn’t appear to have any ancestors or descendants at all. This problem can happen when routines are not compiled with profiling enabled, or when they aren’t invoked with a subroutine call — such as would be the case if A were an exception handler.
The UNIX profiler that can extract this kind of information is called gprof. It replicates the abilities of prof, plus it gives a call graph profile so you can see who calls whom, and how often. The call graph profile is handy if you are trying to figure out how a piece of code works or where an unknown routine came from, or if you are looking for candidates for subroutine inlining.