High Performance Computing - Charles Severance [44]
descendants As with the self column, figures in the descendants column have different meanings for the routine, its parents, and children. For the routine itself, it shows the number of seconds spent in all of its descendants. For the routine’s parents, the descendants figure describes how much time spent in the routine can be traced back to calls by each parent. Looking at routine _c_ again, you can see that of its total time, 3.23 seconds, 1.62 seconds were attributable to each of its two parents, _a_ and _b_. For the children, the descendants column shows how much of the child’s time can be attributed to calls from this routine. The child may have accumulated more time overall, but the only time displayed is time associated with calls from this routine.
calls The calls column shows the number of times each routine was invoked, as well as the distribution of those calls associated with both parents and children. Starting with the routine itself, the figure in the calls column shows the total number of entries into the routine. In situations where the routine called itself, you will also see a +n immediately appended, showing that additional n calls were made recursively. Parent and child figures are expressed as ratios. For the parents, the ratio m/n says “of the n times the routine was called, m of those calls came from this parent.” For the child, it says “of the n times this child was called, m of those calls came from this routine.”
gprof’s Flat Profile
As we mentioned previously, gprof also produces a timing profile (also called a “flat” profile, just to confuse things) similar to the one produced by prof. A few of the fields are different from prof, and there is some extra information, so it will help if we explain it briefly. The following example shows the first few lines from a gprof flat profile for stuff. You will recognize the top three routines from the original program. The others are library functions included at link-time.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
39.9 3.23 3.23 2 1615.07 1615.07 _c_ [6]
39.9 6.46 3.23 1 3230.14 4845.20 _b_ [4]
20.0 8.08 1.62 1 1620.07 3235.14 _a_ [5]
0.1 8.09 0.01 3 3.33 3.33 _ioctl [9]
0.0 8.09 0.00 64 0.00 0.00 .rem [12]
0.0 8.09 0.00 64 0.00 0.00 _f_clos [177]
0.0 8.09 0.00 20 0.00 0.00 _sigblock [178]
... .... .... . . . ......
Here’s what each column means:
%time Again, we see a field that describes the runtime for each routine as a percent- age of the overall time taken by the program. As you might expect, all the entries in this column should total 100% (nearly).
cumulative seconds For any given routine, the column called “cumulative seconds” tallies a running sum of the time taken by all the preceding routines plus its own time. As you scan towards the bottom, the numbers asymptotically approach the total runtime for the program.
self seconds Each routine’s individual contribution to the runtime.
calls The number of times this particular routine was called.
self ms/call Seconds spent inside the routine, divided by the number of calls. This gives the average length of time taken by each invocation of the routine. The figure is presented in milliseconds.
total ms/call Seconds spent inside the routine plus its descendants, divided by the number of calls.
name The name of the routine. Notice that the cross-reference number appears here too.
Accumulating the Results of Several gprof Runs
It is possible to accumulate statistics from multiple runs so that you can get a picture of how a program is doing with a variety of data sets. For instance, say that you wanted to profile an application — call it bar — with three different sets of input data. You could perform the runs separately, saving the gmon.out files as you go, and then combine the results into a single profile at the end:
% f77 -pg bar.f -o bar
% bar < data1.input
% mv gmon.out gmon.1
% bar < data2.input