Online Book Reader

Home Category

Choose a category
All
Classic-Fiction

High Performance Computing - Charles Severance [43]

By Root 1271 0

To use call graph profiling you need go through the same steps as with prof, except that a –pg flag is substituted for the –p flag.[28] Additionally, when it comes time to produce the actual profile, you use the gprof utility instead of prof. One other difference is that the name of the statistics file is gmon.out instead of mon.out:

% cc -pg stuff.c -c

% cc stuff.o -pg -o stuff

% stuff

% gprof stuff > stuff.gprof

Figure 2.6. Simple call graph

The output from gprof is divided into three sections:

Call graph profile

Timing profile

Index

The first section textually maps out the call graph. The second section lists routines, the percentage of time devoted to each, the number of calls, etc. (similar to prof ). The third section is a cross reference so that you can locate routines by number, rather than by name. This section is especially useful for large applications because routines are sorted based on the amount of time they use, and it can be difficult to locate a particular routine by scanning for its name. Let’s invent another trivial application to illustrate how gprof works. Figure 2.7 shows a short piece of FORTRAN code, along with a diagram of how the routines are connected together. Subroutines A and B are both called by MAIN, and, in turn, each calls C. The following example shows a section of the output from gprof ’s call graph profile:[29]

Figure 2.7. FORTRAN example

called/total parents

index %time self descendants called+self name index

called/total children

.... .... ....

0.00 8.08 1/1 _main [2]

[3] 99.9 0.00 8.08 1 _MAIN_ [3]

3.23 1.62 1/1 _b_ [4]

1.62 1.62 1/1 _a_ [5]

-----------------------------------------------

3.23 1.62 1/1 _MAIN_ [3]

[4] 59.9 3.23 1.62 1 _b_ [4]

1.62 0.00 1/2 _c_ [6]

-----------------------------------------------

1.62 1.62 1/1 _MAIN_ [3]

[5] 40.0 1.62 1.62 1 _a_ [5]

1.62 0.00 1/2 _c_ [6]

-----------------------------------------------

1.62 0.00 1/2 _a_ [5]

1.62 0.00 1/2 _b_ [4]

[6] 39.9 3.23 0.00 2 _c_ [6]

Sandwiched between each set of dashed lines is information describing a given routine and its relationship to parents and children. It is easy to tell which routine the block represents because the name is shifted farther to the left than the others. Parents are listed above, children below. As with prof, underscores are tacked onto the labels.[30] A description of each of the columns follows:

index You will notice that each routine name is associated with a number in brackets ([n]). This is a cross-reference for locating the routine elsewhere in the profile. If, for example, you were looking at the block describing _MAIN_ and wanted to know more about one of its children, say _a_, you could find it by scanning down the left side of the page for its index, [5].

%time The meaning of the %time field is a little different than it was for prof. In this case it describes the percentage of time spent in this routine plus the time spent in all of its children. It gives you a quick way to determine where the busiest sections of the call graph can be found.

self Listed in seconds, the self column has different meanings for parents, the routine in question, and its children. Starting with the middle entry — the routine itself — the self figure shows how much overall time was dedicated to the routine. In the case _b_, for instance, this amounts to 3.23 seconds. Each self column entry shows the amount of time that can be attributed to calls from the parents. If you look at routine _c_, for example, you will see that it consumed a total time of 3.23 seconds. But note that it had two parents: 1.62 seconds of the time was attributable to calls from _a_, and 1.62 seconds to _b_. For the children, the self figure shows how much time was spent executing each child due to calls from this routine. The children may have consumed more time overall, but the only time accounted for is time-attributable to calls from this routine. For example, _c_ accumulated 3.23 seconds overall, but if you look at the block describing

Online Book Reader

High Performance Computing - Charles Severance [43]

®Online Book Reader