Running Linux, 5th Edition - Matthias Kalle Dalheimer [402]
As we can see, 45.11% of the total running time was spent in the function GetComponent--which amounts to 27.49 seconds. But is this because GetComponent is horribly inefficient[*] or because GetComponent itself called many other slow functions? The functions Push and Pop were called many times during execution: could they be the culprits?
The second part of the gprof report can help us here. It gives a detailed "call graph" describing which functions called other functions and how many times they were called. For example:
index % time self children called name
[1] 92.7 0.00 47.30 start [1] 0.01 47.29 1/1 main [2] 0.00 0.00 1/2 on_exit [53] 0.00 0.00 1/1 exit [172] The first column of the call graph is the index: a unique number given to every function, allowing you to find other functions in the graph. Here, the first function, start, is called implicitly when the program begins. start required 92.7% of the total running time (47.30 seconds), including its children, but required very little time to run itself. This is because start is the parent of all other functions in the program, including main; it makes sense that start plus its children requires that percentage of time. The call graph normally displays the children as well as the parents of each function in the graph. Here, we can see that start called the functions main, on_exit, and exit (listed below the line for start). However, there are no parents (normally listed above start); instead, we see the ominous word Skipping down to the entry for GetComponent, the function under suspicion, we see the following: index % time self children called name 0.67 0.23 1/41 GetFirstComponent [12] 26.82 9.30 40/41 GetNextComponent [5] [4] 72.6 27.49 9.54 41 GetComponent [4] 6.54 0.00 1811863/1811863 Push [7] 3.00 0.00 1811863/1811863 TrimNeighbors [9] 0.00 0.00 1/1 InitStack [54] The parent functions of GetComponent were GetFirstComponent and GetNextComponent, and its children were Push, TrimNeighbors, and InitStack. As we can see, GetComponent was called 41 times—one time from GetFirstComponent and 40 times from GetNextComponent. The gprof output contains notes that describe the report in more detail. GetComponent itself requires more than 27.49 seconds to run; only 9.54 seconds are spent executing the children of GetComponent (including the many calls to Push and TrimNeighbors). So it looks as though GetComponent and possibly its parent GetNextComponent need some tuning; the oft-called Push function is not the sole cause of the problem. gprof also keeps track of recursive calls and "cycles" of called functions and indicates the amount of time required for each call. Of course, using gprof effectively requires that all code to be profiled is compiled with the -pg option. It also requires a knowledge of the program you're attempting to profile; gprof can tell you only so much about what's going on. It's up to the programmer to optimize inefficient code. One last note about gprof: running it on a program that calls only a few functions—and runs very quickly—may not give you meaningful results. The units used for timing execution are usually rather coarse—maybe one one-hundredth of a second—and if many functions in your program run more quickly than that, gprof will be unable to distinguish between their respective running times (rounding them to the nearest hundredth of a second). In order to get good profiling information, you may need to run your program under unusual circumstances—for example,