High Performance Computing - Charles Severance [40]
IMPLICIT NONE
*
REAL WTIME,CTIME
COMMON/HPCTIMC/CBEGIN,WBEGIN
REAL*8 CBEGIN,CEND,WBEGIN,WEND
REAL ETIME,CSCRATCH(2)
*
CALL HPCWALL(WEND)
CEND=ETIME(CSCRATCH)
*
WTIME = WEND - WBEGIN
CTIME = CEND - CBEGIN
*
WBEGIN = WEND
CBEGIN = CEND
END
Using Timing Information
You can get a lot information from the timing facilities on a UNIX machine. Not only can you tell how long it takes to perform a given job, but you can also get hints about whether the machine is operating efficiently, or whether there is some other problem that needs to be factored in, such as inadequate memory.
Once the program is running with all anomalies explained away, you can record the time as a baseline. If you are tuning, the baseline will be a reference with which you can tell how much (or little) tuning has improved things. If you are benchmarking, you can use the baseline to judge how much overall incremental performance a new machine will give you. But remember to watch the other figures — paging, CPU utilization, etc. These may differ from machine to machine for reasons unrelated to raw CPU performance. You want to be sure you are getting the full picture.
Subroutine Profiling*
Sometimes you want more detail than the overall timing of the application. But you don’t have time to modify the code to insert several hundred etime calls into your code. Profiles are also very useful when you have been handed a strange 20,000-line application program and told to figure out how it works and then improve its performance.
Most compilers provide a facility to automatically insert timing calls into your code at the entry and exit of each routine at compile time. While your program runs, the entry and exit times are recorded and then dumped into a file. A separate utility summarizes the execution patterns and produces a report that shows the percentage of the time spent in each of your routines and the library routines.
The profile gives you a sense of the shape of the execution profile. That is, you can see that 10% of the time is spent in subroutine A, 5% in subroutine B, etc. Naturally, if you add all of the routines together they should account for 100% of the overall time spent. From these percentages you can construct a picture — a profile — of how execution is distributed when the program runs. Though not representative of any particular profiling tool, the histograms in Figure 2.4 and Figure 2.5 depict these percentages, sorted from left to right, with each vertical column representing a different routine. They help illustrate different profile shapes.
Figure 2.4. Sharp profile — dominated by routine 1
A sharp profile says that most of the time is spent in one or two procedures, and if you want to improve the program’s performance you should focus your efforts on tuning those procedures. A minor optimization in a heavily executed line of code can sometimes have a great effect on the overall runtime, given the right opportunity. A flat profile,[25] on the other hand, tells you that the runtime is spread across many routines, and effort spent optimizing any one or two will have little benefit in speeding up the program. Of course, there are also programs whose execution profile falls somewhere in the middle.
Figure 2.5. Flat profile — no routine predominates
We cannot predict with absolute certainty what you are likely to find when you profile your programs, but there are some general trends. For instance, engineering and scientific codes built around matrix solutions often exhibit very sharp profiles. The runtime is dominated by the work performed in a handful of routines. To tune the code, you need to focus your efforts on those routines to make them more efficient. It may involve restructuring loops to expose parallelism, providing hints to the compiler, or rearranging memory references. In any case, the challenge is tangible; you can see the problems you have to fix.
There are limits to how much tuning one or two routines will improve your runtime, of course. An often quoted rule of thumb is Amdahl’s Law, derived