High Performance Computing - Charles Severance [67]
[16] Just for the record, both the authors of this book are quite accomplished in C, C++, and FORTRAN, so they have no preconceived notions.
[17] By “definitions,” we mean the assignment of values: not declarations.
[18] More generally, code can be cast as n-tuples. It depends on the level of the intermediate language.
[19] See The Section Called “Assembly Language” for some examples of machine code translated directly from intermediate language.
[20] This code is an example of a flow dependence. I describe dependencies in detail in The Section Called “Introduction”.
[21] If a compiler does sufficient interprocedural analysis, it can even optimize variables across routine boundaries. Interprocedural analysis can be the bane of benchmark codes trying to time a computation without using the results of the computation.
[22] Time is money.
[23] Cache miss time is buried in here too.
[24] The uptime command gives you a rough indication of the other activity on your machine. The last three fields tell the average number of processes ready to run during the last 1, 5, and 15 minutes, respectively.
[25] The term “flat profile” is a little overloaded. We are using it to describe a profile that shows an even distribution of time throughout the program. You will also see the label flat profile used to draw distinction from a call graph profile, as described below.
[26] Remember: code with profiling enabled takes longer to run. You should recompile and relink the whole thing without the –p flag when you have finished profiling.
[27] It doesn’t have to be a tree. Any subroutine can have more than one parent. Furthermore, recursive subroutine calls introduce cycles into the graph, in which a child calls one of its parents.
[28] On HP machines, the flag is –G.
[29] In the interest of conserving space, we clipped out the section most relevant to our discussion and included it in this example. There was a lot more to it, including calls of setup and system routines, the likes of which you will see when you run gprof.
[30] You may have noticed that there are two main routines: _MAIN_ and _main. In a FORTRAN program, _MAIN_ is the actual FORTRAN main routine. It’s called as a subroutine by _main, provided from a system library at link time. When you’re profiling C code, you won’t see _MAIN_.
[31] A basic block is a section of code with only one entrance and one exit. If you know how many times the block was entered, you know how many times each of the statements in the block was executed, which gives you a line-by-line profile. The concept of a basic block is explained in detail in The Section Called “Introduction”
[32] On Sun Solaris systems, the –xa option is used.
[33] We examine the techniques for blocking in The Section Called “Virtual Memory” Chapter 8.
[34] Warning: The size command won’t give you the full picture if your program allocates memory dynamically, or keeps data on the stack. This area is especially important for C programs and FORTRAN programs that create large arrays that are not in COMMON.
[35] You could also reboot the machine! It will tell you how much memory is available when it comes up.
[36] By the way, are you getting the message “Out of memory?” If you are running csh, try typing unlimit to see if the message goes away. Otherwise, it may mean that you don’t have enough swap space available to run the job.
[37] The statement function has been eliminated in FORTRAN 90.
[38] Some programmers use the standard UNIX m4 preprocessor for FORTRAN
[39] The machine representation of a floating-point number starts with a sign bit. If the bit is 0, the number is positive. If it is 1, the number is negative. The fastest absolute value function is one that merely “ands” out the sign bit. See macros in /usr/include/macros.h and /usr/include/math.h.
[40] Nowadays, single-precision calculations can take longer than double-precision calculations from register to register.
[41] And because of overflow and round-off errors in floating-point, in some situations they might not be equivalent.