Online Book Reader


UNIX System Administration Handbook - Evi Nemeth [448]

most important factor affecting a system’s overall performance. Given infinite amounts of all other resources or certain types of applications (e.g., numerical simulations), a faster CPU will make a dramatic difference. But in the everyday world, CPU speed is relatively unimportant.

The most common performance bottleneck on UNIX systems is actually disk bandwidth. Because hard disks are mechanical systems, it takes many milliseconds to locate a disk block, fetch its contents, and wake up the process that’s waiting for it. Delays of this magnitude overshadow every other source of performance degradation. Each disk access causes a stall worth millions of CPU instructions.
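The "millions of CPU instructions" figure can be checked with a little arithmetic. The sketch below is only an illustration: the 10 ms access time, the 1 GHz clock rate, and the one-instruction-per-cycle rate are assumed round numbers, not measurements from the text, and the function name is hypothetical.

```python
def stall_in_instructions(access_ms, clock_hz, instructions_per_cycle=1.0):
    """Estimate how many instructions a CPU could have executed
    while a process waited for a single disk access."""
    # Multiply before dividing so round inputs give an exact result.
    return access_ms * clock_hz * instructions_per_cycle / 1000.0


# Assumed figures for illustration: 10 ms per access, 1 GHz clock,
# one instruction retired per cycle.
estimate = stall_in_instructions(10, 1e9)
print(f"{estimate:,.0f} instructions per disk access")
```

Under these assumptions a single access costs on the order of ten million instructions, which is why even a modest amount of disk traffic can dominate a program's run time.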

Because UNIX provides virtual memory, disk bandwidth and memory are directly related. On a loaded system with a limited amount of RAM, you often have to write a page to disk to obtain a fresh page of virtual memory. Unfortunately, this means that using memory is often just as expensive as using the disk. Swapping and paging caused by bloated software is performance enemy #1 on most workstations.

Network bandwidth resembles disk bandwidth in many ways, due to the latencies involved. However, networks are atypical in that they involve entire communities rather than individual computers. They are also susceptible to hardware problems and overloaded servers.

25.3 SYSTEM PERFORMANCE CHECKUP


Most performance analysis tools tell you what’s going on at a particular point in time. However, the number and character of loads will probably change throughout the day. Be sure to gather a cross-section of data before taking action. The best information on system performance often becomes clear only after a long period (a month or more) of data collection.

Analyzing CPU usage


You will probably want to gather three kinds of CPU data: overall utilization, load averages, and per-process CPU consumption. Overall utilization can help identify systems on which the CPU’s speed itself is the bottleneck. Load averages give you an impression of overall system performance. Per-process CPU consumption data can identify specific processes that are hogging resources.

You can obtain summary information with the vmstat command on most systems and also with sar -u on Solaris and HP-UX. Both commands take two arguments: the number of seconds to monitor the system for each line of output and the number of reports to provide. For example,

% sar -u 5 5
13:33:40    %usr    %sys    %wio   %idle
13:33:45       4      58      27      11
13:33:50       7      83       9       0
13:33:55       9      77      13       0
13:34:00       2      25       3      71
13:34:05       0       0       0     100
Average        4      49      10      36

sar -u reports the percentage of the CPU’s time that was spent running user code (%usr), running kernel code (%sys), and idling. Idle time is charged to the %wio category if there are processes blocked on high-speed I/O (disk, usually) and to the %idle column if not.
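When sar output is collected over a long period, it is often easier to post-process it with a script than to read it by eye. The following is a minimal sketch that parses the sample report shown above and recomputes the per-column averages. The function names are hypothetical, and the parser assumes exactly the layout in the example (one header line, whitespace-separated integer columns, and a final Average line); real sar output may differ between systems.

```python
SAR_OUTPUT = """\
13:33:40 %usr %sys %wio %idle
13:33:45 4 58 27 11
13:33:50 7 83 9 0
13:33:55 9 77 13 0
13:34:00 2 25 3 71
13:34:05 0 0 0 100
Average 4 49 10 36
"""


def parse_sar_u(text):
    """Return (column_names, samples) from sar -u style text.

    Each sample is a dict mapping a column name to an integer percentage.
    The summary line that starts with "Average" is skipped.
    """
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header = lines[0][1:]          # drop the timestamp in the header line
    samples = []
    for fields in lines[1:]:
        if fields[0] == "Average":
            continue
        values = [int(f) for f in fields[1:]]
        samples.append(dict(zip(header, values)))
    return header, samples


def column_means(header, samples):
    """Average each column over all samples."""
    return {name: sum(s[name] for s in samples) / len(samples)
            for name in header}


header, samples = parse_sar_u(SAR_OUTPUT)
means = column_means(header, samples)
for name in header:
    print(f"{name:>6} {means[name]:6.1f}")
```

Rounded to whole percentages, the computed means agree with the Average line that sar printed for this sample.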

vmstat prints a variety of information, with the CPU-related columns at the end:

% vmstat 5 5
procs   page                   faults       cpu
r b w   re mf pi po fr de sr   in   sy cs   us sy id
0 0 0    0  0  0  0  0  0  0    4   22 19    2  1 97
1 0 0   67  2  0  0  0  0  0   26  751 52   53 47  0
0 0 0   96  0  0  0  0  0  0   39 1330 42   22 71  7
0 0 0   16  0  0  0  0  0  0   84 1626 99    7 74 19
0 0 0    1  0  0  0  0  0  0   11  216 20    1 11 88

Some columns have been removed from this example. We will defer discussion of the paging-related columns until later in this chapter.
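Because the CPU-related columns sit at the end of each vmstat line, a script can pick them out by position without knowing how many other columns a particular system prints. The sketch below does this for the sample output above. The function name is hypothetical, and the code assumes that every data line consists solely of whitespace-separated non-negative integers, which holds for this example but may not hold for every vmstat variant.

```python
VMSTAT_OUTPUT = """\
procs page faults cpu
r b w re mf pi po fr de sr in sy cs us sy id
0 0 0 0 0 0 0 0 0 0 4 22 19 2 1 97
1 0 0 67 2 0 0 0 0 0 26 751 52 53 47 0
0 0 0 96 0 0 0 0 0 0 39 1330 42 22 71 7
0 0 0 16 0 0 0 0 0 0 84 1626 99 7 74 19
0 0 0 1 0 0 0 0 0 0 11 216 20 1 11 88
"""


def cpu_columns(text):
    """Return a list of (us, sy, id) tuples, one per vmstat data line."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Header and blank lines contain non-numeric fields or too few fields.
        if len(fields) < 3 or not all(f.isdigit() for f in fields):
            continue
        user, system, idle = (int(f) for f in fields[-3:])
        rows.append((user, system, idle))
    return rows


for user, system, idle in cpu_columns(VMSTAT_OUTPUT):
    print(f"us={user:3d} sy={system:3d} id={idle:3d}")
```

Positional indexing is used instead of a lookup by column name because the header contains two columns labeled sy, one under faults and one under cpu.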

User, system, and idle time are shown in the us, sy, and id columns. CPU numbers that are heavy on user time generally indicate computation, and high system numbers indicate that processes are making a lot of system calls or performing I/O. (vmstat shows the number of system calls per second in the sy column under faults.) A rule of thumb that has served us well over the years and that applies to most systems is that the system should spend approximately 50% of its nonidle time in user space and 50% in system space; the overall idle percentage should be nonzero. The cs column shows context switches per interval, the number of times that the kernel changed which process was running. An extremely high cs value typically
