UNIX System Administration Handbook - Evi Nemeth [454]
• A dump of the inode table (-i)
• A dump of the text table (-x)
• A dump of the process table, gorier than ps (-P)
• A dump of the open file table (-f)
• Status information for all terminals (-t)
• Information about a particular process (-u)
• Information about swap space usage (-s)
• Information about how full the kernel’s tables are (-T)
pstat -T is useful for determining the optimal value of the maxusers variable when you are configuring the kernel. Unfortunately, pstat -T only shows you a small fraction of the things that are affected by maxusers, so you must still allow a generous margin of safety in your configuration. See Chapter 12, Drivers and the Kernel, for more information.
25.4 HELP! MY SYSTEM JUST GOT REALLY SLOW!
In previous sections, we’ve talked mostly about issues that relate to the average performance of a system. Solutions to these long-term concerns generally take the form of configuration adjustments or upgrades.
However, you will find that even properly configured systems are sometimes more sluggish than usual. Luckily, transient problems are often easy to diagnose. Ninety percent of the time, they are caused by a greedy process that is simply consuming so much CPU power or disk bandwidth that other processes have been stalled.
You can often tell which resource is being hogged without even running a diagnostic command. If the system feels “sticky” or you hear the disk going crazy, the problem is most likely a disk bandwidth or memory shortfall.
If the system feels “sluggish” (everything takes a long time, and applications can’t be “warmed up”), the problem may be CPU.
The first step in diagnosis is to run ps or top to look for obvious runaway processes. Any process that’s using more than 50% of the CPU is likely to be at fault. If no single process is getting an inordinate share of the CPU, check to see how many processes are getting at least 10%. If there are more than two or three (don’t count ps itself), the load average is likely to be quite high. This is, in itself, a cause of poor performance. Check the load average with uptime, and use vmstat or sar -u to check whether the CPU is ever idle.
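The 50% and 10% rules of thumb are easy to mechanize. The following Python sketch is an illustration only, not a standard tool. It assumes a ps that accepts the POSIX-style options -e and -o pid,pcpu,comm, which some older systems do not. The function names parse_ps, triage, and snapshot are hypothetical.

```python
import subprocess


def parse_ps(text):
    """Parse `ps -eo pid,pcpu,comm` output into (pid, cpu_percent, command) tuples."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # ignore malformed or empty lines
        pid, pcpu, comm = parts
        rows.append((int(pid), float(pcpu), comm))
    return rows


def triage(rows, hog=50.0, busy=10.0):
    """Return (hogs, contenders): processes over `hog` percent CPU,
    and processes getting at least `busy` percent."""
    hogs = [r for r in rows if r[1] > hog]
    contenders = [r for r in rows if r[1] >= busy]
    return hogs, contenders


def snapshot():
    """Run ps once and return the parsed rows, leaving out ps itself.
    Assumes POSIX-style ps options are available on this system."""
    out = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [r for r in parse_ps(out) if r[2] != "ps"]
```

Calling triage(snapshot()) returns the likely culprits first and the list of contenders second. If the first list is empty and the second holds more than two or three entries, suspect a high load average rather than a single runaway process.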
If no CPU contention is evident, check to see how much paging is going on with vmstat or sar -g. All disk activity is interesting: a lot of page-outs may indicate contention for memory, while disk traffic in the absence of paging may mean that a process is monopolizing the disk by constantly reading or writing files.
There’s no direct way to tie disk operations to processes, but ps can narrow down the possible suspects for you. Any process that is generating disk traffic must be using some amount of CPU time. You can usually make an educated guess about which of the active processes is the true culprit. Use kill -STOP to test your theory.
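A test with kill -STOP is only safe if the process is resumed afterward. The sketch below is an assumption-laden illustration: it suspends a suspect with SIGSTOP, waits while you watch vmstat or sar in another window, and then always sends SIGCONT. The function name pause_and_observe and the default ten-second pause are hypothetical choices, and you need permission to signal the target process.

```python
import os
import signal
import time


def pause_and_observe(pid, seconds=10.0):
    """Suspend `pid`, wait for `seconds`, then resume it.

    While the process is stopped, watch disk and paging activity
    elsewhere; if the traffic dies down, the theory is confirmed.
    """
    os.kill(pid, signal.SIGSTOP)      # same effect as `kill -STOP pid`
    try:
        time.sleep(seconds)           # observation window
    finally:
        os.kill(pid, signal.SIGCONT)  # always resume, even if interrupted
```

The try/finally ensures that an impatient Control-C during the observation window does not leave someone's job frozen.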
Suppose you do find that a particular process is at fault—what should you do? Usually, nothing. Some operations just require a lot of resources and are bound to slow down the system. It doesn’t necessarily mean that they’re illegitimate. It is usually acceptable to renice an obtrusive process that is CPU-bound. But be sure to ask the owner to use the nice command in the future.
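For reference, renice amounts to changing a process's nice value through the setpriority interface. Here is a minimal Python sketch of that operation, assuming a UNIX system on which Python's os.getpriority and os.setpriority are available. The wrapper name renice is our own. Raising a nice value (lowering priority) is generally permitted for your own processes, while lowering it usually requires root.

```python
import os


def renice(pid, niceness):
    """Set the nice value of `pid` and return the previous value."""
    old = os.getpriority(os.PRIO_PROCESS, pid)
    os.setpriority(os.PRIO_PROCESS, pid, niceness)
    return old
```

Keeping the previous value makes it easy to restore the original priority once the system has calmed down.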
Processes that are disk or memory hogs can’t be dealt with so easily. renice generally will not help. You do have the option of killing or stopping a process, but we recommend against this if the situation does not constitute an emergency. As with CPU pigs, you can use the low-tech solution of asking the owner to run the process later.
Some systems allow a process’s consumption of physical memory to be restricted with the setrlimit system call. This facility is available in the C shell with the built-in limit command. For example, the command
%