Managing NFS and NIS, 2nd Edition - Mike Eisler [273]
dupreqs > 0
The duplicate request cache keeps a record of previously executed NFS requests. The dupchecks counter tracks the number of times this cache was consulted, or checked. The dupreqs counter tracks the number of times a check of the cache had a "hit." In other words, dupreqs counts the number of times the NFS server received a request it had already executed. For connection-oriented (TCP) requests, a ratio of dupreqs to dupchecks of 0.01% or more is high. For connectionless (UDP) requests, a ratio of one percent or more is high. High ratios indicate one of three problems:
The timeout set on one or more clients' NFS mounts is too low. Adjust the timeo option in the automounter map or the NFS mount command upward.
The server is not responding quickly enough. Many reasons are possible, and most involve the server's physical capabilities. These include processor speed, the number of processors (on a multiprocessor), and insufficient primary memory. To test for a memory shortage, check whether the percentage of reads is high, say over 5%. That would mean many reads that could be served from cache if there were enough memory. The number of disk drives on the system also matters, because spreading data accesses across more spindles reduces response time. If you've ruled out primary memory as a cause, check whether the percentage of writes is high, say over 5%. Other possibilities include artificial limits, such as the number of server threads set via nfsd.
There is a routing problem impeding replies from the server to one or more clients.
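Here is a minimal sketch of the ratio check described above. It applies the 0.01% (TCP) and 1% (UDP) thresholds from the text to dupreqs and dupchecks values that you have copied by hand from the server's nfsstat output. The function name and the threshold table are illustrative assumptions. The function does not parse nfsstat itself.

```python
# Thresholds from the text: a dupreqs/dupchecks ratio at or above these
# fractions is considered high for the given transport.
HIGH_DUP_RATIO = {
    "tcp": 0.0001,  # 0.01% for connection-oriented requests
    "udp": 0.01,    # 1% for connectionless requests
}


def dup_ratio_is_high(dupreqs: int, dupchecks: int, transport: str) -> bool:
    """Return True if the duplicate-request hit ratio reaches the rule of thumb.

    Hypothetical helper: dupreqs and dupchecks are counter values read by
    hand from the server's nfsstat output.
    """
    if dupchecks == 0:
        return False  # cache never consulted, so there is no ratio to judge
    ratio = dupreqs / dupchecks
    return ratio >= HIGH_DUP_RATIO[transport.lower()]


if __name__ == "__main__":
    # Hypothetical counter values, not real measurements.
    print(dup_ratio_is_high(3, 10_000, "tcp"))  # True: 0.03% >= 0.01%
    print(dup_ratio_is_high(3, 10_000, "udp"))  # False: 0.03% < 1%
```

The same count of duplicates is alarming over TCP but normal over UDP. The transport retransmits lost packets itself, so the NFS layer should rarely see a repeated request.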
readlink > 10%
Clients are making excessive use of symbolic links that are on filesystems exported by the server. If the link is to a directory, replace the symbolic link with a directory, and mount both the underlying filesystem and the link's target on the client. If the link is to a file, replace the symbolic link with a hard link.
getattr > 60%
Check for possible non-default attribute cache values on NFS clients. A very high percentage of getattr requests may indicate that the attribute cache window has been reduced or set to zero with the actimeo or noac mount option. It can also indicate that the NFS filesystem implementation is doing a poor job of attribute caching.
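When you audit a client's mount options for the settings named above, a small helper can flag noac and a zero or very small actimeo. This sketch is hypothetical. It takes a comma-separated option string such as the options field of a mount table entry. The 3-second floor used to decide that actimeo is "reduced" is an assumption, not a value from the text.

```python
def attr_cache_warnings(options: str, min_actimeo: int = 3) -> list[str]:
    """Flag mount options that shrink or disable the NFS attribute cache.

    Hypothetical helper. options is a comma-separated mount option string,
    e.g. "rw,hard,intr,actimeo=0". min_actimeo is an ASSUMED floor in
    seconds below which the attribute cache window is treated as reduced.
    """
    warnings = []
    for opt in options.split(","):
        name, _, value = opt.strip().partition("=")
        if name == "noac":
            warnings.append("noac: attribute caching disabled")
        elif name == "actimeo" and value.isdigit():
            seconds = int(value)
            if seconds == 0:
                warnings.append("actimeo=0: attribute cache window set to zero")
            elif seconds < min_actimeo:
                warnings.append(f"actimeo={seconds}: attribute cache window reduced")
    return warnings


if __name__ == "__main__":
    print(attr_cache_warnings("rw,hard,intr,actimeo=0"))
    print(attr_cache_warnings("rw,noac"))
```

An empty list means none of these options were found. A high getattr percentage could then point to the filesystem implementation rather than the mount options.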
null > 1%
The automounter has been configured to mount replicated filesystems, but the timeout values for the mount are too short. The automounter makes null procedure calls to locate a server for the filesystem, so too many null calls indicate that the automounter is retrying the mount frequently. Increase the mount timeout parameter on the automounter command line.
fsinfo > 1%
The fsinfo procedure is typically called only when a filesystem is mounted. Many fsinfo calls therefore suggest that the automounter is frequently mounting and unmounting the same filesystems. If so, use the -t option to automount so that the automounter holds mounts longer. This improves response time on clients.
Keep in mind that the operation-type percentages given above are only general rules of thumb. Your site may have legitimate reasons for percentages outside these ranges.
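The operation-mix checks for readlink, getattr, null, and fsinfo can be sketched as a single pass over per-procedure call counts. The function name and the dictionary input format are assumptions for illustration. The input would be counts copied by hand from nfsstat -s output. The limits are the rules of thumb above, so treat any alert as a prompt to investigate and not as proof of a problem.

```python
# Rule-of-thumb ceilings from the text, as percentages of all NFS calls.
OP_MIX_LIMITS = {
    "readlink": 10.0,
    "getattr": 60.0,
    "null": 1.0,
    "fsinfo": 1.0,
}


def op_mix_alerts(op_counts: dict[str, int]) -> dict[str, float]:
    """Return {operation: percentage} for operations above their limits.

    Hypothetical helper: op_counts maps NFS procedure names to call counts.
    Percentages are taken over the sum of all counts supplied.
    """
    total = sum(op_counts.values())
    if total == 0:
        return {}
    alerts = {}
    for op, limit in OP_MIX_LIMITS.items():
        pct = 100.0 * op_counts.get(op, 0) / total
        if pct > limit:
            alerts[op] = round(pct, 1)
    return alerts


if __name__ == "__main__":
    # Hypothetical counts totalling 1000 calls.
    counts = {"getattr": 700, "read": 150, "write": 100, "readlink": 30, "null": 20}
    print(op_mix_alerts(counts))  # {'getattr': 70.0, 'null': 2.0}
```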
NFS client problems
Using the output of nfsstat -c, look for the following symptoms:
timeout > 5%
The client's RPC requests are timing out before the server can answer them, or the requests are not reaching the server. Check badxids to determine the cause of the timeouts.
badxids ~ timeout
RPC requests that have been retransmitted are being handled by the server, and the client is receiving duplicate replies. Increase the timeo parameter for this NFS mount to alleviate the request retransmission, or tune the server to reduce the average request service time.
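The client-side rules in this section reduce to a short decision sequence. First, check whether timeouts exceed 5% of calls. If they do, compare badxids with timeout: a value close to timeout points to a slow server, and a value close to zero points to a network dropping packets. Here is a minimal sketch. The text does not say how close "~" must be, so the 20% tolerance and the short result labels are assumptions.

```python
def diagnose_client(calls: int, timeouts: int, badxids: int,
                    near: float = 0.2) -> str:
    """Classify client RPC counters from nfsstat -c.

    Hypothetical helper. near is an ASSUMED tolerance: badxids counts as
    "close to" timeouts when the two differ by at most near * timeouts,
    and as "close to zero" when badxids is at most near * timeouts.
    """
    if calls == 0 or timeouts / calls <= 0.05:
        return "ok"              # timeouts at or below 5% of calls
    if abs(badxids - timeouts) <= near * timeouts:
        return "server-slow"     # duplicate replies to retransmitted requests
    if badxids <= near * timeouts:
        return "network-loss"    # requests or replies dropped in transit
    return "unclear"             # high timeouts, badxids in between


if __name__ == "__main__":
    # Hypothetical counter values, not real measurements.
    print(diagnose_client(10_000, 800, 780))  # server-slow
    print(diagnose_client(10_000, 800, 5))    # network-loss
```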
badxids ~ 0
Combined with a large timeout count, this indicates that the network is dropping parts of NFS requests or replies between the NFS client and server. Reduce the NFS buffer size using the rsize and wsize mount parameters to increase the probability that NFS