Managing NFS and NIS, 2nd Edition - Mike Eisler [259]
After a major timeout, the message:
NFS server host not responding still trying
is printed on the client's console. If a reply is eventually received, the "not responding" message is followed by the message:
NFS server host ok
Hard-mounting a filesystem guarantees that the sequence of retransmissions continues until the server replies. After a major timeout on a hard-mounted filesystem, the initial timeout period is doubled, beginning a new major cycle. Hard mounts are the default option. For example, a filesystem mounted via:[1]
# mount -o proto=udp,retrans=3,timeo=10 wahoo:/export/home/wahoo /mnt
has the retransmission sequence shown in Table 18-1.
Table 18-1. NFS timeout sequence for NFS over UDP
Absolute Time (s) | Current Timeout (s) | New Timeout (s) | Event
1.0               | 1.0                 | 2.0             | Minor
3.0               | 2.0                 | 4.0             | Minor
7.0               | 4.0                 | 2.0             | Major, double initial timeout
...NFS server wahoo not responding...
9.0               | 2.0                 | 4.0             | Minor
13.0              | 4.0                 | 8.0             | Minor
21.0              | 8.0                 | 4.0             | Major, double initial timeout
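To make the arithmetic behind Table 18-1 concrete, here is a minimal Python sketch that regenerates its rows. It assumes that timeo is expressed in tenths of a second (so timeo=10 means 1.0 second), that each major cycle consists of retrans timeout periods (which is how the rows above work out for retrans=3), and that the timeout is capped at 20 seconds, the Solaris UDP ceiling mentioned below. The function name timeout_sequence is hypothetical; this models the table, not any client's kernel code.

```python
from typing import List, Tuple

Row = Tuple[float, float, float, str]


def timeout_sequence(timeo: int, retrans: int, majors: int,
                     max_timeout: float = 20.0) -> List[Row]:
    """Rows of (absolute time, current timeout, new timeout, event), in seconds.

    Hypothetical model of a hard-mounted NFS/UDP client: the timeout doubles
    after every minor timeout; after a major timeout the cycle restarts from
    double the previous initial timeout. All values are capped at max_timeout.
    """
    initial = timeo / 10.0          # timeo is given in tenths of a second
    now = 0.0
    rows: List[Row] = []
    for _ in range(majors):
        current = initial
        for i in range(retrans):
            now += current          # this period expires without a reply
            if i < retrans - 1:
                new = min(current * 2, max_timeout)
                event = "Minor"
            else:
                initial = min(initial * 2, max_timeout)
                new = initial
                event = "Major, double initial timeout"
            rows.append((now, current, new, event))
            current = new
    return rows


if __name__ == "__main__":
    # Options from the example mount: timeo=10, retrans=3
    for row in timeout_sequence(timeo=10, retrans=3, majors=2):
        print("%6.1f %6.1f %6.1f  %s" % row)
```

Passing a large starting value such as timeo=100 shows the cap at work: every period after the first two stays at 20 seconds.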
Timeout periods are not increased without bound. For instance, the timeout period never exceeds 20 seconds (timeo=200) for Solaris clients using UDP, or 60 seconds for Linux clients. The system may also impose a minimum timeout period to avoid retransmitting too aggressively. Because certain NFS operations take longer to complete than others, Solaris uses three different values for the minimum (and initial) timeout of the various NFS operations. NFS write operations typically take the longest, so they use a minimum timeout of 1,250 msecs. NFS read operations have a minimum timeout of 875 msecs, and operations that act on metadata (such as getattr, lookup, and access) usually take the least time, so they have the smallest minimum timeout, 750 msecs.
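Taken together, these floors and ceilings amount to a clamp applied to whatever timeout the client computes. The sketch below encodes the Solaris figures quoted above; the operation class names, the dictionary, and the clamp_timeout function are hypothetical illustrations, not how the Solaris kernel is actually organized.

```python
# Minimum (and initial) timeouts quoted for Solaris, in milliseconds.
MIN_TIMEOUT_MS = {
    "write": 1250,     # writes typically take the longest
    "read": 875,
    "metadata": 750,   # getattr, lookup, access, ...
}

# Ceiling for Solaris clients using UDP (timeo=200), in milliseconds.
MAX_TIMEOUT_MS = 20_000


def clamp_timeout(op_class: str, proposed_ms: float) -> float:
    """Keep a proposed timeout between the per-operation floor and the cap."""
    floor = MIN_TIMEOUT_MS[op_class]
    return max(floor, min(proposed_ms, MAX_TIMEOUT_MS))


if __name__ == "__main__":
    print(clamp_timeout("write", 500))       # raised to the write floor
    print(clamp_timeout("read", 45_000))     # cut back to the 20 s ceiling
    print(clamp_timeout("metadata", 1_000))  # already within bounds
```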
To accommodate slower servers, increase the timeo parameter used in the automounter maps or /etc/vfstab. Increasing retrans for UDP lengthens the major timeout period, but at the expense of sending more requests to the NFS server. These duplicate requests further load the server, particularly when they require repeating disk operations. In many cases, the client receives a reply after sending the second or third retransmission, so doubling the initial timeout period eliminates about half of the NFS calls sent to the slow server. In general, increasing the NFS RPC timeout is more helpful than increasing the retransmission count for hard-mounted filesystems accessed over UDP. If the server does not respond to the first few RPC requests, it is likely that it will not respond for a "long" time compared to the RPC timeout period. It's best to let the client sit back, double its timeout period on major timeouts, and wait for the server to recover. Increasing the retransmission count simply raises the noise level on the network while the client waits for the server to respond.
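One way to see the effect of a larger timeo is to count how many copies of a single call a hard-mounted UDP client sends before a stalled server finally answers. The sketch below reuses the doubling model of Table 18-1 (each major cycle is retrans timeout periods, every expired timeout triggers a retransmission, and timeouts are capped at 20 seconds); the functions send_times and requests_sent are hypothetical and model behavior rather than measure it.

```python
from typing import List


def send_times(timeo: int, retrans: int, horizon: float,
               max_timeout: float = 20.0) -> List[float]:
    """Times (seconds) at which one request is sent or resent before `horizon`.

    Doubling model of Table 18-1: the timeout doubles after each minor
    timeout, and each major timeout restarts the cycle from double the
    previous initial timeout. Every expired timeout triggers a resend.
    """
    if timeo <= 0 or retrans < 1:
        raise ValueError("timeo must be positive and retrans at least 1")
    initial = timeo / 10.0        # timeo is in tenths of a second
    t = 0.0
    sends = [0.0]                 # the original transmission
    while True:
        current = initial
        for i in range(retrans):
            t += current
            if t >= horizon:      # the reply arrives before this timeout fires
                return sends
            sends.append(t)
            if i < retrans - 1:
                current = min(current * 2, max_timeout)
            else:
                initial = min(initial * 2, max_timeout)


def requests_sent(reply_at: float, timeo: int, retrans: int = 3) -> int:
    """How many copies of one call are sent before the server answers."""
    return len(send_times(timeo, retrans, reply_at))


if __name__ == "__main__":
    for reply_at in (5.0, 30.0):
        for timeo in (10, 20):
            print(f"reply at {reply_at:4.1f}s, timeo={timeo}: "
                  f"{requests_sent(reply_at, timeo)} requests")
```

In this model, raising timeo from 10 to 20 (with retrans=3) reduces the copies sent during a 5-second stall from 3 to 2, and during a 30-second stall from 8 to 6.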
Note that Solaris clients use the timeo mount parameter only as a starting value. The Solaris client constantly adjusts the actual timeout according to the smoothed average round-trip time observed during NFS operations to the server. This lets the client dynamically adjust how long it is willing to wait for NFS responses based on the server's recent responsiveness.
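The text does not give the exact formula the Solaris client uses, so the sketch below assumes the classic Jacobson/Karels estimator familiar from TCP (as specified in RFC 6298): a smoothed round-trip time and a smoothed mean deviation are updated with each sample, and the timeout is srtt + 4 * rttvar (the RFC's clock-granularity term is omitted). Until the first sample arrives, the timeo value is used as is. The RttEstimator class and its fields are hypothetical; real clients may use different constants.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RttEstimator:
    """Adaptive timeout in the style of TCP's RFC 6298 estimator.

    Assumption: the Solaris NFS client's exact constants and formula may differ.
    """
    initial_ms: float             # timeo * 100 (timeo is in tenths of a second)
    min_ms: float = 0.0           # per-operation floor
    max_ms: float = 20_000.0      # Solaris UDP ceiling
    srtt: Optional[float] = None  # smoothed round-trip time
    rttvar: float = 0.0           # smoothed mean deviation

    def update(self, rtt_ms: float) -> None:
        """Fold one measured round-trip time into the running estimates."""
        if self.srtt is None:
            self.srtt = rtt_ms
            self.rttvar = rtt_ms / 2
        else:
            # Update the deviation first, using the previous smoothed RTT.
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt_ms)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt_ms

    def timeout_ms(self) -> float:
        """Use timeo until a sample exists, then srtt + 4 * rttvar, clamped."""
        raw = self.initial_ms if self.srtt is None else self.srtt + 4 * self.rttvar
        return max(self.min_ms, min(raw, self.max_ms))


if __name__ == "__main__":
    est = RttEstimator(initial_ms=1500)   # timeo=15
    print(est.timeout_ms())               # no samples yet, so timeo is used
    for rtt in (100, 100, 140, 90):
        est.update(rtt)
        print(round(est.timeout_ms(), 1))
```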
Use the nfsstat -m command to review the kernel's observed response times over the UDP transport for all NFS mounts:
% nfsstat -m
/mnt from mahimahi:/export
 Flags:      vers=3,proto=udp,sec=sys,hard,intr,link,symlink,acl,rsize=32768,
             wsize=32768,retrans=2,timeo=15
 Attr cache: acregmin=3,acregmax=60,acdirmin=30,acdirmax=60
 Lookups:    srtt=13 (32ms), dev=6 (30ms), cur=4 (80ms)
 Reads:      srtt=24 (60ms), dev=14 (70ms), cur=10 (200ms)
 Writes:     srtt=46 (115ms), dev=27 (135ms), cur=19 (380ms)
 All:        srtt=20 (50ms), dev=11 (55ms), cur=8 (160ms)
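If you want to track these figures over time, the per-operation lines are easy to pick apart. The following is a minimal Python sketch that assumes the line layout shown in the sample output above; the regular expression and the parse_rtt function are hypothetical and may need adjusting for other releases. It keeps only the millisecond values shown in parentheses.

```python
import re
from typing import Dict

# Matches lines such as "Reads: srtt=24 (60ms), dev=14 (70ms), cur=10 (200ms)".
RTT_LINE = re.compile(
    r"^\s*(?P<op>\w+):\s+"
    r"srtt=\d+ \((?P<srtt>\d+)ms\),\s*"
    r"dev=\d+ \((?P<dev>\d+)ms\),\s*"
    r"cur=\d+ \((?P<cur>\d+)ms\)"
)


def parse_rtt(output: str) -> Dict[str, Dict[str, int]]:
    """Map each operation class (Lookups, Reads, ...) to its times in ms."""
    stats: Dict[str, Dict[str, int]] = {}
    for line in output.splitlines():
        m = RTT_LINE.match(line)
        if m:
            stats[m.group("op")] = {
                key: int(m.group(key)) for key in ("srtt", "dev", "cur")
            }
    return stats


SAMPLE = """\
/mnt from mahimahi:/export
 Lookups: srtt=13 (32ms), dev=6 (30ms), cur=4 (80ms)
 Reads: srtt=24 (60ms), dev=14 (70ms), cur=10 (200ms)
 Writes: srtt=46 (115ms), dev=27 (135ms), cur=19 (380ms)
 All: srtt=20 (50ms), dev=11 (55ms), cur=8 (160ms)
"""

if __name__ == "__main__":
    for op, times in parse_rtt(SAMPLE).items():
        print(op, times)
```

On a live client, the input could come from subprocess.run(["nfsstat", "-m"], capture_output=True, text=True).stdout instead of the hard-coded sample.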
The smoothed average round-trip time (srtt) is reported in milliseconds, as are the average deviation (dev) and the current "expected" response time (cur). The numbers