Managing NFS and NIS, 2nd Edition - Mike Eisler [243]
For all practical purposes, the NFS Version 3 protocol removes any limitations on the size of the data block that can be transmitted, although the data block size may still be limited by the underlying transport. Most NFS Version 3 implementations use a 32 KB data block size. The larger NFS writes reduce protocol overhead and disk seek time, resulting in much faster sequential file access.
NFS/TCP versus NFS/UDP
TCP handles retransmissions and flow control for NFS, requiring only individual packets to be retransmitted in case of loss, and making NFS practical over lossy and wide area networks. In contrast, UDP requires the whole NFS operation to be retransmitted if one or more packets are lost, making it impractical over lossy networks. TCP allows read and write operations to be increased from 8 KB to 32 KB. By default, Solaris clients will attempt to mount NFS filesystems using NFS Version 3 over TCP when supported by the server. Note that workloads that mainly access attributes or consist of short reads will benefit less from the larger transfer size, and as such you may want to reduce the default read block size by using the rsize=n option of the mount command. This is explored in more detail in Chapter 18.
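To see why whole-operation retransmission hurts on a lossy network, the following rough sketch estimates how often a UDP-based NFS operation would have to be resent. It rests on assumptions that are not from the text: packets are lost independently, each IP fragment carries about 1480 bytes of payload (a 1500-byte Ethernet MTU minus a 20-byte IP header), and RPC header overhead is ignored. The function names are illustrative only.

```python
import math


def fragments_per_op(op_bytes: int, fragment_payload: int = 1480) -> int:
    """Number of IP fragments needed to carry one NFS operation over UDP.

    fragment_payload is an assumption: 1500-byte MTU minus a 20-byte IP header.
    """
    return math.ceil(op_bytes / fragment_payload)


def udp_op_loss_probability(op_bytes: int, packet_loss: float,
                            fragment_payload: int = 1480) -> float:
    """Probability that at least one fragment of the operation is lost.

    Over UDP, losing any single fragment forces the whole operation to be
    retransmitted. Losses are assumed to be independent.
    """
    n = fragments_per_op(op_bytes, fragment_payload)
    return 1.0 - (1.0 - packet_loss) ** n


if __name__ == "__main__":
    for size in (8 * 1024, 32 * 1024):
        p = udp_op_loss_probability(size, packet_loss=0.01)
        print(f"{size // 1024} KB operation, 1% packet loss: "
              f"{fragments_per_op(size)} fragments, "
              f"{p:.1%} chance the whole operation is resent")
```

Under these assumptions, the chance of resending an entire operation grows with the number of fragments, whereas TCP would resend only the lost segment.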
Locating bottlenecks
Given all of the areas in which NFS can break down, it is hard to pick a starting point for performance analysis. Inspecting server behavior, for example, may not tell you anything if the network is overly congested or dropping packets. One approach is to start with a typical NFS client, and evaluate its view of the network's services. Tools that examine the local network interface, the network load perceived by the client, and NFS timeout and retransmission statistics indicate whether the bulk of your performance problems are due to the network or the NFS servers.
In this and the next two chapters, we look at performance problems from excessive server loading to network congestion, and offer suggestions for easing constraints at each of the problem areas outlined above. However, you may want to get a rough idea of whether your NFS servers or your network is the bigger contributor to performance problems before walking through all of the diagnostic steps. On a typical NFS client, use the nfsstat tool to compare the retransmission and duplicate reply rates:
% nfsstat -rc
Client rpc:
Connection oriented:
calls      badcalls   badxids    timeouts   newcreds   badverfs
1753584    1412       18         64         0          0
timers     cantconn   nomem      interrupts
0          1317       0          18
Connectionless:
calls      badcalls   retrans    badxids    timeouts   newcreds
12443      41         334        80         166        0
badverfs   timers     nomem      cantsend
0          4321       0          206
The timeout value indicates the number of NFS RPC calls that did not complete within the RPC timeout period. Divide timeout by calls to determine the retransmission rate for this client. We'll look at an equation for calculating the maximum allowable retransmission rate on each client in Section 18.1.3.
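As a worked illustration of the division described above, the sketch below parses nfsstat -rc output in the layout shown in this section and computes timeouts divided by calls for each transport. The parser is a simplification that assumes the output alternates between a line of field names and a line of integer values under each heading ending in a colon; the function names are hypothetical and are not part of nfsstat.

```python
SAMPLE = """\
Client rpc:
Connection oriented:
calls badcalls badxids timeouts newcreds badverfs
1753584 1412 18 64 0 0
timers cantconn nomem interrupts
0 1317 0 18
Connectionless:
calls badcalls retrans badxids timeouts newcreds
12443 41 334 80 166 0
badverfs timers nomem cantsend
0 4321 0 206
"""


def parse_nfsstat_rc(text: str) -> dict:
    """Return {section name: {field name: count}} from nfsstat -rc output."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    sections = {}
    current = None
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.endswith(":"):
            # A heading such as "Connectionless:" starts a new section.
            current = sections.setdefault(line[:-1], {})
            i += 1
        elif current is not None and i + 1 < len(lines):
            # A line of field names followed by a line of values.
            names = line.split()
            values = [int(v) for v in lines[i + 1].split()]
            current.update(zip(names, values))
            i += 2
        else:
            i += 1  # Skip anything before the first heading (e.g. the prompt).
    return sections


def retransmission_rate(counters: dict) -> float:
    """timeouts divided by calls; 0.0 when no calls were made."""
    calls = counters.get("calls", 0)
    return counters.get("timeouts", 0) / calls if calls else 0.0


if __name__ == "__main__":
    stats = parse_nfsstat_rc(SAMPLE)
    for name in ("Connection oriented", "Connectionless"):
        print(f"{name}: {retransmission_rate(stats[name]):.4%}")
```

For the sample output, the connectionless (UDP) rate is 166 / 12443, or roughly 1.3 percent, while the connection-oriented rate is far smaller.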
If the client-side RPC counts for timeout and badxid are close in value, the network is healthy. Requests are making it to the server but the server cannot handle them and generate replies before the client's RPC call times out. The server eventually works its way through the backlog of requests, generating duplicate replies that increment the badxid count. In this case, the emphasis should be on improving server response time.
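The comparison of timeout and badxid can be expressed as a small check. This is only a sketch: the text does not define how close "close in value" is, so the 0.5 threshold and the function names below are assumptions chosen for illustration and should be tuned to your own environment.

```python
def badxid_ratio(timeouts: int, badxids: int) -> float:
    """Fraction of timed-out calls that later drew a duplicate reply."""
    return badxids / timeouts if timeouts else 0.0


def looks_server_bound(timeouts: int, badxids: int,
                       threshold: float = 0.5) -> bool:
    """True when badxid is close to timeout.

    That pattern suggests requests reach the server but are answered too
    late. The threshold is an assumed value, not one given in the text.
    """
    if timeouts == 0:
        return False  # No timeouts, so there is nothing to diagnose.
    return badxid_ratio(timeouts, badxids) >= threshold


if __name__ == "__main__":
    # Hypothetical counter values, not taken from a real client.
    print(looks_server_bound(timeouts=100, badxids=95))
    print(looks_server_bound(timeouts=100, badxids=2))
```

Counts that fail the check are simply left unclassified by this sketch; it flags only the slow-server pattern described above.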
Alternatively, nfsstat may show that timeout is large while badxid is zero or negligible.