Managing NFS and NIS, 2nd Edition - Mike Eisler [257]
Partitioning a network and increasing the available bandwidth should ease the constraints imposed by the network, and spur an increase in NFS performance. However, the network itself is not always the sole or primary cause of poor performance. Server- and client-side tuning should be performed in concert with changes in network topology. Chapter 16 has already covered server-side tuning; Section 18.1 will cover the client-side tuning issues.
Chapter 18. Client-Side Performance Tuning
The performance measurement and tuning techniques we've discussed so far have dealt only with making the NFS server go faster. Part of tuning an NFS network is ensuring that clients are well-behaved so that they do not flood the servers with requests and upset any tuning you may have performed. Server performance is usually limited by disk or network bandwidth, but there is no throttle on the rate at which clients generate requests unless you put one in place. Add-on products, such as the Solaris Bandwidth Manager, allow you to limit the amount of network bandwidth available on specified ports, enabling you to restrict the amount of network resources used by NFS on either the server or the client. In addition, if you cannot make your servers or network any faster, you have to tune the clients to handle the network "as is."
Slow server compensation
The RPC retransmission algorithm cannot distinguish between a slow server and a congested network. If a reply is not received from the server within the RPC timeout period, the request is retransmitted subject to the timeout and retransmission parameters for that mount point. It is immaterial to the RPC mechanism whether the original request is still enqueued on the server or if it was lost on the network. Excessive RPC retransmissions place an additional strain on the server, further degrading response time.
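The following toy model illustrates why the client cannot tell the two cases apart: the decision to retransmit depends only on whether a reply has arrived before the timer expires. It is a sketch under stated assumptions, not the actual RPC implementation. The function name copies_sent and the parameter values are hypothetical, and the fixed timeout is a simplification, since real clients may lengthen the timeout between retransmissions.

```python
def copies_sent(reply_delay, timeout, retrans):
    """Count the copies of one request a client transmits (toy model).

    reply_delay: seconds until the first reply reaches the client
    timeout:     seconds the client waits before retransmitting (fixed here)
    retrans:     maximum number of retransmissions allowed
    """
    copies = 1            # the original transmission
    deadline = timeout    # when the current wait expires
    # The loop never asks *why* the reply is late: a slow server and a
    # request lost on the network look identical from here.
    while reply_delay > deadline and copies <= retrans:
        copies += 1       # timer expired with no reply: retransmit
        deadline += timeout
    return copies


# Illustrative values only: a 1-second timeout and up to 5 retransmissions.
fast = copies_sent(reply_delay=0.2, timeout=1.0, retrans=5)
slow = copies_sent(reply_delay=3.5, timeout=1.0, retrans=5)
lost = copies_sent(reply_delay=float("inf"), timeout=1.0, retrans=5)
print(fast, slow, lost)
```

In this model, a server that needs 3.5 seconds to answer receives four copies of the same request, each of which it must process. That is the additional strain described above.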
Identifying NFS retransmissions
Inspection of the load average and disk activity on the servers may indicate that the servers are heavily loaded and imposing the tightest constraint. The NFS client-side statistics provide the most concrete evidence that one or more slow servers are to blame:
% nfsstat -rc

Client rpc:
Connection-oriented:
calls       badcalls    badxids     timeouts    newcreds    badverfs
1753584     1412        18          64          0           0
timers      cantconn    nomem       interrupts
0           1317        0           18
Connectionless:
calls       badcalls    retrans     badxids     timeouts    newcreds
12443       41          334         80          166         0
badverfs    timers      nomem       cantsend
0           4321        0           206
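The two counters of most interest in this output are badxids and timeouts. When you collect these figures from many clients, it can help to extract them programmatically. The sketch below assumes the layout shown above (a line of counter names followed by a line of values, grouped under a transport heading); other nfsstat versions may format their output differently. The names parse_client_rpc and badxid_ratio are hypothetical helpers, not part of any NFS tool.

```python
SAMPLE = """\
Client rpc:
Connection-oriented:
calls badcalls badxids timeouts newcreds badverfs
1753584 1412 18 64 0 0
timers cantconn nomem interrupts
0 1317 0 18
Connectionless:
calls badcalls retrans badxids timeouts newcreds
12443 41 334 80 166 0
badverfs timers nomem cantsend
0 4321 0 206
"""


def parse_client_rpc(text):
    """Return {section: {counter: value}} from nfsstat -rc style output."""
    stats, section, names = {}, None, []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue                      # skip blank lines
        if fields[-1].endswith(":"):
            section = line.strip().rstrip(":")
            stats[section] = {}           # new heading, e.g. "Connectionless"
        elif all(f.isdigit() for f in fields):
            # A values line: pair it with the names line just before it.
            stats[section].update(zip(names, map(int, fields)))
        else:
            names = fields                # a line of counter names
    return stats


def badxid_ratio(counters):
    """Ratio of duplicate replies (badxids) to timeouts; 0.0 if no timeouts."""
    timeouts = counters["timeouts"]
    return counters["badxids"] / timeouts if timeouts else 0.0


stats = parse_client_rpc(SAMPLE)
# Headings with no counters of their own ("Client rpc") are skipped.
ratios = {name: badxid_ratio(c) for name, c in stats.items() if c}
for name, ratio in ratios.items():
    print(f"{name}: badxids/timeouts = {ratio:.2f}")
```

The sketch only reports the ratio; it deliberately sets no threshold, because judging whether the two counters are "of the same magnitude" remains a call for the administrator.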
The -rc option is given to nfsstat to look at the RPC statistics only, for client-side NFS operations. The call type demographics contained in the NFS-specific statistics are not of value in this analysis. The test for a slow server is having badxid and timeout (the badxids and timeouts columns) of the same magnitude. In the previous example, badxid is nearly a third the value of timeout for connection-oriented RPC, and nearly half the value of timeout for connectionless RPC. Connection-oriented transports use a higher timeout than connectionless transports; therefore, the number of timeouts will generally be lower for connection-oriented transports. The high badxid count implies that requests are reaching the various NFS servers, but the servers are too loaded to send replies before the local host's RPC calls time out and are retransmitted. badxid is incremented each time a duplicate reply is received for a retransmitted request (an RPC request retains its XID through all retransmission cycles). In this case, the server is replying to all requests, including the retransmitted ones. The client is simply not patient