Managing NFS and NIS, 2nd Edition - Mike Eisler [242]
Network bandwidth
An overly congested network slows down both client transmissions and server replies. Network partitioning hardware installed to reduce network saturation adds delays to roundtrip times, increasing the effective time required to complete an RPC call. If the delays caused by network congestion are serious, they contribute to RPC timeouts. We explore network bottlenecks in detail in Chapter 17.
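A quick way to check whether congestion is already causing RPC timeouts is to look at the client's retransmission counters from nfsstat. The sketch below computes a retransmission percentage from a sample of "nfsstat -rc" style output; the sample data is embedded purely for illustration, and field positions vary between implementations, so treat the column numbers and the 5% threshold as local assumptions:

```shell
# Estimate the RPC retransmission rate from "nfsstat -rc" style output.
# Sample output is embedded for illustration; on a live client you
# would pipe the real command instead:  nfsstat -rc | awk ...
nfsstat_sample='calls      badcalls   retrans    badxids    timeouts   waits
129841     4          2158       0          2158       0'

echo "$nfsstat_sample" | awk 'NR==2 {
    pct = ($3 / $1) * 100
    printf "retrans: %d of %d calls (%.1f%%)\n", $3, $1, pct
    # 5% is a rule-of-thumb alarm level, not a figure from this text
    if (pct > 5) print "WARNING: high retransmission rate; suspect congestion"
}'
```

A rising retransmission count relative to total calls suggests that requests or replies are being delayed or dropped somewhere between client and server.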
Server network interface
A busy server may be so flooded with packets that it cannot receive all of them, or cannot queue the incoming requests in a protocol-specific structure once the network interface receives the packets. Interrupt handling limitations can also limit the server's ability to pull packets in from the network.
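Input errors and drops on the server's interface show up in "netstat -i" output. The sketch below flags any interface reporting input errors; the sample data is embedded for illustration, and column layout differs between systems, so the field numbers are an assumption to adapt locally:

```shell
# Flag interfaces with input errors, parsed from "netstat -i" style
# output (sample embedded; on a live server: netstat -i | awk ...).
netstat_sample='Name  Mtu   Net/Dest      Address        Ipkts    Ierrs  Opkts    Oerrs  Collis
hme0  1500  servnet       nfsserver      3541523  1242   2981132  0      0
lo0   8232  loopback      localhost      182913   0      182913   0      0'

echo "$netstat_sample" | awk 'NR>1 && $6 > 0 {
    printf "%s: %d input errors out of %d input packets\n", $1, $6, $5
}'
```

A nonzero and growing input error count on the NFS server's interface is consistent with the flooding scenario described above.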
Server CPU loading
NFS is rarely CPU-constrained. Once a server has received an NFS request, it must schedule an nfsd thread to perform the appropriate operation. If the server has adequate CPU cycles, the CPU does not affect server performance. However, if the server has few free CPU cycles, scheduling latencies may limit NFS performance; conversely, a system that is providing its maximum NFS service has little CPU capacity left over and will not make a good CPU server. CPU loading also affects NIS performance, since a heavily loaded system is slower to perform NIS map lookups in response to client requests.
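The run queue length and idle percentage reported by vmstat give a first look at whether CPU scheduling is a factor. The sketch below reads those two columns from a Solaris-style vmstat sample; the data is embedded for illustration, column positions vary by operating system, and the 10% idle threshold is a rule of thumb rather than a figure from this text:

```shell
# Spot a CPU-starved server from "vmstat" style output (sample
# embedded; on a live server: vmstat 5 2 | tail -1 | awk ...).
vmstat_sample=' r b w   swap  free re  mf pi po fr de sr us sy id
 6 0 0 249136 32760  0  45  2  0  0  0  0 38 55  7'

echo "$vmstat_sample" | awk 'NR==2 {
    printf "run queue=%d, cpu idle=%d%%\n", $1, $NF
    # assumed threshold: little idle time means nfsd threads may wait
    if ($NF < 10) print "CPU scheduling latency may be limiting nfsd"
}'
```

A persistently long run queue combined with low idle time means nfsd threads are competing with other processes for the CPU.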
Server memory usage
NFS performance is somewhat related to the size of the server's memory, provided the server is doing little besides serving NFS. NFS uses either the local disk buffer cache (on systems that do not have a page-mapped VM system) or free memory to cache disk pages that have recently been read from disk. Running large processes on an NFS server hurts NFS performance: as the server runs out of memory and begins paging, its performance as either an NIS or NFS server suffers, and disk bandwidth that could be servicing NFS requests is instead consumed by page fault handling.
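Paging activity shows up in vmstat's page scan rate (the "sr" column on Solaris-style output). The sketch below checks that column in an embedded sample; column positions vary by operating system, so the field number is an assumption to verify against the local vmstat header:

```shell
# Check the page scan rate ("sr" column) from "vmstat" style output
# (sample embedded; on a live server: vmstat 5 2 | tail -1 | awk ...).
vmstat_pg=' r b w   swap   free re  mf pi po fr  de  sr us sy id
 2 0 0 114312  4672   0 188 12 96 183  0 240 21 14 65'

echo "$vmstat_pg" | awk 'NR==2 {
    printf "page scan rate: %d pages/sec\n", $12
    # any sustained scanning indicates memory pressure
    if ($12 > 0) print "server is scanning for memory; paging is stealing disk bandwidth"
}'
```

A sustained nonzero scan rate on an NFS server indicates that local processes are squeezing out the memory that would otherwise cache filesystem pages.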
Server disk bandwidth
This area is the most common bottleneck: the server simply cannot get data to or from the disks quickly enough. NFS requests tend to be random in nature, exhibiting little locality of reference for a particular disk. Many clients mounting filesystems from a server increase the degree of randomness in the system. Furthermore, NFS is stateless, so NFS Version 2 write operations on the server must be committed to disk before the client is notified that the RPC call completed. This synchronous nature of NFS write operations further impairs performance, since caching and disk controller ordering will not be utilized to their fullest extent. NFS Version 3 eases this constraint with the use of safe asynchronous writes, which are described in detail in the next section.
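Per-disk saturation is visible in "iostat -x" extended statistics. The sketch below flags disks with long service times or high utilization from an embedded sample; the column layout is Solaris-style and the 20ms/65% thresholds are rules of thumb, not figures from this text:

```shell
# Flag overloaded disks from "iostat -x" style output (sample embedded;
# on a live server: iostat -x 5 | awk ...).
iostat_sample='device    r/s  w/s   kr/s  kw/s wait actv  svc_t  %w  %b
sd0       1.2  8.1   9.6  64.8  0.0  0.4   43.5   0  71
sd1       0.1  0.3   0.8   2.4  0.0  0.0    5.2   0   2'

echo "$iostat_sample" | awk 'NR>1 && ($8 > 20 || $10 > 65) {
    printf "%s: svc_t=%.1fms, %d%% busy -- possible bottleneck\n", $1, $8, $10
}'
```

When one disk stays busy while its neighbors sit idle, rebalancing the exported filesystems across spindles usually helps more than any other tuning.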
Configuration effects
Loosely grouped in this category are constrictive server kernel configurations, poor disk balancing, and inefficient mount point naming schemes. With poor configurations, all services operate properly but inefficiently.
Throughput
The next two sections summarize NFS throughput issues.
NFS writes (NFS Version 2 versus NFS Version 3)
Write operations over NFS Version 2 are synchronous, forcing servers to flush data to disk[3] before a reply to the NFS client can be generated. This severely limits the rate at which the NFS client can issue write requests, since it must wait for the server's acknowledgment before it can generate the next request. NFS Version 3 overcomes this limitation by introducing a two-phase commit write operation. The NFS Version 3 client generates asynchronous write requests, allowing the server to acknowledge them without flushing the data to disk. This reduces the round-trip time between client and server, allowing requests to be sent more quickly. Since the server no longer flushes the data to disk before it replies, the data may be lost if the server crashes or reboots unexpectedly. The NFS Version 3 client therefore assumes the responsibility of recovering from such a failure: it must retain its own copy of any asynchronously written data until a subsequent commit request confirms that the server has written the data to stable storage.
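The cost of the synchronous constraint is easy to see with back-of-the-envelope arithmetic: a Version 2 client can have only one write outstanding, so its throughput is capped by the per-request latency rather than by network or disk bandwidth. The figures below (an 8KB write size and 20ms of flush plus round-trip time per request) are illustrative assumptions, not measurements from this text:

```shell
# Throughput ceiling for synchronous NFS Version 2 writes: one request
# at a time, each paying full disk-flush + round-trip latency.
awk 'BEGIN {
    write_kb  = 8       # assumed write size per request, in KB
    latency_s = 0.020   # assumed flush + round-trip time per request
    printf "ceiling: %.0f KB/s\n", write_kb / latency_s
}'
# prints: ceiling: 400 KB/s
```

A Version 3 client streaming asynchronous writes and committing them later is limited by actual disk and network bandwidth instead of this per-request latency.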