Managing NFS and NIS, 2nd Edition - Mike Eisler [83]
Recall from Section 1.3.1 that packets larger than the medium's MTU must be fragmented. Fragmentation of output packets is easy, but the other direction, reassembly of input fragments, is harder if the fragments arrive out of order, or if a fragment is dropped or delayed. With larger NFS transfer sizes, the risk of a reassembly problem is higher, and if there is a problem, the entire datagram must be retransmitted, including all the fragments. NFS Version 2 was designed to be gentler to the network during the days when operating systems, routers, and network hardware were less capable. Nowadays, these components are much more effective, and so NFS Version 3 removes the artificial limits to transfer size.
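To make the cost concrete, the following sketch estimates how many IP fragments a UDP datagram needs, and how the chance of losing the whole datagram grows with the fragment count. The figures are assumptions rather than values from the text: a 1500-byte Ethernet MTU, a 20-byte IPv4 header with no options, an 8-byte UDP header, and independent loss for each fragment. The function names are hypothetical, and RPC and NFS header overhead is ignored.

```python
import math


def fragment_count(udp_payload_bytes, mtu=1500, ip_header=20, udp_header=8):
    """Estimate the number of IPv4 fragments needed for one UDP datagram.

    Assumed sizes: 1500-byte MTU, 20-byte IPv4 header, 8-byte UDP header.
    """
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    # The UDP header travels inside the IP payload, so it is fragmented too.
    return math.ceil((udp_payload_bytes + udp_header) / per_fragment)


def datagram_loss_probability(fragments, fragment_loss_rate):
    """Chance that at least one fragment is lost, assuming independent loss."""
    return 1.0 - (1.0 - fragment_loss_rate) ** fragments


small = fragment_count(8 * 1024)    # 8 KB transfer: 6 fragments
large = fragment_count(32 * 1024)   # 32 KB transfer: 23 fragments
```

Under these assumptions, a 32 KB transfer puts roughly four times as many fragments on the wire as an 8 KB transfer, and losing any one of them forces retransmission of the entire datagram.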
NFS over TCP
Both NFS Version 2 and Version 3 can operate over either UDP or TCP. Since TCP is stateful and NFS is stateless, running NFS over TCP would seem to be a contradiction, if not an impossibility. However, the layer between NFS and TCP is RPC, and RPC is implemented to hide the state issues of TCP from NFS.
The first time an NFS client contacts a server over TCP, the RPC layer takes care of establishing a connection. If a server crashes, the client won't know that immediately, but the next time it sends a request over the connection, the connection will break due to a connection reset from the server, or a connection timeout. In either case, the RPC layer simply re-establishes a connection.
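The reconnect behavior can be sketched as follows. This is a minimal illustration, not the Solaris or BSD RPC implementation: the class name `ReconnectingTransport` and the `connect` callable are hypothetical, and a real RPC layer also handles record marking, transaction IDs, and retransmission timers.

```python
import socket


class ReconnectingTransport:
    """Hides connection loss from the caller by reconnecting once and retrying."""

    def __init__(self, connect):
        # connect: hypothetical callable returning a connected socket-like object
        self._connect = connect
        self._conn = None

    def call(self, request: bytes) -> bytes:
        for _attempt in range(2):
            if self._conn is None:
                # First use, or the previous connection was torn down.
                self._conn = self._connect()
            try:
                self._conn.sendall(request)
                reply = self._conn.recv(65536)
                if reply:
                    return reply
                # An empty read means the peer closed the connection.
            except (ConnectionError, socket.timeout):
                # Connection reset or timeout: fall through and reconnect.
                pass
            self._conn.close()
            self._conn = None
        raise ConnectionError("server unreachable after reconnect")
```

The caller of `call()` never sees the broken connection unless the retry also fails, which is the sense in which the RPC layer keeps TCP state out of sight of NFS.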
Some NFS/TCP implementations, such as that in Solaris, maintain a single connection between the NFS client and server, so that all traffic, for all users and mount points, is multiplexed over that one connection. Other implementations, such as those in the BSD releases, have one connection per mount point. Aside from a user-level NFS client like a web browser, or a Java application linked to NFS classes, you are not likely to encounter an NFS client that creates a connection per user.
If the client crashes, the server will periodically close connections that haven't been used in a while. On a Solaris NFS server, this connection idle timer defaults to six minutes.
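A server-side idle timer of this kind can be sketched as below. Only the six-minute default comes from the text; the class name `IdleReaper`, its methods, and the use of a monotonic clock are illustrative assumptions, not the Solaris implementation.

```python
import time

IDLE_TIMEOUT_SECONDS = 6 * 60  # six-minute default cited for Solaris


class IdleReaper:
    """Tracks when each connection was last used and reports idle ones."""

    def __init__(self, timeout=IDLE_TIMEOUT_SECONDS, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock          # injectable so the timer can be tested
        self.last_used = {}         # connection id -> time of last request

    def touch(self, conn_id):
        # Call whenever a request arrives on the connection.
        self.last_used[conn_id] = self.clock()

    def reap(self):
        # Return, and stop tracking, connections idle longer than the timeout.
        now = self.clock()
        stale = [c for c, t in self.last_used.items() if now - t > self.timeout]
        for c in stale:
            del self.last_used[c]
        return stale
```

A server would call `reap()` periodically and close each connection it returns; a client that is still alive simply reconnects on its next request.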
* * *
[1] Not all implementations of NFS have this duplicate request cache. Current releases of Solaris, Compaq's Tru64 Unix, and other current operating systems implement the cache to improve the performance and "correctness" of NFS. A few older implementations of NFS do not reject duplicate, nonidempotent requests. This produces some strange and often incorrect results when requests are retransmitted. An NFS client that retransmits a remove operation to such a server may find that the designated file was removed, but the RPC call returns the "No such file or directory" error, because the server executed the retransmitted request after the original had already removed the file.
[2] Asking the mountd daemon isn't the only way to get the filehandle for a filesystem. Recall that Chapter 6 briefly mentioned the public option to the mount command. We will discuss this in more detail in Chapter 12.
NFS components
NFS is similar to other RPC services in its use of a server-side daemon (nfsd) to process incoming requests. It differs from the typical client-server model in that processes on NFS clients make some RPC calls themselves, and other RPC calls are made by the clients' async threads. All of the NFS client and server code is contained in the kernel, instead of in the server daemon executable, a decision also driven by performance requirements.
nfsd and NFS server threads
With all of the NFS code in the kernel, why bother with user processes for the server? Why not make NFS a purely kernel-to-kernel service, without any user processes? On systems that have an nfsd daemon, nfsd does the following:
Initializes a transport endpoint from which the kernel receives NFS requests. This involves allocating a transport endpoint on which to listen for requests, and then registering the endpoint with the portmapper (rpcbind). It is much more convenient to do this from a user-level program than