Managing NFS and NIS, 2nd Edition - Mike Eisler [86]
On systems that use page mapping (SunOS 4.x, System V Release 4, and Solaris), there is no buffer cache, so the notion of "filling a buffer" isn't quite as clear. Instead, the async threads are given file pages whenever a write operation crosses a page boundary. The async threads group consecutive pages together to form a single NFS buffer. This process is called dirty page clustering.
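Dirty page clustering can be modeled in a few lines. The sketch below is illustrative, not the Solaris implementation. It assumes dirty pages are identified by page number and that one NFS buffer can hold at most `max_pages_per_buffer` contiguous pages. The function name `cluster_dirty_pages`, the 4 KB page size, and the 8 KB NFS buffer in the usage line are all hypothetical.

```python
def cluster_dirty_pages(dirty_pages, max_pages_per_buffer):
    """Group consecutive dirty page numbers into NFS-buffer-sized clusters.

    Hypothetical model: each returned cluster stands for one NFS write
    buffer and holds at most max_pages_per_buffer contiguous pages.
    """
    clusters = []
    current = []
    for page in sorted(set(dirty_pages)):
        contiguous = bool(current) and page == current[-1] + 1
        if contiguous and len(current) < max_pages_per_buffer:
            # Extend the current cluster with the next adjacent page.
            current.append(page)
        else:
            # A gap, or the cluster is full: start a new NFS buffer.
            if current:
                clusters.append(current)
            current = [page]
    if current:
        clusters.append(current)
    return clusters


# Assumed sizes: 4 KB pages and an 8 KB NFS buffer, so 2 pages per buffer.
print(cluster_dirty_pages([0, 1, 2, 3, 7, 8, 10], 8192 // 4096))
# prints [[0, 1], [2, 3], [7, 8], [10]]
```

Each inner list would become one write RPC; the gap between pages 3 and 7 forces a new cluster even though the previous one is not full.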
If no async threads are running, or if all of them are busy handling other RPC requests, then the client process performing the write() system call executes the RPC itself (as if there were no async threads at all). A process that is writing large numbers of file blocks enjoys the benefits of having multiple write RPC requests performed in parallel: one by each of the async threads and one that it does itself.
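The fallback rule above can be sketched as a user-space model. This is a hypothetical illustration, not kernel code: `AsyncWriteDispatcher` and `send_write_rpc` are invented names, and `send_write_rpc` stands in for issuing one NFS WRITE RPC and waiting for its reply. For brevity each hand-off starts a fresh Python thread; the semaphore caps how many run at once, which models a fixed set of async threads.

```python
import threading


class AsyncWriteDispatcher:
    """Hypothetical model of handing write RPCs to a fixed set of async threads."""

    def __init__(self, num_async_threads, send_write_rpc):
        # One permit per idle async thread.
        self._idle = threading.Semaphore(num_async_threads)
        self._send = send_write_rpc
        self._workers = []

    def write_block(self, block):
        """Return "async" if an async thread took the block, else "caller"."""
        # Try to claim an idle async thread without waiting for one.
        if self._idle.acquire(blocking=False):
            worker = threading.Thread(target=self._run_async, args=(block,))
            self._workers.append(worker)
            worker.start()
            return "async"
        # No async thread is free: the writing process issues the RPC itself,
        # as if there were no async threads at all.
        self._send(block)
        return "caller"

    def _run_async(self, block):
        try:
            self._send(block)
        finally:
            # The async thread becomes idle again once its RPC completes.
            self._idle.release()

    def drain(self):
        """Wait for all outstanding asynchronous RPCs to finish."""
        for worker in self._workers:
            worker.join()
        self._workers.clear()
```

With two async threads whose RPCs are still outstanding, the first two blocks return "async" and the third returns "caller", so three WRITE RPCs are in flight at once: one per async thread plus the one the writing process performs itself.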
As shown in Figure 7-2, this approach retains some of the advantages of asynchronous Unix write() operations: smaller write requests that do not force an RPC return to the calling process right away.
Figure 7-2. NFS buffer writing
Doing the read-ahead and write-behind in NFS buffer-sized chunks imposes a logical block size on the NFS server, but again, the logical block size has nothing to do with the actual filesystem implementation on either the NFS client or server. We'll look at the buffering done by NFS clients when we discuss data caching and NFS write errors. The next section discusses the interaction of the async threads and Unix system calls in more detail.
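Write-behind in NFS buffer-sized chunks can be sketched as a small buffer that only hands data off once a full chunk has accumulated. This is a simplified, hypothetical model: `WriteBehindBuffer` and `hand_off` are invented names, `hand_off(offset, data)` stands in for passing a filled buffer to an async thread, and writes are assumed to be sequential starting at file offset 0.

```python
class WriteBehindBuffer:
    """Hypothetical model of client-side write-behind in NFS buffer-sized chunks."""

    def __init__(self, buffer_size, hand_off):
        self.buffer_size = buffer_size  # the logical block size imposed by NFS
        self._hand_off = hand_off       # called as hand_off(offset, data)
        self._data = bytearray()
        self._start = 0                 # file offset of the first buffered byte

    def write(self, data):
        """Buffer data; only full chunks are handed off for an RPC."""
        self._data += data
        while len(self._data) >= self.buffer_size:
            chunk = bytes(self._data[:self.buffer_size])
            self._hand_off(self._start, chunk)
            del self._data[:self.buffer_size]
            self._start += self.buffer_size
        # A write that does not fill a buffer returns without any RPC.

    def flush(self):
        """Hand off any partial chunk, e.g. on close() or fsync()."""
        if self._data:
            self._hand_off(self._start, bytes(self._data))
            self._start += len(self._data)
            self._data.clear()
```

With an 8-byte buffer, three 3-byte writes produce a single hand-off of bytes 0 through 7 after the third write, and a later `flush()` sends the remaining byte at offset 8. Every hand-off starts on a multiple of `buffer_size`, which is the logical block size the server sees, regardless of how either side's filesystem lays out blocks.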
* * *
Tip
Async threads are how Solaris does this. Other NFS implementations run multiple block I/O daemons (biods) to achieve the same result.
* * *
NFS kernel code
The functions performed by the parallel async threads and kernel server threads provide only part of the boost required to make NFS performance acceptable. The nfsd is a user-level process, but it contains no code to process NFS requests; it simply issues a system call that gives the kernel a transport endpoint. All the code that sends NFS requests from the client and processes NFS requests on the server is in the kernel.
It is possible to put the NFS client and server code entirely in user processes. Unfortunately, making system calls is relatively expensive in terms of operating system overhead, and moving data to and from user space is also a drain on the system. Implementing NFS code outside the kernel, at the user level, would require every NFS RPC to go through a very convoluted sequence of kernel and user process transitions, moving data into and out of the kernel whenever it was received or sent by a machine.
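To make the cost concrete, here is a toy accounting model for one NFS READ handled by a hypothetical user-level server built on ordinary socket and file system calls. It is a simplification under stated assumptions: it counts each system call as one transition into the kernel and one back out, counts each copy between kernel and user space, and ignores tricks such as mmap() or zero-copy I/O. The `Step` type and the step lists are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Step:
    description: str
    syscall: bool        # does this step enter and then leave the kernel?
    boundary_copy: bool  # is data copied between kernel and user space?


# Assumed sequence for a user-level server answering one READ request.
USER_LEVEL_READ = [
    Step("recvfrom(): copy the RPC request from the kernel into the daemon", True, True),
    Step("read(): copy file data from the kernel's cache into the daemon", True, True),
    Step("sendto(): copy the reply, including the data, back into the kernel", True, True),
]

# With the server code in the kernel, the request arrives, the file data is
# located, and the reply is sent without ever leaving the kernel.
IN_KERNEL_READ = []


def cost(steps):
    """Return (user/kernel transitions, boundary copies) for a step list."""
    transitions = sum(2 for s in steps if s.syscall)  # into and back out
    copies = sum(1 for s in steps if s.boundary_copy)
    return transitions, copies


print("user-level:", cost(USER_LEVEL_READ))  # prints user-level: (6, 3)
print("in-kernel:", cost(IN_KERNEL_READ))    # prints in-kernel: (0, 0)
```

Even in this minimal model, every READ served at user level pays six transitions and moves the file data across the user/kernel boundary twice, costs that the in-kernel server avoids entirely.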
The kernel implementation of the NFS RPC client and server code eliminates the extra transitions out of and back into the kernel, and it eliminates nearly all data copying; the notable exception is the final move of data from the client's kernel to the user process that requested it. To see how the NFS daemons, buffer (or page) cache, and system calls fit together, we'll trace a read() system call through the client and server kernels:
1. A user process calls read() on an NFS-mounted file. The process has no way of determining where the file is, since its only reference to the file is a Unix file descriptor.
2. The VFS maps the file descriptor to a virtual node (vnode) and calls the read operation for that vnode's filesystem type. Locally, the vnode is what locates this file in the client's filesystem. It contains a pointer to more specific filesystem information: for a local file, it points to an inode, and for an NFS file, it points to a structure containing an NFS filehandle. Since the filesystem type here is NFS, the system call invokes the NFS client read routine, and in the process the file descriptor is also mapped to a filehandle for use by NFS.
3. The client read routine checks the local buffer (or page) cache for the data. If it is present, the data is returned right away. It's possible that the data requested