Managing NFS and NIS, 2nd Edition - Mike Eisler [40]
If servers are also NIS clients, then having only one master and one slave server creates a window in which the entire network pauses if either server goes down. If the servers have bound to each other, and one crashes, the other server rebinds to itself after a short timeout. In the interim, however, the "live" server is probably not doing useful work because it's waiting for an NIS server to respond. Increasing the number of slave servers decreases the probability that a single server crash hangs other NIS servers and consequently hangs their bound clients. In addition, running more than two NIS servers prevents all NIS clients from rebinding to the same server when an NIS server becomes unavailable.
Trace of a key match
Now we've seen how all of the pieces of NIS work by themselves. In reality, of course, the clients and servers must work together with a well-defined sequence of events. To fit all of the client- and server-side functionality into a time-sequenced picture, here is a walk-through of the getpwuid( ) library call. The interaction of library routines and NIS daemons is shown in Figure 3-2.
A user runs ls -l, and the ls process needs to find the username corresponding to the UID of each file's owner. In this case, ls -l calls getpwuid(11461) to find the password file entry — and therefore username — for UID 11461.
The local password file looks like this:
root:passwd:0:1:Operator:/:/bin/csh
daemon:*:1:1::/:
sys:*:2:2::/:/bin/csh
bin:*:3:3::/bin:
uucp:*:4:8::/var/spool/uucppublic:
The local file is checked first, but there is no UID 11461 in it. However, /etc/nsswitch.conf has this entry:
passwd: files nis
which effectively appends the entire NIS password map. getpwuid( ) decides it needs to go to NIS for the password file entry.
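The lookup order implied by the "files nis" entry can be sketched as a small simulation. This is only an illustration, not the actual name service switch code: the local entries are the ones listed above, while the NIS record for UID 11461, the user name jdoe, and the helper names (lookup_files, lookup_nis, getpwuid_sketch) are hypothetical.

```python
# Simulation of "passwd: files nis": try each source in order until one answers.
# The local entries are from the text; everything else is a made-up placeholder.

LOCAL_PASSWD = """\
root:passwd:0:1:Operator:/:/bin/csh
daemon:*:1:1::/:
sys:*:2:2::/:/bin/csh
bin:*:3:3::/bin:
uucp:*:4:8::/var/spool/uucppublic:
"""

# Hypothetical stand-in for the NIS passwd.byuid map (key: UID as a string).
NIS_PASSWD_BYUID = {
    "11461": "jdoe:*:11461:10:J. Doe:/home/jdoe:/bin/csh",
}


def lookup_files(uid):
    """Scan the local password file for a matching UID field."""
    for line in LOCAL_PASSWD.splitlines():
        fields = line.split(":")
        if int(fields[2]) == uid:
            return line
    return None


def lookup_nis(uid):
    """Ask the (simulated) NIS map for the same UID."""
    return NIS_PASSWD_BYUID.get(str(uid))


SOURCES = {"files": lookup_files, "nis": lookup_nis}


def getpwuid_sketch(uid, switch_line="passwd: files nis"):
    """Return (source, entry) from the first source that has the UID."""
    _database, _, sources = switch_line.partition(":")
    for name in sources.split():
        entry = SOURCES[name](uid)
        if entry is not None:
            return name, entry
    return None, None


if __name__ == "__main__":
    print(getpwuid_sketch(0))      # answered by the local file
    print(getpwuid_sketch(11461))  # falls through to the simulated NIS map
```

UID 0 is resolved from the local file, so NIS is never consulted for it; only UID 11461, which is missing locally, reaches the second source.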
getpwuid( ) grabs the default domain name, and binds the current process to a server for this domain. The bind can be done explicitly by calling an NIS library routine, or it may be done implicitly when the first NIS lookup request is issued. In either case, ypbind provides a server binding for the named domain. If the default domain is used, ypbind returns the current binding after pinging the bound server. However, the calling process may have specified another domain, forcing ypbind to locate a server for it. The client may have bindings to several domains at any time, all of which are managed by the single ypbind process.
The client process calls the NIS lookup RPC with key=11461 and map=passwd.byuid. The request is bundled up and sent to the ypserv process on the bound server.
The server does a DBM key lookup and returns a password file entry, if one is found. The record is passed back to the getpwuid( ) routine, where it is returned to the calling application.
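The server-side key match can be sketched as follows, with a plain dictionary standing in for the DBM file. The sample records, the user names, and the helper names (build_byuid, match, parse_entry) are assumptions made for illustration; the only detail taken from the walk-through is that the passwd.byuid map is searched with the UID as the key and yields a password file entry.

```python
# Sketch of a passwd.byuid key match. A dict stands in for the DBM file;
# the records below are made-up examples, not real accounts.
from collections import namedtuple

PasswdEntry = namedtuple("PasswdEntry", "name passwd uid gid gecos home shell")

SOURCE = """\
jdoe:*:11461:10:J. Doe:/home/jdoe:/bin/csh
asmith:*:11462:10:A. Smith:/home/asmith:/bin/csh
"""


def build_byuid(source):
    """Index password-file lines by their UID field (kept as a string key)."""
    byuid = {}
    for line in source.splitlines():
        if not line:
            continue
        uid = line.split(":")[2]
        byuid.setdefault(uid, line)  # first entry wins on duplicate UIDs
    return byuid


def match(dbm_map, key):
    """Return the record stored under key, or None if the key is absent."""
    return dbm_map.get(key)


def parse_entry(record):
    """Split a password-file record into named fields."""
    f = record.split(":")
    return PasswdEntry(f[0], f[1], int(f[2]), int(f[3]), f[4], f[5], f[6])


if __name__ == "__main__":
    passwd_byuid = build_byuid(SOURCE)
    record = match(passwd_byuid, "11461")
    if record is not None:
        print(parse_entry(record).name)
```

The value stored under each key is the whole password file line, which is why the caller can recover the username from a lookup by UID.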
Figure 3-2. Trace of the getpwuid( ) library call
The server can return a number of errors on a lookup request. Obviously, the specified key might not exist in the DBM file, or the map file itself might not be present on the server. At a lower level, the RPC might generate an error if it times out before the server responds with an error or data; this would indicate that the server did not receive the request or could not process it quickly enough. Whenever an RPC call returns a timeout error, the low-level NIS RPC routine instructs ypbind to dissolve the process's binding for the domain.
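The error cases above can be pictured with a toy model. Nothing here is the real NIS client library: FakeServer, FakeYpbind, nis_match, and the three exception classes are hypothetical names, and the max_tries limit is an assumption added only so that the sketch terminates. The model shows a missing key or map being reported as an error, while a timeout causes the binding to be dropped and a fresh one to be obtained before the next attempt.

```python
# Toy model of NIS lookup errors. All names are hypothetical.

class NoSuchKey(Exception):
    pass


class NoSuchMap(Exception):
    pass


class RpcTimeout(Exception):
    pass


class FakeServer:
    def __init__(self, name, maps):
        self.name = name
        self.maps = maps
        self.alive = True

    def match(self, mapname, key):
        if not self.alive:
            raise RpcTimeout(self.name)   # no reply before the timeout
        if mapname not in self.maps:
            raise NoSuchMap(mapname)      # map file not present on server
        if key not in self.maps[mapname]:
            raise NoSuchKey(key)          # key not in the map
        return self.maps[mapname][key]


class FakeYpbind:
    """Keeps one binding for one domain and hands it out on request."""

    def __init__(self, servers):
        self.servers = servers
        self.bound = None

    def bind(self):
        if self.bound is None:
            self.bound = next((s for s in self.servers if s.alive), None)
        return self.bound

    def unbind(self):
        self.bound = None


def nis_match(ypbind, mapname, key, max_tries=3):
    """Look up key; on a timeout, dissolve the binding and try again."""
    for _ in range(max_tries):
        server = ypbind.bind()
        if server is None:
            continue                      # nobody answered the bind request
        try:
            return server.match(mapname, key)
        except RpcTimeout:
            ypbind.unbind()               # dissolve the binding, then retry
    raise RpcTimeout("no NIS server answered")


MAPS = {"passwd.byuid": {"11461": "jdoe:*:11461:10:J. Doe:/home/jdoe:/bin/csh"}}

if __name__ == "__main__":
    a, b = FakeServer("a", MAPS), FakeServer("b", MAPS)
    yb = FakeYpbind([a, b])
    yb.bind()            # bound to server a
    a.alive = False      # server a crashes
    print(nis_match(yb, "passwd.byuid", "11461"))
    print(yb.bound.name)
```

In this model the caller of nis_match never sees the timeout as long as some server eventually answers; only NoSuchKey and NoSuchMap surface as lookup failures.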
NIS RPC calls continue trying the remote server after a timeout error. This happens transparently to the user-level application calling the NIS RPC routine; for example, ls has no idea that one of its calls to getpwuid( ) resulted in an RPC timeout. The ls command just patiently waits for the getpwuid( ) call to return, and the RPC