Managing NFS and NIS, 2nd Edition - Mike Eisler [190]
These problems affect NIS and diskless client booting; they are best sorted out by using rpcinfo to emulate an RPC call and by observing server responses. Networks with multiple, heterogeneous servers may produce multiple, conflicting responses to the same broadcast request. Debugging problems that arise from this behavior often requires knowing the order in which the responses are received.
Here's an example: we'll perform a broadcast and then watch the order in which responses are received. When a diskless client boots, it may receive several replies to a request for boot parameters. The boot fails if the first reply contains incorrect or invalid boot parameter information. rpcinfo -b sends a broadcast request to the specified RPC program and version number. The RPC program can be specified either in numeric form (100026) or by its equivalent name (bootparam):
% rpcinfo -b bootparam 1
fe80::a00:20ff:feb5:1fba.128.67 unknown
fe80::a00:20ff:feb9:2ad1.128.78 unknown
131.40.52.238.128.67 mora
131.40.52.81.128.68 kanawha
131.40.52.221.128.79 holydev
Next Broadcast
% rpcinfo -b bootparam 1
131.40.52.81.128.68 kanawha
fe80::a00:20ff:feb5:1fba.128.67 unknown
131.40.52.238.128.67 mora
fe80::a00:20ff:feb9:2ad1.128.78 unknown
131.40.52.221.128.79 holydev
Next Broadcast
In this example, a broadcast packet is sent to the boot parameter server (bootparam). rpcinfo obtains the RPC program number (100026) from /etc/rpc or the rpc.bynumber NIS map (depending on /etc/nsswitch.conf). Any host that is running the boot parameter server replies to the broadcast with the standard null procedure "empty" reply. The universal address for the RPC service is printed by the requesting host in the order in which replies are received from these hosts (see the sidebar). After a short interval, another broadcast is sent.
* * *
Universal addresses
A universal address identifies the location of a transport endpoint. For UDP and TCP, it is composed of the dotted IP address with the port number of the service appended. In this example, the host kanawha has a universal address of 131.40.52.81.128.68.
The first four elements in the dotted string form the IP address of the server kanawha:
% ypmatch 131.40.52.81 hosts.byaddr
131.40.52.81 kanawha
The last two elements, "128.68", are the high and low octets of the port on which the service is registered (32836). This number is obtained by multiplying the high octet value by 2^8 and adding the low octet value to the result:
128 * 2^8 = 32768   (high octet)
          +    68   (low octet)
            -----
            32836   (decimal representation of port)
rpcinfo helps us verify that bootparam is indeed registered on port 32836:
% rpcinfo -p kanawha | grep bootparam
100026 1 udp 32836 bootparam
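The same arithmetic can be scripted when there are many universal addresses to decode. The following is a minimal Python sketch, not part of rpcinfo; the function names split_universal_address and to_universal_address are hypothetical and chosen only for illustration. It assumes the address has the form shown in this sidebar: a host address followed by two dotted decimal octets for the port.

```python
from typing import Tuple


def split_universal_address(uaddr: str) -> Tuple[str, int]:
    """Split 'host.high.low' into (host, port).

    The last two dotted fields are the high and low octets of the port;
    everything before them is the host address (IPv4 or IPv6).
    """
    host, high_text, low_text = uaddr.rsplit(".", 2)
    high, low = int(high_text), int(low_text)
    if not (0 <= high <= 255 and 0 <= low <= 255):
        raise ValueError(f"port octets out of range in {uaddr!r}")
    # port = high octet * 2^8 + low octet
    return host, high * 2**8 + low


def to_universal_address(host: str, port: int) -> str:
    """Inverse operation: append the port's high and low octets to the host."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError(f"port out of range: {port}")
    high, low = divmod(port, 2**8)
    return f"{host}.{high}.{low}"


if __name__ == "__main__":
    # kanawha's universal address from the broadcast output
    print(split_universal_address("131.40.52.81.128.68"))
```

Applied to kanawha's universal address, the sketch yields the host 131.40.52.81 and port 32836, matching the rpcinfo -p output. Because it splits from the right, the same function also handles the IPv6 link-local addresses in the broadcast output.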
* * *
Server loading may cause the order of replies between successive broadcasts to vary significantly. A busy server takes longer to schedule the RPC server and process the request. Differing reply sequences from RPC servers are not in themselves indicative of a problem, as long as the servers all return the correct information. If one or more servers have incorrect information, though, you will see irregular failures. A machine returning correct information may not always be the first to deliver a response to a client broadcast, so sometimes the client gets the wrong response.
In the last example (diskless client booting), a client that gets the wrong response won't boot. The boot failures may be intermittent due to variations in server loading: when the server returning an invalid reply is heavily loaded, the client will boot without problem. However, when the servers with the correct information are heavily loaded, the client gets an invalid set of boot parameters and cannot start booting a kernel.
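When chasing an intermittent failure of this kind, it can help to capture several rounds of rpcinfo -b output and see which host answered first in each round. The following is a minimal Python sketch that works on saved output text only. The function names are hypothetical, and it assumes each round is terminated by a "Next Broadcast" line, as in the transcript shown earlier, with each reply printed as a universal address followed by a hostname.

```python
from typing import List, Tuple

# Captured output of two broadcast rounds, copied from the earlier example.
SAMPLE = """\
% rpcinfo -b bootparam 1
fe80::a00:20ff:feb5:1fba.128.67 unknown
fe80::a00:20ff:feb9:2ad1.128.78 unknown
131.40.52.238.128.67 mora
131.40.52.81.128.68 kanawha
131.40.52.221.128.79 holydev
Next Broadcast
% rpcinfo -b bootparam 1
131.40.52.81.128.68 kanawha
fe80::a00:20ff:feb5:1fba.128.67 unknown
131.40.52.238.128.67 mora
fe80::a00:20ff:feb9:2ad1.128.78 unknown
131.40.52.221.128.79 holydev
Next Broadcast
"""

Reply = Tuple[str, str]  # (universal address, hostname)


def split_rounds(transcript: str, marker: str = "Next Broadcast") -> List[List[Reply]]:
    """Group reply lines into rounds, preserving arrival order."""
    rounds: List[List[Reply]] = []
    current: List[Reply] = []
    for raw in transcript.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and shell prompts
        if line == marker:
            if current:
                rounds.append(current)
            current = []
            continue
        fields = line.split()
        if len(fields) == 2:  # assumed layout: "<universal address> <hostname>"
            current.append((fields[0], fields[1]))
    if current:
        rounds.append(current)
    return rounds


def first_responders(rounds: List[List[Reply]]) -> List[Reply]:
    """Return the first reply of each round."""
    return [replies[0] for replies in rounds if replies]


if __name__ == "__main__":
    for number, (uaddr, host) in enumerate(first_responders(split_rounds(SAMPLE)), 1):
        print(f"round {number}: first reply from {host} ({uaddr})")
```

On the sample transcript, the first responder differs between the two rounds: an unnamed host at a link-local IPv6 address in the first, and kanawha in the second. A host that is sometimes first and also serves bad boot parameters would be a natural suspect for intermittent boot failures.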
Binding to the wrong NIS server causes another kind of problem. A renegade NIS server may be the first to answer a ypbind broadcast for NIS service, and its lack of information about the domain makes the client machine unusable. Sometimes, just looking at the list of servers that respond to a request may flag