UNIX System Administration Handbook - Evi Nemeth [370]
20.1 TROUBLESHOOTING A NETWORK
Several good tools are available for debugging a network at the TCP/IP layer. Most give low-level information, so you must understand the main ideas of TCP/IP and routing in order to use the debugging tools.
On the other hand, network issues can also stem from problems with higher-level protocols such as DNS, NFS, and HTTP. You might want to read through Chapter 13, TCP/IP Networking, and Chapter 14, Routing, before tackling this chapter.
In this section, we start with some general troubleshooting strategy. We then cover several essential tools, including ping, traceroute, netstat, tcpdump, and snoop. We don’t discuss the arp command in this chapter, though it, too, is a useful debugging tool—see page 286 for more information.
When your network is broken, chances are that you’ll be in quite a rush to repair it. Stop right there! It’s important to take a moment and consider how to approach the problem before jumping into action. The biggest mistake you can make is to introduce poorly planned changes into an already failing network.
Before you attack your network, consider these principles:
• Make one change at a time, and test each change to make sure that it had the effect you intended.
• Document the situation as it was before you got involved, and document every change you make along the way.
• Start at one “end” of a system or network and work through the system’s critical components until you reach the problem. For example, you might start by looking at the network configuration on a client, work your way up to the physical connections, investigate the network hardware, and finally, check the server’s physical connections and software configuration.
• Communicate regularly. Most network problems involve or affect lots of different people: users, ISPs, system administrators, telco engineers, network administrators, etc. Clear, consistent communication will prevent you from hindering each other’s efforts to solve the problem.
• Work as a team. Years of experience show that people make fewer stupid mistakes if they have a peer helping out.
• Use the layers of the network to negotiate the problem. Start at the “top” or “bottom” and work your way through the protocol stack.
This last point deserves a bit more discussion. As described on page 265, the architecture of TCP/IP defines several layers of abstraction at which components of the network can function. For example, HTTP depends on TCP, TCP depends on IP, IP depends on the Ethernet protocol, and the Ethernet protocol depends on the integrity of the network cable. You can dramatically reduce the amount of time spent debugging a problem if you first figure out which layer is misbehaving.
Ask yourself questions like these as you work up (or down) the stack:
• Do you have physical connectivity and a link light?
• Is your interface configured properly?
• Is DNS configured properly?
• Do your ARP tables show other hosts?
• Can you ping the localhost address (127.0.0.1)?
• Can you ping other local hosts by IP address?
• Can you ping other local hosts by hostname?
• Can you ping hosts on another network?
• Do high-level commands like telnet and ssh work?
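The questions above lend themselves to a small script that walks up the stack and stops at the first layer that fails. What follows is only a minimal sketch under several assumptions: the function names (can_ping, can_resolve, can_connect, first_failure, build_checks) are our own inventions, the system ping is assumed to accept the "-c 1" option (common on Linux and BSD, but not universal), and the link-light, interface, and ARP checks are left out because they are platform specific.

```python
import socket
import subprocess


def can_ping(target, timeout_s=2):
    """Return True if the system ping gets an answer from target."""
    # Assumption: "-c 1" (send a single packet) is accepted by this ping.
    try:
        result = subprocess.run(
            ["ping", "-c", "1", target],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ping is missing, not executable, or did not finish in time.
        return False
    return result.returncode == 0


def can_resolve(hostname):
    """Return True if the resolver can map hostname to an address."""
    try:
        socket.gethostbyname(hostname)
    except OSError:
        return False
    return True


def can_connect(host, port, timeout_s=2):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def first_failure(checks):
    """Run (name, check) pairs in order; return the first failing name, or None."""
    for name, check in checks:
        if not check():
            return name
    return None


def build_checks(local_ip, local_name, remote_ip, service_host, service_port=22):
    """Order the checks from the bottom of the stack toward the top.

    All arguments are hypothetical placeholders supplied by the caller.
    """
    return [
        ("ping loopback", lambda: can_ping("127.0.0.1")),
        ("ping local host by IP", lambda: can_ping(local_ip)),
        ("resolve local hostname", lambda: can_resolve(local_name)),
        ("ping local host by name", lambda: can_ping(local_name)),
        ("ping host on another network", lambda: can_ping(remote_ip)),
        ("connect to service", lambda: can_connect(service_host, service_port)),
    ]
```

Calling first_failure(build_checks(...)) with addresses from your own site returns the name of the first check that fails, or None if every check passes. Because the checks run in stack order, that name points at the layer to investigate first.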
Once you’ve identified where the problem lies, take a step back and consider the effect your subsequent tests and prospective fixes will have on other services and hosts.
20.2 PING: CHECK TO SEE IF A HOST IS ALIVE
The ping command is embarrassingly simple, but in many situations it is all you need. It sends an ICMP ECHO_REQUEST packet to a target host and waits to see if the host answers back. Despite its simplicity, ping is one of the workhorses of network debugging.
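To make the ECHO_REQUEST packet concrete, here is a hedged sketch of how one can be assembled. It is not the source of any real ping; the names internet_checksum and build_echo_request are ours. It relies only on the standard ICMP layout: a one-byte type (8 for an echo request), a one-byte code (0), a 16-bit checksum, a 16-bit identifier, a 16-bit sequence number, and an optional payload. Actually sending the packet requires a raw socket and, usually, root privileges, so the sketch stops at building the bytes.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type of an echo request (the reply is type 0)


def internet_checksum(data):
    """Return the 16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier, sequence, payload=b""):
    """Return the bytes of an ICMP echo request with a valid checksum."""
    # The checksum is computed over the message with its checksum field zeroed.
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload
```

A receiver can validate the message by running the same checksum over the whole packet: a result of zero means the packet arrived intact. The identifier and sequence number are echoed back in the reply, which is how a ping program matches answers to the requests it sent.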
You can use ping to check the status of individual hosts and to test segments of the network. Routing tables, physical networks, and gateways are all involved in processing