UNIX System Administration Handbook - Evi Nemeth [465]
Web site hijacking is the latest craze in security break-ins. For the sysadmin at a web hosting company, a hijacking can be a serious event. Phone calls stream in from the customer, from the media, from the company VIPs who just saw the news of the hijacking on CNN. Who will take the calls? What should that person say? Who is in charge? What role does each person play? If you are in a high-visibility business, it’s definitely worth thinking through this type of scenario, coming up with some preplanned answers, and perhaps even having a practice session to work out the details.
Procedures for dealing with a security break-in are outlined in Chapter 21, Security, starting on page 680.
Disaster planning
Planning for a disaster is best accomplished before the disaster hits. An unfortunate fact of sysadmin life is that many disasters seem to involve managers’ machines, and managers can yell the loudest. In this section we look at various kinds of disasters, the data you need in order to recover gracefully, and the important elements of a disaster plan.
There are several kinds of disasters:
• Security breaches (of which 60% originate from within the organization)
• Environmental problems: power spikes and outages, cooling failures, floods, hurricanes, earthquakes, meteors, alien invasions
• Human error: deleted or damaged files and databases, lost configuration information (Does your mirroring system respond so quickly that an error propagates everywhere before you realize what’s happened?)
• Spontaneous hardware meltdowns: dead servers, fried hard disks, malfunctioning networks
In all of these situations, you will need access to both on-line and off-line copies of essential information. The on-line copies should be kept on an independent machine if possible, one that has a fairly rich complement of tools, has key sysadmins’ environments, runs its own name server, has a complete local /etc/hosts file, has no file sharing dependencies, has a printer attached, etc. Here’s a list of handy data to keep on the backup machine and in printed form:
• An outline of the disaster procedure: people to call, when to call, what to say
• Service contract phone numbers and customer numbers
• Key local phone numbers: staff, police, fire, boss, employment agency
• Data on hardware and software configurations: partition tables, PC hardware settings, IRQs, DMAs, and the like
• Backup tapes and the backup schedule that produced them
• Network maps
• Software serial numbers, licensing data, and passwords
• Vendor contact info for that emergency disk you need immediately
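Much of this data can be gathered automatically and refreshed on a schedule. As one possible approach (not from the book itself), a short script like the following could collect a configuration snapshot for printing and filing in the disaster binder; the output path and the exact commands are illustrative and would need to be adapted to your own systems.

```shell
#!/bin/sh
# snapshot.sh -- gather configuration data for the printed disaster binder.
# The destination path is an arbitrary example; print a copy after each run.
OUT=/var/tmp/disaster-snapshot.txt

{
  echo "=== Host and date ==="
  hostname
  date

  echo "=== Filesystems and mount configuration ==="
  df -k                        # currently mounted filesystems and usage
  cat /etc/fstab 2>/dev/null   # static mount table, if present

  echo "=== Network configuration ==="
  ifconfig -a 2>/dev/null || ip addr   # whichever tool this system provides
  cat /etc/hosts

  echo "=== Name service configuration ==="
  cat /etc/resolv.conf 2>/dev/null
} > "$OUT"

echo "Snapshot written to $OUT"
```

Running the script from cron once a week and printing the result keeps the paper copy from drifting too far out of date; partition tables, serial numbers, and license data that no command can recover would still have to be added by hand.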
An important but sometimes unspoken assumption made in most disaster plans is that administration staff will be available to deal with the situation. Unfortunately, people get sick, graduate, go on vacation, and leave for other jobs. It’s worth considering what you’d do if you needed extra emergency help. (Not having enough sysadmins around can sometimes constitute an emergency in its own right if your systems are fragile or your users unsophisticated.)
You might try forming a sort of NATO pact with a local consulting company or university that has shareable system administration talent. Of course, you must be willing to share back when your buddies have a problem. Most importantly, don’t operate close to the wire in your daily routine. Hire enough system administrators and don’t expect them to work 12-hour days.
Test your disaster recovery plan before you need to use it. If you amassed a lot of Y2K supplies, some items (such as flashlights) may still be useful for more generic disasters. We found a really neat kind of flashlight that plugs into a wall socket. While the power is on, it stays at full charge. When the power goes out, it lights up so you can find it in the dark.
Test your generators and UPSs. Verify that everything you care about is