Online Book Reader


Managing RAID on Linux - Derek Vadala [121]

systems, it is advisable to break out information reported by the kern facility into files of varying priority. This makes it easy to monitor the system for serious problems, while maintaining verbose information to retroactively diagnose persistent and unclear problems.

In general, search log files for the phrase " md:" to get a list of all software RAID-related messages:

# grep " md:" /var/log/kernel
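When messages are split across several log files, the same search can be wrapped in a small shell function. This is only a minimal sketch: the function name count_md_messages is hypothetical, and the log paths you pass to it depend on how syslogd is configured on your system.

```shell
# count_md_messages is a hypothetical helper name, not part of any RAID tool.
# For each log file given, print "<file>: <count>", where <count> is the
# number of lines containing the phrase " md:".
count_md_messages() {
    for logfile in "$@"; do
        if [ -r "$logfile" ]; then
            # grep -c prints the number of matching lines (0 if none);
            # "|| true" keeps a zero-match result from being treated as an error.
            count=$(grep -c " md:" "$logfile" || true)
            printf '%s: %s\n' "$logfile" "$count"
        else
            printf '%s: not readable\n' "$logfile" >&2
        fi
    done
}
```

For example, count_md_messages /var/log/kernel /var/log/messages reports one count per file, assuming both files exist on your system.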

You can also use mdadm's monitor mode, combined with the logger utility, to dump messages that are generated specifically by mdadm into the system logs:

# mdadm --monitor --program='logger -p kern.crit -t md: $*'

The logger program creates system log entries. In this example, I report any message that mdadm generates using the kern facility at the crit priority. The -t option adds a bit of informational text to each entry (in this case, md:). You can also put the command used by the --program option in /etc/mdadm.conf. In addition, mdadm reads its configuration file for a list of devices to monitor.
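If you want more control over what gets logged, the program can be a small script of your own. The sketch below is one possible handler, not something mdadm ships. It assumes that mdadm runs the program with the event name, the md device, and (for some events) the affected component device as arguments. The function names and the choice of which events count as critical are my own assumptions, and the set of event names can differ between mdadm versions.

```shell
#!/bin/sh
# Hypothetical handler sketch for mdadm's --program option.
# Assumption: mdadm invokes the program with the event name, the md device,
# and (for some events) the affected component device.

# Build a one-line message from the arguments mdadm supplies.
format_md_event() {
    event=$1
    array=$2
    member=${3:-}
    if [ -n "$member" ]; then
        printf '%s on %s (component %s)\n' "$event" "$array" "$member"
    else
        printf '%s on %s\n' "$event" "$array"
    fi
}

# Choose a syslog priority. The event names listed here are assumptions
# based on mdadm's documented events and may differ between versions.
md_event_priority() {
    case $1 in
        Fail|FailSpare|DegradedArray|DeviceDisappeared) echo kern.crit ;;
        *) echo kern.info ;;
    esac
}

# When run with arguments (as mdadm would run it), log the event.
if [ "$#" -ge 2 ]; then
    logger -p "$(md_event_priority "$1")" -t md: "$(format_md_event "$@")"
fi
```

Saved as an executable script, a handler like this can be named with --program on the command line or on the PROGRAM line in /etc/mdadm.conf.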

It's a good idea to run mdadm detached and in the background, as I described in Chapter 4. Remember that mdadm will report only limited information about critical problems. You should configure syslogd to capture md driver messages, even if you are using mdadm in Monitor mode.

BigBrother

Users of the popular monitoring tool BigBrother can use the bb-mdstat.sh script to monitor software arrays. Download the script from http://www.deadcat.net/cgi-bin/download.pl?section=1&file=bb-mdstat.sh.

SysOrb

SysOrb is a commercial system monitoring package developed by Evalesco Systems. It has complete support for Linux software RAID monitoring. The lead architect of SysOrb is Jakob Oestergaard, author of the Linux RAID HOWTO. You can demo SysOrb at http://www.evalesco.com.

Verbose SCSI Reporting

It can also help to enable additional error reporting for low-level SCSI hardware. The extra detail makes it easier to diagnose SCSI problems that might affect array performance and stability.

When building your kernel, turn on the Verbose SCSI Error Reporting (CONFIG_SCSI_CONSTANTS) feature in the SCSI section:

SCSI support --->

...

[*] Verbose SCSI error reporting (kernel size +=12K)

...

Now SCSI messages that appear in the system logs will be more human-readable. For example:

Jun 27 18:15:53 apathy kernel: SCSI disk error : host 1 channel 0 id 2 lun 0 return code = 10000

Jun 27 18:15:53 apathy kernel: I/O error: dev 08:61, sector 0
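The dev field in the I/O error message identifies the device by number rather than by name. The following sketch translates it into a device file. It assumes that the kernel prints the device as hexadecimal major:minor, and that SCSI disk major 8 assigns 16 minor numbers to each disk (the whole disk first, then its partitions). The function name decode_scsi_dev is hypothetical, and the sketch handles only major 8.

```shell
# decode_scsi_dev is a hypothetical helper. Assumptions: the kernel prints the
# device as hexadecimal major:minor, and SCSI disk major 8 assigns 16 minors
# per disk (minor 0 = whole disk sda, 1-15 = its partitions, 16 = sdb, ...).
decode_scsi_dev() {
    major=$((0x${1%%:*}))   # text before the colon, read as hex
    minor=$((0x${1##*:}))   # text after the colon, read as hex
    if [ "$major" -ne 8 ]; then
        echo "unsupported major $major" >&2
        return 1
    fi
    disk_index=$((minor / 16))   # 0 = sda, 1 = sdb, ...
    partition=$((minor % 16))    # 0 = whole disk
    letter=$(echo abcdefghijklmnop | cut -c "$((disk_index + 1))")
    if [ "$partition" -eq 0 ]; then
        printf '/dev/sd%s\n' "$letter"
    else
        printf '/dev/sd%s%s\n' "$letter" "$partition"
    fi
}

decode_scsi_dev 08:61   # prints /dev/sdg1
```

Under these assumptions, the failing device in the example is the first partition of the seventh SCSI disk.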

Managing Disk Failures

When a member disk of a RAID-1, RAID-4, or RAID-5 fails, the array enters into degraded mode. Degraded mode means that both performance and redundancy are impacted. RAID-0 and linear mode never enter into degraded mode because they do not support redundancy. If a disk in either a RAID-0 or linear mode configuration fails, the array stops. Unless the disk can be repaired, data will be lost.
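One way to spot a degraded array is to look at the status field that the md driver reports in /proc/mdstat, where each working member appears as U and each missing member as an underscore. The sketch below is an illustration only: the function name list_degraded_arrays is hypothetical, and the sample input is made up to show the format.

```shell
# list_degraded_arrays is a hypothetical helper. It reads text in the format
# of /proc/mdstat on standard input and prints the name of each array whose
# status field (for example [_U]) marks a missing member with an underscore.
list_degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }    # remember the array being described
        /\[[U_]*_[U_]*\]/ && name != "" { print name; name = "" }
    '
}

# Made-up sample input: md1 is healthy, md0 has lost one mirror.
list_degraded_arrays <<'EOF'
md1 : active raid5 sdd1[2] sdc1[1] sdb1[0]
      35840 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
md0 : active raid1 sdb2[1]
      104320 blocks [2/1] [_U]
EOF
```

With the sample input, the function prints md0. On a running system, the equivalent would be list_degraded_arrays < /proc/mdstat.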

RAID-1 can withstand at least a single disk failure. For a RAID-1 of n member disks, n-1 disks can fail before service is interrupted; once every disk has failed, the array is no longer functional. Disk failures also affect the parallel read performance of RAID-1. For example, a RAID-1 consisting of three disks can potentially achieve parallel reads of up to three times the throughput of a single member disk. If a single disk fails, that ceiling drops to twice the throughput of a single disk. An interesting side effect of disk failures under RAID-1 is that write performance actually improves during degraded operation. That's because every write must be repeated on each member disk, so the number of physical writes per I/O operation equals the number of active members. As a RAID-1 loses member disks, the number of writes per I/O operation decreases.
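The RAID-1 arithmetic can be worked through with a few lines of shell. This is a sketch of the reasoning in this section only, not a measurement tool. The function name raid1_summary is hypothetical, and the read figure is a best-case ceiling, not a prediction.

```shell
# raid1_summary is a hypothetical helper illustrating the arithmetic only.
# Usage: raid1_summary <member disks> <failed disks>
raid1_summary() {
    members=$1
    failed=$2
    active=$((members - failed))
    if [ "$active" -le 0 ]; then
        echo "array failed: no working mirrors remain"
        return 1
    fi
    # A mirror of n disks keeps running until the last disk fails.
    echo "tolerable failures in total: $((members - 1))"
    # Each remaining mirror can serve a separate read.
    echo "best-case parallel read throughput: ${active}x a single disk"
    # Every logical write is repeated on each remaining mirror.
    echo "physical writes per logical write: $active"
}

raid1_summary 3 1
```

For a three-disk mirror with one failed disk, the sketch reports two tolerable failures in total, a best-case read ceiling of twice a single disk, and two physical writes per logical write.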

RAID-4 and RAID-5 deal with disk failures in the same way. They can each survive only a single disk failure. Disk failures in RAID-4 and RAID-5 considerably impact array performance. Each time data is read from the array, the system must perform parity reconstruction to access data from the missing disk. When working with software RAID, this means that a larger
