IBM ServeRAID-8 Series


Server crashes every 8-10 days with ServeRAID 8 controllers
https://www-304.ibm.com/systems/support/supportsite.wss/docdisplay?lndocid=MIGR-5073003&brandind=5000008
(Type 5073003 in the Search field at http://www.ibm.com.)

Evaluating hard disk drive problems

Hard disk drives are designed to tolerate some types of errors and recover from
them. The implementation of error recovery varies from one drive vendor to
another; however, all drives must conform to disk standards that are common in
the industry. Except in a catastrophic failure, most hard disk drives can recover
from or retry failed operations. An example of a recoverable disk error is a drive
medium error, in which the data on the substrate of the disk becomes unreadable.
This can happen for many reasons, and most modern hard disk drives reserve extra
known-good sectors that they use internally to spare out (replace) a bad sector
with a good one. New bad sectors are added to an internal “grown defect” list.
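To make the sparing mechanism concrete, here is a toy Python model of a drive with a reserved spare pool and a grown defect list. It is a sketch under stated assumptions, not how any vendor's firmware works: the class `ToyDrive`, the `MediumError` exception, and the policy of remapping a failed sector on the next write (rather than on the failing read) are hypothetical choices made for illustration.

```python
class MediumError(Exception):
    """A sector could not be read (an unrecoverable medium error)."""


class ToyDrive:
    """Toy model of bad-sector sparing; not any vendor's real firmware.

    Each logical block address (LBA) normally maps to its own sector. When a
    sector's medium fails, reads of it raise MediumError. In this model
    (an assumption), the next write to that LBA is redirected to a reserved
    spare sector, and the LBA is recorded in the grown defect list.
    """

    def __init__(self, num_sectors: int, num_spares: int) -> None:
        self.sectors = [b""] * num_sectors        # normal sectors, indexed by LBA
        self.spares = [b""] * num_spares          # reserved known-good spare sectors
        self.free_spares = list(range(num_spares))
        self.grown_defects: dict[int, int] = {}   # grown defect list: LBA -> spare index
        self.failed: set[int] = set()             # LBAs whose medium is unreadable

    def read(self, lba: int) -> bytes:
        if lba in self.grown_defects:
            # Remapped LBA: serve it from the spare sector.
            return self.spares[self.grown_defects[lba]]
        if lba in self.failed:
            raise MediumError(f"medium error at LBA {lba}")
        return self.sectors[lba]

    def write(self, lba: int, payload: bytes) -> None:
        if lba in self.failed and lba not in self.grown_defects:
            # Spare out the bad sector: remap this LBA to a free spare.
            if not self.free_spares:
                raise MediumError("no spare sectors left")
            self.grown_defects[lba] = self.free_spares.pop(0)
        if lba in self.grown_defects:
            self.spares[self.grown_defects[lba]] = payload
        else:
            self.sectors[lba] = payload


drive = ToyDrive(num_sectors=8, num_spares=2)
drive.write(3, b"old data")
drive.failed.add(3)               # the medium under LBA 3 goes bad
try:
    drive.read(3)                 # the original contents cannot be read back
except MediumError as err:
    print(err)
drive.write(3, b"good data")      # good data written back lands on a spare
print(drive.read(3), drive.grown_defects)
```

In this model, the original contents of a failed sector are gone once the medium is unreadable; the spare holds correct data only after good data is written back to that address, which is why a controller that can rewrite good data matters.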

If a drive spares out a bad sector, the data that was stored on that sector might
not survive; however, the ServeRAID controller firmware verifies the data or
rewrites good data as errors are detected. Background data scrubbing operations
often resolve this and other types of disk issues.
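The scrubbing idea can be sketched for a two-drive mirror. This is a hypothetical illustration, assuming a RAID-1 array in which each sector is either readable bytes or `None` (standing in for a medium error); the function name `scrub_mirror` and its repair policy are assumptions, not the ServeRAID firmware's actual algorithm.

```python
from typing import Optional

UNREADABLE = None  # stands in for a sector that returns a medium error


def scrub_mirror(primary: list[Optional[bytes]],
                 secondary: list[Optional[bytes]]) -> list[str]:
    """Toy background scrub of a two-drive mirror (RAID-1 assumed).

    Reads every sector on both members. If one copy is unreadable, the good
    copy is rewritten over it (a real drive would spare out the bad sector
    on that write). Returns a log of the actions taken.
    """
    log = []
    for lba, (a, b) in enumerate(zip(primary, secondary)):
        if a is UNREADABLE and b is UNREADABLE:
            log.append(f"LBA {lba}: both copies unreadable, data lost")
        elif a is UNREADABLE:
            primary[lba] = b
            log.append(f"LBA {lba}: rewrote primary from secondary")
        elif b is UNREADABLE:
            secondary[lba] = a
            log.append(f"LBA {lba}: rewrote secondary from primary")
        elif a != b:
            # Real controllers have their own policy for mismatched copies;
            # this sketch only reports the mismatch instead of guessing.
            log.append(f"LBA {lba}: copies differ, flagged for review")
    return log


primary = [b"A", UNREADABLE, b"C"]
secondary = [b"A", b"B", UNREADABLE]
for line in scrub_mirror(primary, secondary):
    print(line)
```

Rewriting the good copy over an unreadable sector lets the drive spare out the bad sector while the data stays intact; a sector that is unreadable on both members cannot be repaired this way.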

The hard disk drive firmware controls the error recovery process of the drive.
Firmware updates released over the life of a drive sometimes improve error
recovery, and older firmware can sometimes cause premature disk failures. These
changes are listed in the change history of the hard disk drive firmware updates.
If the ServeRAID controller marks a drive Defunct and the drive does not have the
latest firmware, the drive might not be permanently bad. It might have been marked
Defunct because of a firmware error-recovery issue rather than a physical defect
in the disk.

There are several methods that you can use to evaluate whether a disk is good or
bad. Some methods can be used while the system is online; others require the
system to be taken offline. For details about the tools and command syntax, see
the IBM ServeRAID User Reference Guide (SRAID.PDF) on the IBM Web site. If a
drive fails or cannot complete these tests, replace the drive according to your
warranty terms and conditions.

Table 3. ServeRAID hard disk drive maintenance and recovery methods and tools

Action: Controller-based low-level format for a single drive

Method and tool used: This action erases all data on the selected drive and
reconditions the disk to a healthy state. This format performs read and write
operations to the drive and is a good option to test and recover a drive that is
marked Defunct.

1. Turn on the system and press Ctrl+A when you are prompted to access the
   Adaptec RAID Configuration Utility (ARC).
2. From the ARC menu, select Disk Utilities.
3. Select the hard disk drive.
4. Select Format Disk.

Attention: Be sure to select the correct disk to format. Formatting the wrong disk
can result in data loss.

Operation state of the system: Server is offline.

Confidence to fix: Very high; replace the hard disk drive if it fails.