Server crashes every 8-10 days with ServeRAID 8 controllers
5073003 in the Search field at http://www.ibm.com.)
Evaluating hard disk drive problems
Hard disk drives are designed to tolerate some types of errors and recover from
them. The implementation of error recovery varies from drive vendor to drive
vendor; however, every implementation must conform to the disk standards that are
common in the industry. Aside from a catastrophic failure, most hard disk drives can
recover from or retry operations that have failed. An example of a recoverable disk
error is a drive medium error, in which the data on the substrate of the disk becomes
unreadable. This can happen for many reasons, and most modern hard disk drives
have extra known-good sectors that are used internally to spare out a bad sector
with a good sector. New bad sectors are added to an internal “grown defect” list.
If a drive spares out a bad sector, the data that is stored on that sector might or
might not survive; however, the ServeRAID controller firmware verifies or rewrites
good data as errors are detected. This and other types of disk issues are often
resolved with background data scrubbing operations.
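The sparing and scrubbing behavior described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not ServeRAID or drive firmware: the `SimulatedDrive` class, the `scrub` function, and all field names are hypothetical, and the redundant copy stands in for whatever good data the controller can rebuild from the array.

```python
class SimulatedDrive:
    """Toy model of a drive that spares out bad sectors (hypothetical, not real firmware)."""

    def __init__(self, user_sectors, spare_sectors):
        # Physical sectors 0..user_sectors-1 hold user data; the rest are spares.
        self.user_sectors = user_sectors
        self.free_spares = list(range(user_sectors, user_sectors + spare_sectors))
        self.remap = {}            # logical sector -> spare physical sector
        self.grown_defects = []    # physical sectors found bad during use
        self.media = {}            # physical sector -> stored bytes
        self.bad_physical = set()  # physical sectors that can no longer be read

    def _physical(self, lba):
        return self.remap.get(lba, lba)

    def write(self, lba, payload):
        self.media[self._physical(lba)] = payload

    def read(self, lba):
        phys = self._physical(lba)
        if phys in self.bad_physical:
            raise IOError(f"medium error at logical sector {lba}")
        return self.media.get(phys)

    def spare_out(self, lba):
        """Remap a bad logical sector to a spare; the old contents are not copied."""
        if not self.free_spares:
            raise RuntimeError("no spare sectors left")
        self.grown_defects.append(self._physical(lba))
        self.remap[lba] = self.free_spares.pop(0)


def scrub(drive, redundant_copy):
    """Read every sector; on a medium error, spare it out and rewrite good data."""
    repaired = []
    for lba in range(drive.user_sectors):
        try:
            drive.read(lba)
        except IOError:
            drive.spare_out(lba)
            drive.write(lba, redundant_copy[lba])
            repaired.append(lba)
    return repaired


# Usage: write four sectors, simulate a medium error on sector 2, then scrub.
good = {lba: f"block-{lba}".encode() for lba in range(4)}
drive = SimulatedDrive(user_sectors=4, spare_sectors=2)
for lba, payload in good.items():
    drive.write(lba, payload)
drive.bad_physical.add(2)
repaired = scrub(drive, good)
```

In this model the spared sector loses its contents, which mirrors the statement that the data might not survive; the rewrite from a redundant copy is what restores it.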
The hard disk drive firmware controls the error recovery process of the drive.
Firmware updates sometimes improve error recovery over the life of the drive, and
older firmware can sometimes cause premature disk failures. Such changes are listed
in the change history of the hard disk drive firmware updates. If a drive is marked Defunct
by the ServeRAID controller and does not have the latest firmware, the drive might
not be irrevocably bad. The drive might have been marked Defunct because of
firmware error recovery issues, not because something is physically wrong with the
disk.
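The reasoning in this paragraph amounts to a short triage rule, sketched here for illustration. The `DriveStatus` record, its fields, and the returned strings are hypothetical and are not part of any ServeRAID tool; comparing firmware levels by simple string inequality is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class DriveStatus:
    state: str            # drive state as reported by the controller, e.g. "Defunct"
    firmware: str         # firmware level currently on the drive
    latest_firmware: str  # newest firmware level published for this drive model


def next_step(drive: DriveStatus) -> str:
    """Suggest a next step for a drive, following the guidance in this section."""
    if drive.state != "Defunct":
        return "no action"
    if drive.firmware != drive.latest_firmware:
        # Defunct with old firmware: the drive might not be physically bad.
        return "update firmware, then retest"
    return "test drive; replace if it fails"
```

For example, `next_step(DriveStatus("Defunct", "A1", "A2"))` suggests a firmware update before the drive is treated as bad.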
There are several methods that you can use to evaluate whether a disk is good or
bad. You can use some methods while the system is online, and other methods
require the system to be taken offline. For details about the tools and syntax for
commands, see the IBM ServeRAID User Reference Guide (SRAID.PDF) on the
IBM Web site. If a drive cannot complete these tests or fails them, replace the drive
according to your warranty terms and conditions.
Table 3. ServeRAID hard disk drive maintenance and recovery methods and tools

Action: Controller-based low-level format for a single drive

Method and tool used: This action erases all data on the selected drive and
reconditions the disk to a healthy state. This format performs read and write
operations to the drive and is a good option to test and recover a drive that is
marked Defunct.
1. Turn on the system and press Ctrl+A when you are prompted to access the
   Adaptec RAID Configuration Utility (ARC).
2. From the ARC menu, select Disk Utilities.
3. Select the hard disk drive.
4. Select Format Disk.
Attention: Be sure to select the correct disk to format. Formatting the wrong disk
can result in data loss.

Operation state of the system: Server is offline.

Confidence to fix: Very high; replace the hard disk drive if it fails.
Chapter 1. ServeRAID-8 series best practices and maintenance information
Part Number: 46M1375. Printed in USA.