For example, in the following sautil <device_file> command output excerpt, spare disk 1I:1:10 is being substituted for failed disk 1I:1:11, which is why the logical drive is in the RECOVERING state.
---- LOGICAL DRIVE SUMMARY ---------------------------------------------------
#   RAID   Size       Status
0   1+0    34700 MB   RECOVERING
---- SAS/SATA DEVICE SUMMARY -------------------------------------------------
Location  Ct  Enc  Bay  WWID                Type  Capacity  Status
internal  1I  1    12   0x500000e01117c732  DISK  36.4 GB   OK
N/A       1I  1    11   0x500000e01115c352  N/A   N/A       FAILED
internal  1I  1    10   0x5000c5000032b839  DISK  36.4 GB   SPARE (activated)
internal  1I  1    9    0x5000c5000030b0c5  DISK  36.4 GB   UNASSIGNED
internal  2I  1    16   0x500000e011213482  DISK  36.4 GB   UNASSIGNED
internal  2I  1    15   0x5000c500002084c9  DISK  73.4 GB   UNASSIGNED
internal  2I  1    14   0x5000c5000030b9c9  DISK  36.4 GB   UNASSIGNED
internal  2I  1    13   0x500000e01118a7a2  DISK  36.4 GB   UNASSIGNED
---- SAS/SATA ENCLOSURE SUMMARY ----------------------------------------------
Location  Ct  Enc  Expander_count  Bay_count  SEP_count
internal  1I  1    0               4          1
internal  2I  1    0               4          1
---- LOGICAL DRIVE 0 ---------------------------------------------------------
Logical Drive Device File........... c5t0d0
Fault Tolerance Mode................ RAID 1+0 (Disk Mirroring)
Logical Drive Size.................. 34700 MB
Logical Drive Status................ OK
# of Participating Physical Disks... 2
Participating Physical Disk(s)...... Ct:Enc:Bay:WWID
                                     1I:1:12:0x500000e01117c732
                                     1I:1:11:0x500000e01115c352 <-- NOT RESPONDING
Participating Spare Disk(s)......... Ct:Enc:Bay:WWID
                                     1I:1:10:0x5000c5000032b839 <-- activated for 1I:1:11:0x500000e01115c352
Stripe Size......................... 128 KB
Logical Drive Cache Status.......... cache enabled
Configuration Signature............. 0xA00148CC
Media Exchange Detected?............ no
For more information about the sautil command, see “The sautil command” (page 60).
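The device summary shown above can also be scanned by a script to pick out failed disks and activated spares. The following Python code is a minimal sketch only: it assumes the whitespace-separated column layout seen in the excerpt, and the names SAMPLE, parse_device_summary, failed_disks, and activated_spares are hypothetical helpers, not part of sautil. The output layout may differ between sautil versions, so treat the pattern as a starting point.

```python
import re

# Rows copied from the SAS/SATA DEVICE SUMMARY excerpt above.
SAMPLE = """\
Location Ct Enc Bay WWID Type Capacity Status
internal 1I 1 12 0x500000e01117c732 DISK 36.4 GB OK
N/A 1I 1 11 0x500000e01115c352 N/A N/A FAILED
internal 1I 1 10 0x5000c5000032b839 DISK 36.4 GB SPARE (activated)
internal 1I 1 9 0x5000c5000030b0c5 DISK 36.4 GB UNASSIGNED
"""

# Assumed row layout: Location Ct Enc Bay WWID Type Capacity Status.
# Capacity is either "N/A" or a number followed by a unit (for example "36.4 GB").
ROW = re.compile(
    r"^(?P<location>\S+)\s+(?P<ct>\w+)\s+(?P<enc>\d+)\s+(?P<bay>\d+)\s+"
    r"(?P<wwid>0x[0-9a-fA-F]+)\s+(?P<type>\S+)\s+"
    r"(?P<capacity>N/A|[\d.]+\s+\w+)\s+(?P<status>.+?)\s*$"
)


def parse_device_summary(text):
    """Return one dict per disk row; header and non-matching lines are skipped."""
    disks = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            disk = match.groupdict()
            # Build the Ct:Enc:Bay address used elsewhere in the output.
            disk["address"] = f"{disk['ct']}:{disk['enc']}:{disk['bay']}"
            disks.append(disk)
    return disks


def failed_disks(disks):
    """Addresses of disks whose status column reads FAILED."""
    return [d["address"] for d in disks if d["status"] == "FAILED"]


def activated_spares(disks):
    """Addresses of spares whose status column reads SPARE (activated)."""
    return [d["address"] for d in disks if d["status"] == "SPARE (activated)"]


if __name__ == "__main__":
    disks = parse_device_summary(SAMPLE)
    print("Failed disks:", failed_disks(disks))
    print("Activated spares:", activated_spares(disks))
```

In practice, the text passed to parse_device_summary would be the captured output of the sautil <device_file> command rather than the embedded sample.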
Compromised fault tolerance
Compromised fault tolerance commonly occurs when more physical disks have failed than the fault
tolerance method can support. When fault tolerance fails, the logical volume also fails and
unrecoverable disk error messages are returned to the host. Data loss is likely to occur.
For example, suppose one drive fails in an array configured with RAID 5 fault tolerance while
another drive in the same array is still being rebuilt. If the array has no online spare, the logical
drive fails.
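The RAID 5 example above comes down to simple counting: a disk that is still being rebuilt cannot yet provide redundancy, so it counts against the tolerance just like a failed disk. The following Python code is a minimal sketch of that reasoning. The tolerance values are general properties of the RAID levels, not figures taken from this guide, and the names TOLERATED_FAILURES and is_compromised are hypothetical. The sketch ignores online spares and uses the worst case for RAID 1+0 (both disks of one mirrored pair).

```python
# Number of unavailable disks each RAID level can absorb (general RAID
# properties, assumed here; RAID 1+0 uses the worst case of one per mirrored pair).
TOLERATED_FAILURES = {
    "RAID 0": 0,
    "RAID 1": 1,
    "RAID 1+0": 1,
    "RAID 5": 1,
    "RAID 6": 2,
}


def is_compromised(raid_level, failed, rebuilding=0):
    """Return True if more disks are unavailable than the level tolerates.

    A disk that is still rebuilding holds incomplete data, so it is counted
    as unavailable together with the failed disks.
    """
    unavailable = failed + rebuilding
    return unavailable > TOLERATED_FAILURES[raid_level]


if __name__ == "__main__":
    # One drive fails while another is still being rebuilt.
    print(is_compromised("RAID 5", failed=1, rebuilding=1))
    # A single failed drive with no rebuild in progress.
    print(is_compromised("RAID 5", failed=1))
```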
Compromised fault tolerance can also be caused by non-disk problems, such as temporary power
loss to a storage system or a faulty cable. In such cases, the physical disks do not need to be
replaced. However, data can still be lost, especially if the system is busy when the problem occurs.
Recovering from fault tolerance failures
When fault tolerance has been compromised, inserting replacement disks does not improve the
condition of the logical drive. Instead, if your screen displays unrecoverable error messages, follow
these steps to recover data:
1. Power off the server, and then power it back on.
   In some cases, a marginal drive will work long enough to enable you to make copies of
   important files.
2. Make copies of important data if possible.