Maintenance Best Practices for Adaptec RAID Solutions
Note: This document provides insight into best practices for the routine maintenance of Adaptec RAID systems. These practices are recommended for all Adaptec RAID customers to help avoid data loss, maintain data integrity, and minimize both downtime and its costs.
It is important to understand both the benefits of RAID and its limitations. Many RAID users become complacent about maintenance and backups due to the common misconception that RAID makes their data invulnerable.
RAID is the most common method of data protection, and most companies rely on the redundancy provided by RAID at various levels to protect them from disk drive failures. However, protecting data with RAID has become increasingly challenging as drive capacities increase exponentially and less reliable drives see wider use.
RAID cannot protect data against virus attacks, human error, data deletion, or natural or unnatural disasters. Nor can RAID protect data beyond its advertised disk drive redundancy (one drive failure for RAID-1, RAID-10, and RAID-5; two drive failures for RAID-6, for example). Adaptec Technical Support often sees cases where an array is left in a degraded state for an extended period and data loss then occurs when another drive finally fails. Even the best RAID controller cannot help in this situation. In addition to timely maintenance, periodic backup remains one of the most critical practices in data operations.
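To put that exposure in perspective, the following minimal sketch estimates the chance that a degraded array loses another drive before it is repaired. It assumes independent drive failures with a constant annual failure rate (AFR); the AFR, drive count, and degraded-window values are illustrative assumptions, not Adaptec figures.

    import math

    def p_additional_failure(surviving_drives, afr, degraded_days):
        """Probability that at least one surviving drive fails while the
        array remains degraded, assuming independent failures with a
        constant annual failure rate (exponential model)."""
        daily_rate = afr / 365.0  # per-drive failures per day
        # P(no failure among all survivors during the degraded window)
        p_none = math.exp(-daily_rate * surviving_drives * degraded_days)
        return 1.0 - p_none

    # Illustrative values: an 8-drive RAID-5 degraded to 7 survivors, 3% AFR,
    # left degraded for 30 days before the failed drive is replaced.
    print(f"{p_additional_failure(7, 0.03, 30):.1%}")  # roughly 1.7%

Even this simple model shows why a degraded array should be repaired promptly rather than left running for weeks.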
THE EFFECT OF MODERN LARGER DISK SIZES AND DRIVE QUALITY ISSUES ON RAID
Modern disk architectures have continued to evolve, as have other computer-related technologies. Disk drives
are orders of magnitude larger than they were when RAID was first introduced. As disk drives have gotten larger,
their reliability has not improved, and, more importantly, the bit error likelihood per drive has increased
proportionally with the larger media. These three factors (larger disks, unimproved reliability, and increased bit-error exposure on larger media) all have serious consequences for the ability of RAID to protect data. The risk of data loss is further compounded when lower-cost SATA disks (desktop edition drives) are employed in workloads for which they were not designed.
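The scale of this bit-error problem can be estimated from a drive's specified unrecoverable read error (URE) rate. The sketch below is a simple model, assuming independent bit errors at the desktop-class spec of one URE per 10^14 bits read; the drive count and capacity are illustrative assumptions.

    import math

    def p_ure_during_full_read(total_bytes, ure_rate_per_bit=1e-14):
        """Probability of hitting at least one unrecoverable read error
        (URE) while reading total_bytes, assuming independent bit errors
        at the drive's specified URE rate."""
        bits_read = total_bytes * 8
        # 1 - (1 - rate)^bits, computed in log space for numerical stability
        return 1.0 - math.exp(bits_read * math.log1p(-ure_rate_per_bit))

    # Rebuilding a degraded array means reading every surviving drive in
    # full: e.g., seven 4 TB drives (illustrative) = 28 TB to read.
    total = 7 * 4 * 10**12
    print(f"{p_ure_during_full_read(total):.0%}")  # roughly 89%

Under these assumptions a single full-array read is more likely than not to encounter a URE, and a rebuild is exactly the window in which there is no remaining redundancy to fall back on.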
Hard drive media defect rates and other drive quality measures have steadily improved over time, even as drive sizes have grown substantially. However, no hard drive is expected to be totally free of flaws. In addition, normal
wear on a drive may result in an increase in media defects, or “grown defects,” over time. The data block
containing the defect becomes unusable and must be “remapped” to another location on the drive. If a bad
block is encountered during a normal write operation, the controller marks that block as bad and the block is
added to the “grown defects list” in the drive’s NVRAM. The write operation is not complete until the data is properly written to a remapped location. When a bad block is encountered during a normal read operation, the controller reconstructs the missing data from parity and remaps it to a new location. A
condition known as a double fault (“bad stripe”) occurs when a RAID controller encounters a bad block on a
drive in a RAID volume and then encounters an additional bad block on another hard drive in the same data
stripe. This double fault scenario can also occur while rebuilding a degraded array, leaving the controller with
insufficient parity information to reconstruct the data stripe. The end result is a rebuild failure with the loss of
any data in that stripe, assuming the stripe is in the user data area.
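The likelihood of a double fault can be reasoned about with a simple combinatorial model. The sketch below assumes media defects land independently and uniformly across stripe units (a simplification, since real grown defects often cluster); the array geometry and defect count are illustrative assumptions, not measured values.

    def p_any_double_fault(stripes, drives, p_bad):
        """Probability that at least one stripe contains bad blocks on
        two or more member drives (a double fault), assuming defects are
        independent and uniformly distributed."""
        # A stripe survives 0 or 1 bad block: parity covers a single loss.
        p0 = (1 - p_bad) ** drives
        p1 = drives * p_bad * (1 - p_bad) ** (drives - 1)
        return 1.0 - (p0 + p1) ** stripes

    # Illustrative geometry: 4 TB drives with 256 KB stripe units give
    # about 15.3 million stripes; assume ~100 grown defects per drive.
    stripes = (4 * 10**12) // (256 * 1024)
    p_bad = 100 / stripes  # per-stripe-unit probability of a bad block
    print(f"{p_any_double_fault(stripes, 8, p_bad):.1%}")  # roughly 1.8%

Small per-block defect probabilities still add up across millions of stripes, which is why routine verification of array media is a key maintenance practice.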