GALAXY® AUROURA CONFIGURATION AND SYSTEM INTEGRATION GUIDE
Section 4 Troubleshooting Guide
4.10 Boot device problems
The boot device can fail, even if it is a SATADOM. Aside from an outright
failure or power/cabling problems, something to watch out for is what happens
when the boot drive is full. If the drive ever becomes 100% full, it will act
as if it is read-only on bootup, which causes a host of problems after bootup.
The easiest way out at that point is to clear the logs (NumaRAID and system).
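Since a full boot drive only shows itself at the next bootup, it can help to check how full the drive is ahead of time. The following is a minimal sketch, not part of the Galaxy software: the function names, the "/" mount point, and the 90% warning threshold are all assumptions chosen for illustration.

```python
import shutil


def percent_full(total: int, used: int) -> float:
    """Return used space as a percentage of total capacity."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def classify_usage(pct: float, warn_at: float = 90.0) -> str:
    """Classify a usage percentage.

    warn_at is an assumed threshold, not a value from this guide.
    """
    if pct >= 100.0:
        return "full"
    if pct >= warn_at:
        return "warning"
    return "ok"


def boot_drive_status(path: str = "/") -> str:
    """Check the filesystem holding `path` (assumed to be the boot drive)."""
    usage = shutil.disk_usage(path)
    return classify_usage(percent_full(usage.total, usage.used))


if __name__ == "__main__":
    print(boot_drive_status("/"))
```

A result of "warning" would be the cue to clear the logs before the drive fills completely. The figure is approximate, because some filesystems reserve blocks that are not counted as used.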
4.11 Data Drive problems
Here is a list of errors we have experienced with data drives:
Drive won’t spin up (could be drive firmware, a bad drive, or a power/interface problem).
Drive is clicking (bad drive; indicates a head alignment problem).
Drive spins up and down repeatedly (indicates a failure of the drive tachometer on the spindle motor).
Drive responds but won’t spin (spindle motor failure).
SMART indicates a problem (imminent failure of a drive component).
Slow drive (could be the start of a head alignment problem).
Drive vibrating excessively (spindle balance weight came off).
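Of the symptoms above, the SMART status is the one that can be checked from software. The sketch below assumes the smartmontools `smartctl` utility is installed and that `smartctl -H` prints its usual health summary line. The guide does not name a tool, so treat the command, the matched strings, and the function names as assumptions.

```python
import subprocess
from typing import Optional

# Summary lines as commonly printed by `smartctl -H` (assumed format).
ATA_PREFIX = "SMART overall-health self-assessment test result:"
SCSI_PREFIX = "SMART Health Status:"


def parse_smart_health(output: str) -> Optional[bool]:
    """Return True if healthy, False if failing, None if no summary line is found."""
    for line in output.splitlines():
        line = line.strip()
        if line.startswith(ATA_PREFIX):
            return line[len(ATA_PREFIX):].strip() == "PASSED"
        if line.startswith(SCSI_PREFIX):
            return line[len(SCSI_PREFIX):].strip() == "OK"
    return None


def check_drive(device: str) -> Optional[bool]:
    """Run `smartctl -H` on a device such as /dev/sda (usually requires root)."""
    # smartctl uses non-zero exit codes as status bits, so do not raise on them.
    result = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True
    )
    return parse_smart_health(result.stdout)
```

A result of `None` means no health summary was found, for example because the drive does not report SMART data or the command needs elevated privileges, so it should not be read as a healthy drive.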
4.12 SAS HBA problems
The internal connections on the LSI or Supermicro SAS HBA can be damaged,
especially the shielding on the multilane SAS connector. As mentioned before,
if this shielding becomes bent, it may prevent the cable from locking in
properly.

Note how this card interfaces with everything: there are 8 lanes going from
the PCIe slot on the motherboard into the SAS chip, and 8 lanes coming out of
the chip going to the cables. A number of components on the board can be
damaged, and any of them could cause a failure on a single SAS lane. There
are (among others) 9 LEDs on the board. One LED (usually visible from the
outside) is a heartbeat; it blinks to indicate that the processor on the
board is functioning. If the BIOS on the card becomes corrupted, it won’t
blink. The other 8 LEDs show communication between the drives and the card.
If one doesn’t light, chances are there is no communication on that port.
Rechecking the cables is always the best first step.

One other note: these cards typically use the LSI 1068e chip, which supports
a maximum of about 192 devices. However, a Supermicro switched backplane uses
up more of that device count than the number of drives it holds. Backplanes
up to 16 drives have a SAS chip which takes the space of 28 devices. The
24-drive backplane has a SAS chip which takes the space of 64 devices, so
although the card supports 192 devices, using Supermicro switched
backplanes, it can’t support more than (3) 24-drive backplanes or more than