GALAXY® AURORA LS CONFIGURATION AND SYSTEM INTEGRATION GUIDE
Section 4 Troubleshooting Guide
4.9
Drive Backplane problems
In general, we use two kinds of drive backplanes: a discrete backplane and a SAS-switched
backplane. Both types have an SES2 enclosure management chip, which operates the LEDs and
controls and monitors voltages, temperatures, and fans on the backplane itself. How this chip
connects to the host differs, however. On non-switched (discrete) backplanes, the chip connects
to the host directly via an I2C interface. On switched backplanes, the chip is connected to the
switch, and it is the switch that connects to the host via an I2C interface.

Construction also varies. Typically, the discrete backplane has through-hole SAS drive
connectors (i.e., their pins go through the board), whereas on the switched backplane the drive
connectors are surface-mounted. Handling the drives roughly (i.e., not inserting them
carefully) could damage the connectors. On the rear of the board there are multilane
connectors or discrete SATA connectors, which are also potentially very delicate. On the
multilane connector, if the shield becomes bent, the cable may not seat properly, causing
bad connections. The I2C connection is especially delicate as well.

Finally, there is power. Most of these boards have multiple power connections. This is not done
just to have a place to put the connectors; it distributes the power across the ports, which
enables hot-pluggability. If, for example, only one power connection were used, then
hot-plugging one drive might cause other drives to momentarily spin down and then back up.
4.10
Boot device problems
The boot device can fail, even if it is a SATADOM. Aside from an outright failure or
power/cabling problems, something to watch out for is what happens when the boot drive is
full. If the drive ever becomes 100% full, it will act as if it is read-only on bootup, which
causes a host of problems afterward. The easiest way to recover from this state is to clear
the logs (NumaRAID and system).
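Because a completely full boot drive leads to the read-only behavior described above, it can help to check free space before it reaches that point. The following is a minimal sketch and not part of the Galaxy or NumaRAID tooling: it relies only on Python's standard `shutil.disk_usage`, and the function names, the default path `/`, and the 90% warning threshold are assumptions chosen for illustration.

```python
import shutil


def percent_used(total: int, used: int) -> float:
    """Return used space as a percentage of total (0.0 if total is not positive)."""
    if total <= 0:
        return 0.0
    return 100.0 * used / total


def boot_drive_status(path: str = "/", warn_at: float = 90.0) -> str:
    """Classify the file system holding `path` as 'ok', 'warning', or 'full'.

    `path` and `warn_at` are illustrative defaults, not values from this guide.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    if usage.free == 0:
        return "full"
    if percent_used(usage.total, usage.used) >= warn_at:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Point this at wherever the boot device is mounted on the system in question.
    print(boot_drive_status("/"))
```

Run periodically, a check like this could flag the boot drive while there is still room to clear the logs in an orderly way.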
4.11
Data Drive problems
Here is a list of errors we have experienced with data drives:
Drive won’t spin up (could be drive firmware, a bad drive, or a power/interface problem).
Drive is clicking (bad drive; indicates a head alignment problem).
Drive spins up and down repeatedly (indicates a failure of the drive tachometer on the spindle motor).
Drive responds but won’t spin (spindle motor failure).
SMART indicates a problem (imminent failure of a drive component).
Slow drive (could be the start of a head alignment problem).
Drive vibrating excessively (spindle balance weight came off).
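For quick reference during triage, the list above can be captured as a simple lookup table. This is only an illustrative sketch: the dictionary contents are transcribed from the list, while the names `LIKELY_CAUSES` and `likely_cause` and the exact symptom strings used as keys are assumptions made for this example.

```python
# Symptom -> likely cause, transcribed from the list above.
LIKELY_CAUSES = {
    "won't spin up": "drive firmware, bad drive, or power/interface problem",
    "clicking": "bad drive (head alignment problem)",
    "spins up and down repeatedly": "failure of the drive tachometer on the spindle motor",
    "responds but won't spin": "spindle motor failure",
    "smart indicates a problem": "imminent failure of a drive component",
    "slow": "possible start of a head alignment problem",
    "vibrating excessively": "spindle balance weight came off",
}


def likely_cause(symptom: str) -> str:
    """Look up a symptom, ignoring case, outer whitespace, and curly apostrophes."""
    key = symptom.strip().lower().replace("\u2019", "'")
    return LIKELY_CAUSES.get(key, "unknown symptom; not covered by the list")


if __name__ == "__main__":
    for observed in ("Clicking", "Slow", "Smoking"):
        print(f"{observed}: {likely_cause(observed)}")
```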
4.12
SAS HBA problems
The internal connections on the LSI or Supermicro SAS HBA can be damaged, especially the
shielding on the multilane SAS connector. As mentioned before, if this shielding becomes
bent, it may prevent the cable from locking in properly. Note also how this card interfaces with
everything: there are 8 lanes going from the PCIe slot on the motherboard into the SAS chip,
and 8 lanes coming out of the chip going to the cables. A number of components on the board
can be damaged, any of which could cause a failure on a single SAS lane. Among other
components, there are 9 LEDs on the board; one LED (usually visible from the outside) is a heartbeat.
This LED blinks to indicate that the processor on the board is functioning. If the BIOS on the