GALAXY® AURORA LS CONFIGURATION AND SYSTEM INTEGRATION GUIDE
Section 4 Troubleshooting Guide
if there’s rust on the outside, electronic components on the inside could also be rusting – and
those can’t be cleaned with the Royal Jelly.
4.8 Motherboard problems
Connectors:
As with the plugs that fit into them, many connectors can be damaged,
especially the SATA connectors on the motherboard. The connectors to check for
damage are: LED/switch/chassis connections, IPMI socket, RAM
sockets, CPU sockets, PCI/PCIe slots, power connections, fan connections, SATA
connections, and I2C connections (to the power supply or to LEDs).
i801:
The motherboards we've tested use Intel i801 chips for the sensors. While this is
a fairly reliable chip, if it fails you might see all of the sensors go dead
simultaneously (assuming there is no software problem), and/or the computer may be
unable to find the chip.
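The distinction above (every sensor dead at once versus only some) can be sketched as a small check. This is only an illustration: the function name, the sensor names, and the convention that a missing reading is represented as None are all assumptions, not part of the NumaRAID software.

```python
from typing import Mapping, Optional


def classify_sensor_outage(readings: Mapping[str, Optional[float]]) -> str:
    """Classify one snapshot of sensor readings.

    `readings` maps a sensor name to its value, or to None when no value
    could be read. Both the names and the None convention are assumptions.
    """
    if not readings:
        # No sensors enumerated at all: the chip may not have been found.
        return "no-sensors-found"
    dead = [name for name, value in readings.items() if value is None]
    if len(dead) == len(readings):
        # Every sensor dead at once: the pattern described for a failed chip.
        return "all-dead"
    if dead:
        # Only some sensors dead: more likely an individual sensor or cable.
        return "partial"
    return "ok"
```

A result of "all-dead" or "no-sensors-found" is only a hint; a software problem should be ruled out first, as noted above.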
Northbridge:
The Northbridge controls higher-speed functions of the motherboard, such as
the on-board VGA (ATI ES1000 or Matrox G200) and RAM. If the on-board VGA dies, the unit
can still be operated remotely; however, the only fix is to replace the motherboard.
Note that on some motherboards, the Northbridge also controls the PCIe slots.
RAM:
RAM can fail. If the amount of memory suddenly decreases, it could indicate a
problem with one or more of the memory modules. If a module is intermittent, try swapping
the modules around to see whether the problem goes away. If a module has failed completely, the
best way to troubleshoot it is to swap the modules one at a time.
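A sudden drop in memory can be spotted by comparing what the operating system reports against what is installed. The sketch below assumes a Linux-style /proc/meminfo text; the function names and the 10% tolerance are illustrative assumptions, not values from this guide.

```python
import re


def memtotal_gib(meminfo_text: str) -> float:
    """Return the MemTotal value from /proc/meminfo-style text, in GiB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    return int(match.group(1)) / (1024 * 1024)


def memory_looks_short(meminfo_text: str, expected_gib: float,
                       tolerance: float = 0.10) -> bool:
    """Return True when reported memory is well below the installed amount.

    The OS normally reports slightly less than what is installed, so a
    tolerance (assumed here to be 10%) avoids false alarms.
    """
    return memtotal_gib(meminfo_text) < expected_gib * (1.0 - tolerance)
```

On a Linux host, the text could be obtained with `open("/proc/meminfo").read()`. A True result only says memory is missing; finding the faulty module still requires the swapping procedure described above.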
Southbridge:
This chip controls the slower-speed functions of the motherboard, such as
PCI/32, PCI/x, serial/parallel ports, power management, Ethernet, USB ports, and interfaces
with the real-time clock. Typically, if a Southbridge dies, the entire motherboard doesn't
function.
CPU:
On a motherboard with multiple CPUs, if one CPU fails, the system will
typically lock up until it is rebooted, at which point only one CPU might come up. See also
fans, below.
Chassis/CPU/Chipset Fans:
It is important to keep an eye on the chassis fans, as they not
only cool the drives, but also play a part in cooling the motherboard, CPU, and RAM. There
also may be, depending on the motherboard, a fan on the Northbridge or Southbridge chip, as
well as a fan directly on the CPU. If a chassis fan fails, you should see it in the NumaRAID
GUI; however, if a chipset or CPU fan fails, a typical symptom is spontaneous rebooting of the
array (not related to software).
IPMI Card/On-Board:
Typically, the IPMI card either works or it doesn't. If an IPMI card fails, it
will show a host of symptoms, such as not appearing in the BIOS, or its Ethernet port or
virtual disk not showing up in the OS. However, if the IPMI card is known to be good and
works in another system, the fault could lie with the +5V Standby supply, either as routed through
the motherboard or as delivered by the power supply; in other words, a more serious problem.
CMOS Battery:
The NumaRAID GUI shows the status of the motherboard's CMOS battery.
If the battery gets low (~6% of its normal voltage), you will start to see
symptoms of the battery failing, such as an incorrect date and time on the hardware clock
and bootup messages saying the battery is low or dead. The battery is very simple
and inexpensive to replace. At the time of this writing, SuperMicro boards use CR-2032 3V batteries.
Do NOT substitute other models, such as CR-2025.
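One symptom listed above, an incorrect hardware clock, can be checked by comparing the hardware clock against a trusted time source. The sketch below is a minimal illustration: the function name and the five-minute threshold are assumptions, and how the two timestamps are obtained is left to the reader.

```python
from datetime import datetime, timedelta


def clock_looks_wrong(hardware_time: datetime, reference_time: datetime,
                      max_drift: timedelta = timedelta(minutes=5)) -> bool:
    """Return True when the hardware clock differs from a trusted reference
    by more than `max_drift` (the five-minute default is an assumption)."""
    return abs(hardware_time - reference_time) > max_drift
```

A clock that has reset to a date years in the past is flagged immediately, while ordinary drift of a few seconds is ignored.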