Intel® Server Board S2600CW Family TPS
Intel® Server Board S2600CW Platform Management
Revision 2.4
73
5.3.11.2 Processor Population Fault (CPU Missing) Sensor
The BMC supports a Processor Population Fault sensor. This sensor monitors for the
condition in which processor slots are not populated as required by the platform hardware
to allow power-on of the system.
At BMC startup, the BMC checks for the fault condition and sets the sensor state accordingly.
The BMC also checks for this fault condition at each attempt to DC power on the system. At
each DC power-on attempt, a beep code is generated if this fault is detected.
The following steps are used to correct the fault condition and clear the sensor state:
1. AC power down the server.
2. Install the missing processor into the correct slot.
3. AC power on the server.
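
The check-and-clear behavior described above can be illustrated with a minimal sketch. It is not the BMC firmware: the class name, the method names, the slot name "CPU1", and the idea of returning whether power-on may proceed are all assumptions made for illustration. The sketch models only what the text states: the sensor state is set at BMC startup, the condition is re-checked at each DC power-on attempt with a beep code on a fault, and the state is cleared only by the next BMC startup after an AC power cycle.

```python
from dataclasses import dataclass


@dataclass
class PopulationFaultSensor:
    """Toy model of the Processor Population Fault sensor (hypothetical names)."""
    required_slots: frozenset  # slots that must be populated (assumed input)
    asserted: bool = False     # current sensor state
    beeps: int = 0             # number of beep codes generated so far

    def _fault(self, populated):
        # A fault exists when any required slot is not populated.
        return not self.required_slots <= set(populated)

    def on_bmc_startup(self, populated):
        # At BMC startup (for example after an AC power cycle) the sensor
        # state is set from the current population, which also clears it
        # once the missing processor has been installed.
        self.asserted = self._fault(populated)

    def on_dc_power_on_attempt(self, populated):
        # Each DC power-on attempt re-checks the condition. A detected fault
        # generates a beep code. The boolean return value (power-on may
        # proceed) is an assumption of this sketch.
        if self._fault(populated):
            self.asserted = True
            self.beeps += 1
            return False
        return True
```

For example, with `required_slots=frozenset({"CPU1"})` and only a hypothetical "CPU2" slot populated, every DC power-on attempt increments `beeps` and the sensor stays asserted until `on_bmc_startup` runs again with "CPU1" populated.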
5.3.11.3 ERR2 Timeout Monitoring
The BMC supports an ERR2 Timeout Sensor (one per CPU) that asserts if a CPU's ERR2 signal
has been asserted for longer than a fixed time period (> 90 seconds). ERR[2] is a processor
signal that indicates when the IIO (the Integrated I/O module in the processor) has a fatal
error that could not be communicated to the core to trigger an SMI. ERR[2] events are fatal
error conditions that the BIOS and OS attempt to handle gracefully, but they may not always
be able to do so reliably. A continuously asserted ERR2 signal indicates that the BIOS
cannot service the condition that caused the error. This is usually because that condition
prevents the BIOS from running.
When an ERR2 timeout occurs, the BMC asserts/de-asserts the ERR2 Timeout Sensor, and logs
a SEL event for that sensor. The default behavior for BMC core firmware is to initiate a system
reset upon detection of an ERR2 timeout. The BIOS setup utility provides an option to enable
or disable the system reset that the BMC initiates when it detects this condition.
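
The timeout logic can be sketched as follows. This is an illustrative model under stated assumptions, not the BMC implementation: the class, its attributes, and the sampling interface are hypothetical, the SEL is represented by a plain list, and a system reset is represented by a counter. Only the 90-second threshold, the SEL event, and the optional reset come from the text. How the sensor de-asserts is not modelled.

```python
ERR2_TIMEOUT_S = 90.0  # from the text: asserted for longer than 90 seconds


class Err2TimeoutMonitor:
    """Per-CPU ERR2 timeout tracking (hypothetical sketch)."""

    def __init__(self, reset_on_timeout=True):
        # Mirrors the BIOS setup option; the default is to reset the system.
        self.reset_on_timeout = reset_on_timeout
        self.asserted_since = None   # time ERR2 was first seen asserted
        self.sensor_asserted = False
        self.sel = []                # stand-in for the System Event Log
        self.resets = 0              # number of system resets requested

    def sample(self, now, err2_asserted):
        """Feed one sample of the ERR2 signal taken at time `now` (seconds)."""
        if not err2_asserted:
            # The signal was released, so the continuous-assertion timer restarts.
            self.asserted_since = None
            return
        if self.asserted_since is None:
            self.asserted_since = now
        held = now - self.asserted_since
        if held > ERR2_TIMEOUT_S and not self.sensor_asserted:
            # Timeout: assert the sensor, log one SEL event, optionally reset.
            self.sensor_asserted = True
            self.sel.append(("ERR2 Timeout", now))
            if self.reset_on_timeout:
                self.resets += 1
```

One such monitor would exist per CPU. Because the timer restarts whenever the signal is released, only a continuous assertion longer than 90 seconds triggers the sensor in this sketch.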
5.3.11.4 CATERR Sensor
The BMC supports a CATERR sensor for monitoring the system CATERR signal.
The CATERR signal is defined as having three states:
high (no event)
pulsed low (possibly fatal, but may be recoverable)
low (fatal)
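
As an illustration of the three states listed above, the following sketch classifies a sequence of sampled signal levels. The function name, the sampled-trace representation, and the `hold_samples` threshold are assumptions. This section does not give the actual pulse width or hold time, so the threshold is purely hypothetical.

```python
from enum import Enum


class CaterrState(Enum):
    HIGH = "high (no event)"
    PULSED_LOW = "pulsed low (possibly fatal)"
    HELD_LOW = "low (fatal)"


def classify_caterr(samples, hold_samples):
    """Classify a trace of CATERR samples (True = high, False = low).

    `hold_samples` is a hypothetical threshold: a run of low samples at
    least this long counts as "held low". The real timing is not given
    in this section.
    """
    longest = run = 0
    for level in samples:
        # Count consecutive low samples and remember the longest run.
        run = 0 if level else run + 1
        longest = max(longest, run)
    if longest == 0:
        return CaterrState.HIGH
    if longest >= hold_samples:
        return CaterrState.HELD_LOW
    return CaterrState.PULSED_LOW
```

With a threshold of three samples, a trace containing a single low sample classifies as `PULSED_LOW`, while three or more consecutive low samples classify as `HELD_LOW`.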
All processors in a system have their CATERR pins tied together. The pin is used as a
communication path to signal a catastrophic system event to all CPUs. The BMC has direct
access to this aggregate CATERR signal.
The BMC only monitors for the “CATERR held low” condition. A pulsed low condition is
ignored by the BMC. If a CATERR-low condition is detected, the BMC logs an error message to