3 Software/Firmware Description
At runtime, the CPU triggers a System Management Interrupt (SMI) when memory errors reach a preset threshold. If runtime error logging is enabled, the SMI handler determines the cause, clears the error status, and reports the memory error to the IPMC.
Memory errors can be either correctable or uncorrectable. If the count of correctable memory errors goes above the BIOS "Max Mem Err Events" value, the SMI handler reports that the correctable error limit has been exceeded and disables further correctable error reporting (thus preventing performance degradation). Uncorrectable memory errors are also reported to the IPMC, but error handling is determined by BIOS and OS settings.
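The correctable-error limit described above can be pictured with a small model. This is only a sketch under stated assumptions: the class, its fields, and the event strings are hypothetical and are not taken from the BIOS, and it assumes that the error which pushes the count past the limit produces the "limit exceeded" report rather than an ordinary error report.

```python
from dataclasses import dataclass, field


@dataclass
class CorrectableErrorMonitor:
    """Illustrative model of the correctable-error limit (hypothetical names)."""

    max_mem_err_events: int  # stands in for the BIOS "Max Mem Err Events" value
    count: int = 0
    reporting_enabled: bool = True
    events: list = field(default_factory=list)  # messages "sent" to the IPMC

    def on_correctable_error(self, location: str) -> None:
        if not self.reporting_enabled:
            return  # reporting was disabled once the limit was exceeded
        self.count += 1
        if self.count > self.max_mem_err_events:
            # Report the overflow once, then stop reporting correctable errors.
            self.events.append("correctable error limit exceeded")
            self.reporting_enabled = False
        else:
            self.events.append(f"correctable error at {location}")


monitor = CorrectableErrorMonitor(max_mem_err_events=2)
for _ in range(5):
    monitor.on_correctable_error("DIMM_A1")  # "DIMM_A1" is a made-up label
```

With a limit of 2, the first two errors are reported individually, the third produces the limit report, and the last two are dropped without any further SMI-side work.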
PCIe error handling
The BIOS uses both legacy PCI error signaling (PERR/SERR) and PCI Express Advanced Error Reporting (AER). The AER mapping reports the error severity (correctable, uncorrectable/non-fatal, or uncorrectable/fatal) in addition to reporting the error.
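The three AER severities can be derived from a snapshot of a device's AER registers. The sketch below assumes the standard AER layout, in which a bit set in the Uncorrectable Error Status register is fatal when the same bit is set in the Uncorrectable Error Severity register and non-fatal otherwise. The function and enum names are hypothetical, and the sample register values are chosen only for illustration.

```python
from enum import Enum


class Severity(Enum):
    CORRECTABLE = "correctable"
    NON_FATAL = "uncorrectable/non-fatal"
    FATAL = "uncorrectable/fatal"


def classify_aer(uncorr_status: int, uncorr_severity: int, corr_status: int) -> list:
    """Return the severities present in a snapshot of three AER registers."""
    found = []
    if uncorr_status & uncorr_severity:
        found.append(Severity.FATAL)        # status bit set, severity bit set
    if uncorr_status & ~uncorr_severity:
        found.append(Severity.NON_FATAL)    # status bit set, severity bit clear
    if corr_status:
        found.append(Severity.CORRECTABLE)  # any bit in the correctable status
    return found


# Illustrative snapshot: two uncorrectable bits pending, only one marked fatal.
sample = classify_aer(uncorr_status=0x4010, uncorr_severity=0x0010, corr_status=0x1)
```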
If the BIOS has been set up to enable PCI error logging support, the BIOS enumerates all PCI devices detected on the system at POST time and enables error reporting: PERR/SERR for legacy devices, and AER reporting if the device supports it. The BIOS applies an error mask to all AER-supported devices when errors are reported, and may trigger critical error action for detected AER errors of the proper severity.
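The POST-time enabling pass can be sketched against a simulated configuration space. The bit positions used here are the standard ones (PCI Command register bits 6 and 8, PCI Express Device Control bits 0 to 2), but the device dictionaries and the function name are hypothetical, and whether the BIOS also sets PERR/SERR on AER-capable devices is not stated in the text, so this sketch leaves them untouched.

```python
PCI_COMMAND_PARITY = 1 << 6  # Parity Error Response enable (PERR#)
PCI_COMMAND_SERR = 1 << 8    # SERR# enable
# PCIe Device Control: correctable, non-fatal, and fatal error reporting enables
PCIE_DEVCTL_ERR_ENABLES = 0b0111


def enable_error_reporting(devices: list) -> None:
    """Enable AER-style reporting where supported, PERR/SERR otherwise."""
    for dev in devices:
        if dev["has_aer"]:
            dev["devctl"] |= PCIE_DEVCTL_ERR_ENABLES
        else:
            dev["command"] |= PCI_COMMAND_PARITY | PCI_COMMAND_SERR


# Simulated enumeration result; the register values are made up.
devices = [
    {"name": "legacy-nic", "has_aer": False, "command": 0x0007, "devctl": 0x0000},
    {"name": "pcie-switch", "has_aer": True, "command": 0x0007, "devctl": 0x2000},
]
enable_error_reporting(devices)
```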
As with memory errors, PCI errors at runtime are signaled to SMI. The SMI routine first determines which PCI device caused the error, then clears the error status and reports a platform event to the IPMC. The SMI handler may then trigger critical error action, depending on BIOS setup options.
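The runtime sequence (find the device, clear its status, report a platform event, optionally escalate) might look roughly like the following. Everything here is a hypothetical stand-in: the device records, the `report_event` and `critical_action` callbacks, and the `critical_on_fatal` flag that represents a BIOS setup option.

```python
def handle_pci_smi(devices, report_event, critical_action, critical_on_fatal):
    """Walk the devices, service any that have a pending error status."""
    for dev in devices:
        status = dev["error_status"]
        if status == 0:
            continue  # this device did not raise the error
        dev["error_status"] = 0           # models clearing the error status
        report_event(dev["bdf"], status)  # platform event toward the IPMC
        if critical_on_fatal and dev["fatal"]:
            critical_action(dev["bdf"])   # escalation gated by the setup option


devices = [
    {"bdf": "00:1c.0", "error_status": 0x00, "fatal": False},
    {"bdf": "03:00.0", "error_status": 0x10, "fatal": True},
]
reported, escalated = [], []
handle_pci_smi(
    devices,
    report_event=lambda bdf, status: reported.append((bdf, status)),
    critical_action=escalated.append,
    critical_on_fatal=True,
)
```

Passing the reporting and escalation steps as callbacks keeps the ordering of the steps visible without implying any particular IPMC interface.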
Processor and integrated controller error handling
The CPUs, as well as the integrated QuickPath Interconnect (QPI) and Integrated I/O (IIO) controllers, implement various types of error detection, correction, containment, and reporting features.
Processor core and uncore error reporting is performed via Machine Check Architecture (MCA). At startup or after a power-good reset, BIOS initializes the machine check registers, clears the status registers by writing zeros into the registers, and writes all ones into the control registers to enable all MCA features. If the system is not coming up from a power-good reset, it retains any error information by preserving the content of machine check status registers.
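The two reset paths can be sketched against a dictionary that stands in for the MSR space. The MSR addresses follow the architectural layout (IA32_MCG_CAP at 0x179, bank registers starting at 0x400 with a stride of 4), but the function name is hypothetical, and writing the control registers on a non-power-good reset is an assumption, since the text only says that the status registers are preserved on that path.

```python
IA32_MCG_CAP = 0x179     # bits 7:0 hold the number of machine check banks
IA32_MC0_CTL = 0x400     # control register of bank 0
IA32_MC0_STATUS = 0x401  # status register of bank 0
BANK_STRIDE = 4          # each bank occupies four consecutive MSRs
ALL_ONES = 0xFFFF_FFFF_FFFF_FFFF


def init_machine_check(msrs: dict, power_good_reset: bool) -> None:
    """Initialize every MCA bank in a simulated MSR space."""
    bank_count = msrs[IA32_MCG_CAP] & 0xFF
    for bank in range(bank_count):
        ctl = IA32_MC0_CTL + BANK_STRIDE * bank
        status = IA32_MC0_STATUS + BANK_STRIDE * bank
        if power_good_reset:
            msrs[status] = 0  # stale contents are meaningless after power-up
        # Otherwise the status register is left alone so logged errors survive.
        msrs[ctl] = ALL_ONES  # assumption: enable all features on both paths


# Two banks, each with a leftover status value from before the reset.
warm = {0x179: 2, 0x400: 0, 0x401: 0xBAD, 0x404: 0, 0x405: 0xBEEF}
cold = dict(warm)
init_machine_check(warm, power_good_reset=False)
init_machine_check(cold, power_good_reset=True)
```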
The QPI protocol uses a CRC mechanism to ensure the data integrity of a serial stream. Unless a "corrupt data containment" mechanism is enabled, the processor generates a QPI error signal on error detection, which in turn generates an SMI for the BIOS to report a platform event.
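As a reminder of how a CRC protects a serial stream, the sketch below appends a checksum to a payload and rechecks it on receipt. It uses a generic CRC-8 with polynomial 0x07 purely for illustration; this is not claimed to be the polynomial or framing that QPI uses, and the helper names are hypothetical.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8, most significant bit first, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def make_frame(payload: bytes) -> bytes:
    """Sender side: append the CRC of the payload."""
    return payload + bytes([crc8(payload)])


def frame_is_valid(frame: bytes) -> bool:
    """Receiver side: recompute the CRC and compare with the received one."""
    return crc8(frame[:-1]) == frame[-1]


frame = make_frame(b"\x12\x34\x56\x78")
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]  # flip one payload bit in transit
```

A receiver that sees `frame_is_valid(corrupted)` return false is in the position the text describes: an error has been detected and must be signaled.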
The IIO module uses an AER mechanism, similar to PCI error handling, to trigger different system error severity responses depending on the type of detected error.