Product Fault Management, Continued
Determine whether the threshold has been exceeded for various
errors (typically the threshold is exceeded if three errors
occur within a 10-minute interval).
If the threshold has been exceeded for a particular type of
cache error, set a flag indicating that this resource is to
be disabled (the cache will be disabled in most, but not all,
cases).
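
The threshold check described above can be illustrated with a short sketch. This is not the actual error-handler code: the class and function names, the per-type counter table, and the treatment of the window boundary (errors exactly 600 seconds apart still count as inside the window) are assumptions made for illustration. Only the "three errors within 10 minutes" rule comes from the text.

```python
from collections import deque


class ErrorThreshold:
    """Sliding-window error counter (hypothetical helper, not from the manual)."""

    def __init__(self, max_errors=3, window_seconds=600):
        self.max_errors = max_errors          # three errors ...
        self.window_seconds = window_seconds  # ... within a 10-minute interval
        self.timestamps = deque()

    def record(self, now):
        """Record an error at time `now` (seconds); True if the threshold is exceeded."""
        self.timestamps.append(now)
        # Drop errors that are older than the window ending at `now`.
        while now - self.timestamps[0] > self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps) >= self.max_errors


def handle_cache_error(state, error_type, now):
    """Count one cache error; flag the resource for disabling once over threshold.

    `state` is an assumed structure: {"counters": {}, "disable": set()}.
    """
    counter = state["counters"].setdefault(error_type, ErrorThreshold())
    if counter.record(now):
        state["disable"].add(error_type)
    return error_type in state["disable"]
```

A caller would keep one `state` dictionary for the life of the system and pass each cache error through `handle_cache_error`. Whether the flagged resource is actually disabled is a separate decision, since the text notes that the cache is disabled in most, but not all, cases.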
Update the SYSTAT software register with the results of
error/fault handling.
For memory uncorrectable error correction code (ECC)
errors:
- If a machine check occurred, mark the page bad and attempt to
  replace the page.
- Fill in the MEMCON software register with memory
  configuration and error status for use in FRU isolation.
For memory single-bit correctable ECC errors:
- Fill in the corrected read data (CRD) entry FOOTPRINT
  with set, bank, and syndrome information for use in
  FRU isolation.
- Update the CRD entry for time, address range, and
  count; fill the MEMCON software register with memory
  configuration information.
- Scrub the memory location on the first occurrence of an
  error within a particular footprint. On the second or any
  later occurrence within a footprint, mark the page bad in
  the hope that the page will be replaced later. Disable soft
  error logging for 10 minutes if the threshold is exceeded.
- Signal that the CRD buffer is to be logged for the following
  events: system shutdown (operator shutdown or crash),
  hard single-cell address within a footprint, multiple
  addresses within a footprint, memory uncorrectable ECC
  error, or CRD buffer full.
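
The single-bit correctable error steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the real handler: the `CrdEntry` layout, the `handle_crd` function, the buffer capacity of 16, and the returned action and reason strings are all hypothetical. In particular, treating a repeat at the same address as a "hard single-cell" error is an assumed interpretation. The sketch omits the MEMCON update, the 10-minute suspension of soft error logging, and the shutdown and uncorrectable-error log triggers.

```python
from dataclasses import dataclass


@dataclass
class CrdEntry:
    """One CRD buffer entry (hypothetical layout)."""
    footprint: tuple   # (set, bank, syndrome)
    first_time: float
    last_time: float
    low_addr: int      # lowest failing address seen for this footprint
    high_addr: int     # highest failing address seen for this footprint
    count: int = 1


def handle_crd(buffer, mem_set, bank, syndrome, addr, now, capacity=16):
    """Process one correctable error; return (action, log_reason or None)."""
    key = (mem_set, bank, syndrome)
    entry = buffer.get(key)

    if entry is None:
        # First occurrence within this footprint: record it and scrub.
        if len(buffer) < capacity:
            buffer[key] = CrdEntry(key, now, now, addr, addr)
        reason = "crd_buffer_full" if len(buffer) >= capacity else None
        return "scrub", reason

    # Second or later occurrence: update time, address range, and count.
    same_cell = entry.low_addr == entry.high_addr == addr
    entry.last_time = now
    entry.count += 1
    entry.low_addr = min(entry.low_addr, addr)
    entry.high_addr = max(entry.high_addr, addr)
    reason = "hard_single_cell" if same_cell else "multiple_addresses"
    return "mark_page_bad", reason
```

Keying the buffer on the footprint, rather than on the address, is what lets repeated errors from one failing component be grouped even when they appear at different addresses.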
For an ownership memory correctable ECC error, scrub the location.
Log the error.