Functional Architecture
Intel® Server Board SE7520AF2 TPS
Revision 1.2
Intel order number C77866-003
3.3.6.3 Retry on Uncorrectable Error
The Intel® E7520 MCH includes specialized hardware to resubmit a memory read request upon
detection of an uncorrectable error. When a demand fetch (as opposed to a scrub) of memory
encounters an uncorrectable error as determined by the enabled ECC algorithm, the memory
control hardware causes a (single) full resubmission of the entire cache line request from
memory to verify the existence of corrupt data. This feature is expected to greatly reduce or
eliminate the reporting of false or transient uncorrectable errors in the DRAM array.
Note: Any given read request is retried only once on behalf of this error detection mechanism. If
the uncorrectable error is repeated, it is logged and escalated as directed by device
configuration. In memory mirror mode, the retry on an uncorrectable error is issued to the
mirror copy of the target data rather than to the devices responsible for the initial error
detection. This has the added benefit of making uncorrectable errors in DRAM fully correctable
unless the same location in both the primary and the mirror happens to be corrupt. This RASUM
feature can be enabled or disabled via configuration.
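The retry behavior described above can be illustrated with a small software model. This is only a sketch of the decision flow, not the MCH hardware or its register interface. The names ModelMemory, demand_read, and UNCORRECTABLE are hypothetical and do not come from this specification.

```python
from dataclasses import dataclass, field

# Hypothetical sentinel standing in for a read that ECC flags as uncorrectable.
UNCORRECTABLE = object()


@dataclass
class ModelMemory:
    """Toy memory: address -> data, plus a per-address count of failing reads."""
    data: dict
    # address -> number of upcoming reads that will report an uncorrectable error
    faults: dict = field(default_factory=dict)

    def read(self, addr):
        remaining = self.faults.get(addr, 0)
        if remaining > 0:
            self.faults[addr] = remaining - 1
            return UNCORRECTABLE
        return self.data[addr]


def demand_read(addr, primary, mirror=None, retry_enabled=True, error_log=None):
    """Read once; on an uncorrectable error, retry exactly once.

    The retry goes to the mirror copy when one is present, otherwise back
    to the primary. A repeated error is logged and reported to the caller.
    """
    result = primary.read(addr)
    if result is not UNCORRECTABLE:
        return result
    if retry_enabled:
        source = mirror if mirror is not None else primary
        result = source.read(addr)
        if result is not UNCORRECTABLE:
            return result
    if error_log is not None:
        error_log.append(addr)
    return UNCORRECTABLE
```

In this model a transient fault (one failing read) is hidden by the retry, while a persistent fault is logged unless a healthy mirror copy supplies the data.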
3.3.6.4 Integrated Memory Initialization Engine
The Intel® E7520 MCH provides hardware-managed ECC auto-initialization of all populated
DRAM space under software control. After the internal configuration has been updated to reflect
the types and sizes of populated DIMM devices, the MCH traverses the populated address
space initializing all locations with good ECC. This speeds up the mandatory memory
initialization step and frees the processor to pursue other machine initialization and
configuration tasks.
Additional features have been added to the initialization engine to support high-speed
population and verification of a programmable memory range with one of four known data
patterns (0/F, A/5, 3/C, and 6/9). This function facilitates a limited, very high-speed memory test
and provides a BIOS-accessible memory zeroing capability for use by the operating system.
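As a rough illustration of populating and verifying a range with a known pattern, the following sketch models memory as a bytearray. The specification names the four patterns but does not state how the two values of each pair are laid out in memory, so the alternating-byte layout here is an assumption. PATTERNS, fill_range, and verify_range are hypothetical names.

```python
# Pattern names from the text, mapped to ASSUMED byte pairs
# (each nibble value paired with its bitwise complement).
PATTERNS = {
    "0/F": (0x00, 0xFF),
    "A/5": (0xAA, 0x55),
    "3/C": (0x33, 0xCC),
    "6/9": (0x66, 0x99),
}


def expected_byte(offset, pattern):
    """Byte expected at a given offset within the range (assumed alternation)."""
    first, second = PATTERNS[pattern]
    return first if offset % 2 == 0 else second


def fill_range(memory, start, length, pattern):
    """Populate memory[start:start+length] with the chosen pattern."""
    for offset in range(length):
        memory[start + offset] = expected_byte(offset, pattern)


def verify_range(memory, start, length, pattern):
    """Return the addresses in the range that do not hold the expected byte."""
    return [
        start + offset
        for offset in range(length)
        if memory[start + offset] != expected_byte(offset, pattern)
    ]
```

For example, filling eight bytes with "A/5" and then corrupting one of them causes verify_range to return only that address.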
3.3.6.5 DIMM Sparing Function
To improve fault tolerance, the Intel® E7520 MCH includes specialized hardware to support
fail-over to a spare DIMM device if a primary DIMM exceeds a specified threshold of runtime
errors. One of the DIMMs installed per channel, equal to or larger in size than every other
installed DIMM, is not used but is instead kept in reserve. If a primary DIMM experiences
significant failures, the data on the failing DIMM and its corresponding partner in the other
channel (if applicable) is copied, over time, to the spare DIMM(s) held in reserve. When all of
the data has been copied, the reserve DIMM(s) are put into service and the failing DIMM is
removed from service. Only one sparing cycle is supported. If this feature is not enabled, all
DIMMs are visible in the normal address space.
Note: DIMM Sparing requires that the spare DIMM be at least the size of the largest primary
DIMM in use.
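The size requirement in the note can be expressed as a simple check. This is a minimal sketch; the function name spare_is_large_enough and the use of megabytes as the unit are assumptions made for illustration.

```python
def spare_is_large_enough(spare_mb, primary_sizes_mb):
    """True if the spare is at least as large as the largest primary DIMM in use."""
    # default=0 covers the degenerate case of no primary DIMMs
    return spare_mb >= max(primary_sizes_mb, default=0)
```

For instance, a 1024 MB spare is acceptable alongside 512 MB and 1024 MB primaries, but a 512 MB spare is not.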
Hardware additions for this feature include a tracking register per DIMM to
maintain a history of error occurrence and a programmable register to hold the fail-over error
threshold level. The operational model is as follows: if the fail-over threshold register is set to a
non-zero value, the feature is enabled. If the count of errors on any DIMM exceeds the register
value, fail-over begins.
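The operational model above can be sketched as a small software model with one error counter per DIMM and a single threshold value. This only illustrates the described logic and does not reflect the actual register layout. The class SparingModel and its members are hypothetical names.

```python
class SparingModel:
    """Toy model of per-DIMM error tracking with a fail-over threshold."""

    def __init__(self, dimm_names, threshold):
        self.threshold = threshold  # a value of zero leaves the feature disabled
        self.error_counts = {name: 0 for name in dimm_names}
        self.failed_over = None  # set once; only one sparing cycle is supported

    @property
    def enabled(self):
        return self.threshold != 0

    def record_error(self, dimm):
        """Count one error on a DIMM; return True if it starts fail-over."""
        self.error_counts[dimm] += 1
        if not self.enabled or self.failed_over is not None:
            return False
        # Fail-over begins when the count exceeds (not merely equals) the threshold.
        if self.error_counts[dimm] > self.threshold:
            self.failed_over = dimm
            return True
        return False
```

With a threshold of 2, the third error on a DIMM starts fail-over. Later errors on other DIMMs are still counted but cannot start a second sparing cycle.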