IBM Power 750 and 760 Technical Overview and Introduction
4.2.4 Cache protection
IBM POWER processor-based systems are designed with cache protection mechanisms,
including cache line delete in both the L2 and L3 arrays, processor instruction retry and alternate
processor recovery protection on the L1-I and L1-D caches, and redundant “repair” bits in the L1-I,
L1-D, and L2 caches and in the L2 and L3 directories.
L1 instruction and data array protection
The processor instruction and data caches are protected against intermittent
errors by processor instruction retry and against permanent errors by alternate processor
recovery, both mentioned previously. The L1 cache is divided into sets, and the processor can
deallocate all but one set before doing a processor instruction retry.
In addition, faults in the Segment Lookaside Buffer (SLB) array are recoverable by the
POWER Hypervisor. The SLB is used in the core to perform address translation calculations.
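The set deallocation and retry described above can be pictured with a small Python model. This is a minimal sketch under stated assumptions, not IBM's hardware logic: the class `L1Cache`, the function `run_with_retry`, and the convention that a failing instruction reports the id of the faulty set are all hypothetical. It shows one plausible reading of the text: a faulty set is deallocated before the instruction is retried, and the last remaining set is never deallocated.

```python
from typing import Callable, Optional, Set


class L1Cache:
    """Toy L1 cache divided into sets; individual sets can be deallocated."""

    def __init__(self, num_sets: int) -> None:
        self.active: Set[int] = set(range(num_sets))

    def deallocate(self, set_id: int) -> bool:
        # Refuse to deallocate the last active set so the core can keep running.
        if set_id in self.active and len(self.active) > 1:
            self.active.discard(set_id)
            return True
        return False


def run_with_retry(cache: L1Cache,
                   instruction: Callable[[L1Cache], Optional[int]],
                   max_retries: int = 3) -> bool:
    """Run `instruction`, which returns None on success or the id of the set
    that raised an error. Each faulty set is deallocated before a retry."""
    for _ in range(max_retries + 1):
        bad_set = instruction(cache)
        if bad_set is None:
            return True
        if not cache.deallocate(bad_set):
            # Nothing left to deallocate: escalate (for example, to
            # alternate processor recovery for a permanent error).
            return False
    return False


# A permanent fault in set 2: it keeps failing until it is deallocated.
cache = L1Cache(num_sets=8)
ok = run_with_retry(cache, lambda c: 2 if 2 in c.active else None)
print(ok, sorted(cache.active))  # True [0, 1, 3, 4, 5, 6, 7]
```

The retry succeeds once the faulty set is out of service. A fault that cannot be isolated this way is reported as a failure, which in this sketch stands in for handing the problem to alternate processor recovery.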
L2 and L3 array protection
The L2 and L3 caches in the processor are protected with a double-bit detect,
single-bit correct error-correcting code (ECC). Single-bit errors are corrected before the data is
forwarded to the processor, and the corrected data is subsequently written back to the L2 or L3 cache.
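The double-bit detect, single-bit correct behavior can be illustrated with an extended Hamming code. This is a minimal sketch that protects only a 4-bit value with an 8-bit codeword. Real cache ECC words are much wider, and the exact code used in the processor is not described here. The function names `encode` and `decode` are hypothetical. Any single flipped bit is corrected. Any two flipped bits are detected but reported as uncorrectable.

```python
from typing import List, Tuple

DATA_POS = (3, 5, 6, 7)  # data bit positions; 1, 2 and 4 hold check bits, 0 is overall parity


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits into an 8-bit extended Hamming (SEC-DED) codeword."""
    word = [0] * 8
    for i, pos in enumerate(DATA_POS):
        word[pos] = (nibble >> i) & 1
    syndrome = 0
    for pos in DATA_POS:
        if word[pos]:
            syndrome ^= pos
    # Choose check bits so the syndrome of the whole codeword is zero.
    word[1], word[2], word[4] = syndrome & 1, (syndrome >> 1) & 1, (syndrome >> 2) & 1
    word[0] = sum(word[1:]) % 2  # overall parity bit makes the total parity even
    return word


def decode(word: List[int]) -> Tuple[str, int]:
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    w = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if w[pos]:
            syndrome ^= pos
    parity = sum(w) % 2
    if syndrome == 0 and parity == 0:
        status = "ok"
    elif parity == 1:
        w[syndrome] ^= 1  # single-bit error; syndrome 0 means bit 0 itself flipped
        status = "corrected"
    else:
        return "uncorrectable", -1  # double-bit error: detected, not corrected
    nibble = sum(w[pos] << i for i, pos in enumerate(DATA_POS))
    return status, nibble


cw = encode(0b1011)
cw[5] ^= 1          # flip one bit
print(decode(cw))   # ('corrected', 11)
cw[6] ^= 1          # flip a second bit
print(decode(cw))   # ('uncorrectable', -1)
```

The overall parity bit is what separates the two cases. An odd parity means one bit flipped, so the syndrome can be trusted to locate it. An even parity with a nonzero syndrome means two bits flipped.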
In addition, the caches support cache line delete. When the number of correctable errors
detected on a cache line reaches a threshold, the data in the cache line can be purged and the
cache line removed from further operation without requiring a reboot. An ECC-uncorrectable
error detected in the cache can also trigger a purge and delete of the cache line. For unmodified
data, this results in no loss of operation, because an unmodified copy of the data can be held in
system memory and the cache line can be reloaded from main memory. Modified data is handled
through Special Uncorrectable Error handling.
Deleted L2 and L3 cache lines are marked for persistent deconfiguration, so they remain
deconfigured across subsequent system reboots until they can be replaced.
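The cache line delete policy can be summarized in a short Python model. This is a minimal sketch: the class names, the string results, and the threshold value are hypothetical, because the text does not give the actual threshold. The sketch also does not model how corrected, modified data is written back when a line is purged after reaching the correctable-error threshold. It shows three behaviors from the text: a line is purged and deleted once its correctable-error count reaches the threshold, an uncorrectable error also deletes the line (reloading from memory if unmodified, deferring to Special Uncorrectable Error handling if modified), and deleted lines stay deconfigured after a reboot.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class CacheLine:
    modified: bool = False
    ce_count: int = 0  # correctable errors seen on this line


@dataclass
class LineDeleteCache:
    ce_threshold: int = 3  # hypothetical value; the text gives no number
    lines: Dict[int, CacheLine] = field(default_factory=dict)
    deleted: Set[int] = field(default_factory=set)

    def fill(self, idx: int, modified: bool = False) -> None:
        if idx in self.deleted:
            raise ValueError(f"cache line {idx} is deconfigured")
        self.lines[idx] = CacheLine(modified=modified)

    def correctable_error(self, idx: int) -> str:
        line = self.lines[idx]
        line.ce_count += 1  # ECC has already corrected the data
        if line.ce_count < self.ce_threshold:
            return "corrected"
        # Threshold reached: purge the line and remove it from service.
        del self.lines[idx]
        self.deleted.add(idx)
        return "purged"

    def uncorrectable_error(self, idx: int) -> str:
        line = self.lines.pop(idx)
        self.deleted.add(idx)
        # Unmodified data: main memory still holds a good copy.
        # Modified data: no good copy exists, so use Special UE handling.
        return "special_ue" if line.modified else "reload_from_memory"

    def reboot(self) -> "LineDeleteCache":
        # Deleted lines stay deconfigured across reboots (persistent deconfiguration).
        return LineDeleteCache(self.ce_threshold, {}, set(self.deleted))


cache = LineDeleteCache(ce_threshold=2)
cache.fill(7)
cache.fill(9, modified=True)
print(cache.correctable_error(7))      # corrected
print(cache.correctable_error(7))      # purged
print(cache.uncorrectable_error(9))    # special_ue
print(sorted(cache.reboot().deleted))  # [7, 9]
```

Keeping the set of deleted lines outside the per-boot state is what lets `reboot` carry the deconfiguration forward, which mirrors the persistent deconfiguration described in the text.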
4.2.5 Special Uncorrectable Error handling
While it is rare, an uncorrectable data error can occur in memory or a cache. IBM POWER
processor-based systems attempt to limit the impact of an uncorrectable error to the least
possible disruption, using a well-defined strategy that first considers the data source.
Sometimes, an uncorrectable error is temporary in nature and occurs in data that can be
recovered from another repository, as in the following examples:
Data in the instruction L1 cache is never modified within the cache itself. Therefore, an
uncorrectable error discovered in the cache is treated like an ordinary cache miss, and
correct data is loaded from the L2 cache.
The L2 and L3 caches of IBM POWER processor-based systems can hold an unmodified
copy of data from a portion of main memory. In this case, an uncorrectable error simply
triggers a reload of the cache line from main memory.
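The source-based strategy in these examples can be written as a small decision function. This is a minimal sketch: the `Action` names and the `recover_from_ue` function are hypothetical, and only the cases the text describes are modeled. Treating an uncorrectable error in memory as a Special Uncorrectable Error case is an assumption made for illustration. Other sources, such as the L1 data cache, are rejected rather than guessed at.

```python
from enum import Enum


class Action(Enum):
    RELOAD_FROM_L2 = "treat as an L1 miss and reload from L2"
    RELOAD_FROM_MEMORY = "reload the cache line from main memory"
    SPECIAL_UE = "use Special Uncorrectable Error handling"


def recover_from_ue(source: str, modified: bool) -> Action:
    """Choose a recovery action for an uncorrectable error by data source."""
    if source == "L1-I":
        # Instruction L1 data is never modified in the cache,
        # so the L2 cache always has a correct copy.
        return Action.RELOAD_FROM_L2
    if source in ("L2", "L3"):
        # An unmodified line has a good copy in main memory; a modified one does not.
        return Action.SPECIAL_UE if modified else Action.RELOAD_FROM_MEMORY
    if source == "memory":
        # Assumption: no other repository holds this data.
        return Action.SPECIAL_UE
    raise ValueError(f"source {source!r} is not covered by this sketch")


for src, mod in [("L1-I", False), ("L3", False), ("L2", True)]:
    print(src, mod, recover_from_ue(src, mod).name)
# L1-I False RELOAD_FROM_L2
# L3 False RELOAD_FROM_MEMORY
# L2 True SPECIAL_UE
```

The function checks the data source first and the modified state second, following the order in which the strategy above considers them. Only data with no other good copy falls through to Special Uncorrectable Error handling.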