Chapter 4. Continuous availability and manageability
4.6 Power-On Reset Engine
The chip includes a Power-On Reset Engine (PORE), a programmable hardware
sequencer responsible for restoring the state of a powered-down processor core and L2
cache (deep sleep mode) or chiplet (winkle mode). When a processor core wakes up from
sleep or winkle, the PORE fetches code created by the POWER Hypervisor from a special
location in memory. This code contains the instructions and data necessary to restore the
processor core to a functional state. The memory image includes all the boot and runtime
configuration data that were applied to this processor core since power-on, including the
circuit calibration and cache repair registers that are unique to each processor core. In effect,
the PORE performs a mini initial program load (IPL) of the processor core or chiplet, completing
the sequence of operations necessary to restart instruction execution, such as removing
electrical and logical fences and reinitializing the Digital PLL clock source.
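The restore flow described above can be pictured with a small model. This is only an illustrative sketch, not the PORE's actual programming interface: the `CoreState` class, the register names, the values, and the order of the final steps are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CoreState:
    """Hypothetical model of a processor core or chiplet (not a real API)."""
    powered: bool = False
    registers: dict = field(default_factory=dict)
    fenced: bool = True          # electrical and logical fences in place
    dpll_running: bool = False   # Digital PLL clock source
    executing: bool = False


def build_restore_image(applied_settings):
    """Stand-in for the hypervisor-built memory image: an ordered list of
    (register, value) writes recorded since power-on."""
    return list(applied_settings)


def pore_restore(core, image):
    """Replay the image, then perform the steps needed to restart execution.

    The ordering of the last three steps is an assumption of this sketch.
    """
    core.powered = True
    for register, value in image:    # boot/runtime config, calibration, repairs
        core.registers[register] = value
    core.fenced = False              # remove electrical and logical fences
    core.dpll_running = True         # reinitialize the Digital PLL clock source
    core.executing = True            # instruction execution restarts
    return core


# Hypothetical per-core values, for illustration only.
image = build_restore_image([
    ("circuit_calibration", 0x1A),
    ("cache_repair", 0x03),
    ("runtime_config", 0x7F),
])
core = pore_restore(CoreState(), image)
```

The point of the model is that the image is a replayable record of everything applied to the core since power-on, which is why the wake-up behaves like a mini IPL.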
Because of its special ability to perform clocks-off and clocks-on sequencing of the hardware,
the PORE can also be used for RAS purposes:
- The service processor can use the PORE to concurrently apply an initialization update to
  a processor core or chiplet by loading new initialization values into memory and then
  forcing it to go in and out of winkle mode. The entire operation completes in a few
  milliseconds, without disrupting the workloads or the operating system.
- In the same fashion, the PORE can initiate an L3 cache dynamic “bit-line” repair operation
  if the POWER Hypervisor detects too many recoverable errors in the cache.
- The PORE can be used to dynamically repair node-to-node fabric bit lanes in a
  processor-based server by quickly suspending chip-to-chip traffic during run time,
  reconfiguring the interface to use a spare bit lane, and then resuming traffic, all
  without disrupting the operation of the server.
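The suspend, reconfigure, and resume pattern used for the fabric bit-lane repair can be sketched as follows. This is a simplified model under stated assumptions: `FabricLink`, `repair_lane`, and the lane numbering are hypothetical names invented for the example and do not correspond to any real firmware interface.

```python
from dataclasses import dataclass


@dataclass
class FabricLink:
    """Hypothetical model of a fabric link with spare bit lanes."""
    active_lanes: list
    spare_lanes: list
    traffic_enabled: bool = True


def repair_lane(link, failing_lane):
    """Swap a failing lane for a spare: suspend, reconfigure, resume."""
    if failing_lane not in link.active_lanes:
        raise ValueError(f"lane {failing_lane} is not active")
    if not link.spare_lanes:
        raise RuntimeError("no spare lane available")
    link.traffic_enabled = False                  # suspend chip-to-chip traffic
    try:
        spare = link.spare_lanes.pop(0)
        index = link.active_lanes.index(failing_lane)
        link.active_lanes[index] = spare          # reconfigure to use the spare
    finally:
        link.traffic_enabled = True               # resume traffic
    return spare


link = FabricLink(active_lanes=[0, 1, 2, 3], spare_lanes=[4])
replacement = repair_lane(link, failing_lane=2)
```

Checking the preconditions before suspending traffic keeps the interruption as short as possible, and the `finally` clause ensures traffic resumes even if the reconfiguration step fails.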
4.7 Operating system support for RAS features
Table 4-2 gives an overview of the continuous availability features that are supported by the
various operating systems running on Power Systems. In the table, the word “Most” means
that most functions of the feature are supported.
Table 4-2 Operating system support for RAS features
RAS feature                                 AIX 5.3  AIX 6.1  AIX 7.1  IBM i  RHEL 5.7  RHEL 6.3  SLES11 SP2
System deallocation of failing components
  Processor Fabric Bus Protection           X        X        X        X      X         X         X
  Dynamic Processor Deallocation            X        X        X        X      X         X         X
  Dynamic Processor Sparing                 X        X        X        X      X         X         X
  Processor Instruction Retry               X        X        X        X      X         X         X
  Alternate Processor Recovery              X        X        X        X      X         X         X
  Partition Contained Checkstop             X        X        X        X      X         X         X
  Persistent processor deallocation         X        X        X        X      X         X         X
  GX++ bus persistent deallocation          X        X        X        X      -         -         X