Chapter 4. Reliability, availability, and serviceability
Draft Document for Review October 14, 2014 10:19 am
4.3.10 Memory protection
POWER8 processor-based systems have a three-part memory subsystem design. Each processor
module contains two memory controllers, which communicate through memory channels with
buffer modules on the memory DIMMs. The buffer modules in turn access the DRAM
memory modules on the DIMMs, as shown in Figure 4-2.
Figure 4-2 Memory protection features
The memory buffer chip is made with the same 22 nm technology that is used to make the
POWER8 processor chip, and it incorporates the same technology features
to avoid soft errors. It can retry operations after many internally detected faults.
This function complements a replay buffer in the memory controller in the processor, which
also handles internally detected soft errors.
The bus between a processor memory controller and a DIMM uses CRC error detection that
is coupled with the ability to retry after soft errors. The bus features dynamic recalibration
capabilities plus a spare data lane that can be substituted for a failing bus lane through the
recalibration process.
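
The detect-and-retry behavior described above can be illustrated with a minimal Python sketch. It is not the POWER8 bus protocol: the source does not give the CRC polynomial or frame layout, so the sketch assumes CRC-32 from the standard zlib module as a stand-in, and the names make_frame, check_frame, transfer_with_retry, and FlakyChannel are hypothetical.

```python
import zlib
from typing import Callable, Tuple


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 to the payload (stand-in for the bus CRC)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_frame(frame: bytes) -> bool:
    """Return True when the trailing CRC matches the payload."""
    payload, crc = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == crc


def transfer_with_retry(payload: bytes,
                        channel: Callable[[bytes], bytes],
                        max_retries: int = 3) -> Tuple[bytes, int]:
    """Send payload through channel, retrying when the CRC check fails.

    Returns (received payload, retries used). Raises RuntimeError when the
    error persists, which is where a real bus would fall back to
    recalibration or lane sparing.
    """
    for attempt in range(max_retries + 1):
        received = channel(make_frame(payload))
        if check_frame(received):
            return received[:-4], attempt
    raise RuntimeError("persistent bus error")


class FlakyChannel:
    """Hypothetical channel that flips one bit on the first `faults` transfers."""

    def __init__(self, faults: int) -> None:
        self.faults = faults

    def __call__(self, frame: bytes) -> bytes:
        if self.faults > 0:
            self.faults -= 1
            corrupted = bytearray(frame)
            corrupted[0] ^= 0x01  # simulate a transient single-bit error
            return bytes(corrupted)
        return frame


data, retries = transfer_with_retry(b"cache line", FlakyChannel(2))
print(data, retries)
```

A transient (soft) error clears on retry, so the transfer succeeds after a bounded number of attempts. An error that survives every retry is treated as persistent and is left to a different mechanism.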
The buffer module implements an integrated L4 cache using eDRAM technology (with soft
error hardening) and persistent error handling features.
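
Figure 4-2 lists a DED/SEC ECC code (double-error detect, single-error correct) for the L4 cache. The source does not state the code width or construction, so the following Python sketch uses the textbook extended Hamming (8,4) code purely to illustrate the SEC/DED idea. The function names encode and decode are hypothetical.

```python
def encode(nibble: int) -> list:
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity bit; indices 1..7 are the classic
    Hamming(7,4) positions (parity at 1, 2, 4; data at 3, 5, 6, 7).
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the whole word even parity
    return code


def decode(word: list):
    """Return (nibble, status); nibble is None when the word is uncorrectable."""
    code = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid word
    odd_parity = sum(code) % 2

    if syndrome == 0 and odd_parity == 0:
        status = "ok"
    elif odd_parity == 1:
        # Odd number of flips: treat as a single error and correct it.
        # A syndrome of 0 here means the overall parity bit (index 0) flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped.
        return None, "uncorrectable"

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


word = encode(0b1011)
word[5] ^= 1                 # one soft error
print(decode(word))          # single error is corrected
word[6] ^= 1                 # a second error in the same word
print(decode(word))          # double error is detected, not corrected
```

A single flipped bit is repaired silently, while two flipped bits are reported rather than miscorrected. A production code protects a far wider word with proportionally fewer check bits.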
The memory buffer on each DIMM has four ports for communicating with DRAM modules.
The 16 GB DIMM, for example, has one rank that is composed of four ports of x8 DRAM
modules, each port containing 10 DRAM modules.
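
The arithmetic behind this layout can be checked with a short Python sketch. It uses only the numbers given in the text and in Figure 4-2 (8 data, 1 ECC, and 1 spare DRAM module per port; two ports per 128-bit ECC word; 8 reads per cache line). The constant names are hypothetical.

```python
PORTS = 4                   # ports on the memory buffer
DRAMS_PER_PORT = 10         # x8 DRAM modules attached to each port
DATA, ECC, SPARE = 8, 1, 1  # per-port split given in Figure 4-2
BITS_PER_DRAM = 8           # an x8 module supplies 8 bits per access
PORTS_PER_ECC_WORD = 2      # two ports are combined into one ECC word
READS_PER_CACHE_LINE = 8    # reads needed to fill one cache line

# The per-port roles must account for every module on the port.
assert DATA + ECC + SPARE == DRAMS_PER_PORT

total_drams = PORTS * DRAMS_PER_PORT
ecc_word_data_bits = PORTS_PER_ECC_WORD * DATA * BITS_PER_DRAM
cache_line_bytes = ecc_word_data_bits * READS_PER_CACHE_LINE // 8
ecc_word_groups = PORTS // PORTS_PER_ECC_WORD

print(total_drams)         # 40 DRAM modules on the DIMM
print(ecc_word_data_bits)  # 128 data bits per ECC word
print(cache_line_bytes)    # 128-byte cache line
print(ecc_word_groups)     # 2 independent ECC word groups per DIMM
```

The results agree with the figure: a 128-bit ECC word, a 128-byte cache line, and two ECC word groups behind one memory buffer.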
Diagram labels: POWER8 DCM with 8 Memory Buses Supporting 8 DIMMs; Memory Ctrl;
Memory Bus; Memory Buffer; L4 Cache; 1 Rank DIMM Supporting 2 128 Bit ECC word
DRAM Groups; 128 Byte Cache Line

Memory Controller
- Supports 128 Byte Cache Line
- Hardened "Stacked" Latches for Soft Error Protection
- Replay buffer to retry after soft internal faults
- Special Uncorrectable error handling for solid faults

Memory Bus
- CRC protection with recalibration and retry on error
- Spare Data lane can be dynamically substituted for failed one

Memory Buffer
- Same technology as POWER8 Processor Chips
  - Hardened "Stacked" Latches for Soft Error Protection
- Can retry after internal soft Errors
- L4 Cache implemented in eDRAM
  - DED/SEC ECC Code
  - Persistent correctable error handling

16 GB DIMM
- 4 Ports of Memory
  - 10 x8 DRAM modules attached to each port
  - 8 Modules Needed For Data
  - 1 Needed For Error Correction Coding
  - 1 Additional Spare
- 2 Ports are combined to form a 128 bit ECC word
  - 8 Reads fill a processor cache
- Second port can be used to fill a second cache line
  - (Much like having 2 DIMMs under one Memory buffer but housed in the same physical DIMM)

Note: Bits used for data and for ECC are spread across 9 DRAMs to maximize error
correction capability

DRAM Protection:
Can handle at least 2 bad x8 DRAM modules across two ports comprising an ECC word
(3 if not all 3 failures on the same port)
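
The DRAM protection statement in Figure 4-2 can be written as a small predicate. The following Python sketch encodes only the stated guarantee (two bad x8 DRAM modules across the two ports of an ECC word, or three when they are not all on the same port). It does not model how ECC and sparing achieve that guarantee, and the function name guaranteed_tolerated is hypothetical.

```python
def guaranteed_tolerated(bad_on_port_a: int, bad_on_port_b: int) -> bool:
    """Return True when the stated guarantee covers this failure pattern.

    The arguments count failed x8 DRAM modules on each of the two ports
    that together form one ECC word.
    """
    if bad_on_port_a < 0 or bad_on_port_b < 0:
        raise ValueError("failure counts must be non-negative")
    total = bad_on_port_a + bad_on_port_b
    if total <= 2:
        return True  # "at least 2 bad x8 DRAM modules"
    if total == 3:
        # Covered only when the three failures are split across the ports.
        return max(bad_on_port_a, bad_on_port_b) < 3
    return False     # the source makes no claim beyond three failures


print(guaranteed_tolerated(2, 0))  # True
print(guaranteed_tolerated(2, 1))  # True
print(guaranteed_tolerated(3, 0))  # False
```

A result of False means only that the figure gives no guarantee for that pattern, not that the pattern necessarily causes an uncorrectable error.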