2–12
Internal Architecture
29 September 1997 – Subject To Change
21164PC Microarchitecture
A load instruction that is issued one cycle after a store instruction in the pipeline
creates a conflict if both the load and store operations access the same memory location.
(The store instruction has not yet updated the location when the load instruction
reads it.) This conflict is handled by forcing the load instruction to take a replay trap;
that is, the IDU flushes the pipeline and restarts execution from the load instruction.
By the time the load instruction arrives at the Dcache the second time, the conflicting
store instruction has written the Dcache and the load instruction is executed
normally.
Replay traps can be avoided by scheduling the load instruction to issue three cycles
after the store instruction. If the load instruction is scheduled to issue two cycles after
the store instruction, then it will be issue-stalled for one cycle.
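The scheduling rule above can be summarized in a short sketch. This is an illustrative model only, not part of the 21164PC documentation: the function name, its parameters, and the returned strings are hypothetical, and how the hardware decides that two accesses touch "the same memory location" is not specified in this section, so it is passed in as a flag.

```python
def load_after_store_outcome(cycles_after_store: int, same_location: bool) -> str:
    """Classify a load issued `cycles_after_store` cycles after a store.

    Hypothetical helper that encodes only the three cases stated in the text:
      1 cycle after   -> replay trap (pipeline flushed, load re-executed)
      2 cycles after  -> load is issue-stalled for one cycle
      3 cycles after  -> no penalty
    Distances beyond 3 are assumed (not stated in the text) to be penalty-free too.
    """
    if cycles_after_store < 1:
        # The text does not cover a load issued with or before the store.
        raise ValueError("cycles_after_store must be at least 1")
    if not same_location:
        # The conflict described arises only when both access the same location.
        return "none"
    if cycles_after_store == 1:
        return "replay_trap"
    if cycles_after_store == 2:
        return "issue_stall_1_cycle"
    return "none"
```

A compiler or hand scheduler could use a check of this kind to decide whether to place independent instructions between a store and a dependent load.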
2.1.4.4 Write Buffer
The MTU contains a write buffer that has six 32-byte entries, each of which holds
the data from one or more store instructions that access the same 32-byte block in
memory until the data is written into the Bcache. The write buffer provides a finite,
high-bandwidth resource for receiving store data to minimize the number of CPU
stall cycles. The write buffer and associated WMB instruction are described in
Section 2.7.
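As a rough illustration of the merging behavior described above, the following sketch models a buffer of six 32-byte entries in which stores to the same 32-byte block share one entry. It is a simplified software model under stated assumptions, not the hardware design: the class and method names are hypothetical, and the oldest-first drain order is an assumption, since this section does not describe how entries are selected for writing to the Bcache.

```python
BLOCK_BYTES = 32   # each entry covers one 32-byte block (from the text)
NUM_ENTRIES = 6    # the write buffer has six entries (from the text)


class WriteBufferModel:
    """Toy model of a merging write buffer (hypothetical, for illustration)."""

    def __init__(self):
        # Maps block base address -> {byte offset: byte value}.
        # A dict preserves insertion order, used here as an assumed FIFO order.
        self.entries = {}

    def store(self, addr: int, data: bytes) -> bool:
        """Buffer a store. Returns False if no entry is free (CPU would stall)."""
        base = addr & ~(BLOCK_BYTES - 1)
        offset = addr - base
        if offset + len(data) > BLOCK_BYTES:
            raise ValueError("store crosses a 32-byte block boundary")
        if base not in self.entries:
            if len(self.entries) == NUM_ENTRIES:
                return False          # buffer full: finite resource
            self.entries[base] = {}   # allocate a new entry for this block
        for i, value in enumerate(data):
            self.entries[base][offset + i] = value  # merge into the entry
        return True

    def drain_oldest(self):
        """Remove and return the oldest entry, modeling a write to the Bcache."""
        if not self.entries:
            raise IndexError("write buffer is empty")
        base = next(iter(self.entries))
        return base, self.entries.pop(base)
```

In this model, two 8-byte stores to addresses 0x1000 and 0x1008 occupy a single entry, while a store to a seventh distinct block is refused until an entry has been drained.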
2.1.5 Cache Control and Bus Interface Unit
The cache control and bus interface unit (CBU) processes all accesses sent by the
MTU and implements all memory-related external interface functions, particularly
the coherence protocol functions for write-back caching. It controls the board-level
backup cache (Bcache). The CBU handles all instruction and primary Dcache read
misses and performs the function of writing data from the write buffer into the
shared coherent memory subsystem. The CBU also controls the 128-bit bidirectional
data bus, address bus, and I/O control. Chapter 4 describes the external interface.
2.1.6 Cache Organization
The 21164PC has two onchip caches: a primary data cache (Dcache) and a primary
instruction cache (Icache). All memory cells in the onchip caches are fully static,
six-transistor, CMOS structures.
The 21164PC also provides control for the external cache (Bcache).