User’s Manual
IBM PowerPC 750GX and 750GL RISC Microprocessor
gx_03.fm.(1.2)
March 27, 2006
Instruction-Cache and Data-Cache Operation
Page 121 of 377
3. Instruction-Cache and Data-Cache Operation
The 750GX microprocessor contains separate 32-KB, 8-way set-associative instruction and data caches that give
the execution units and registers rapid access to instructions and data. This chapter describes the organization
of the on-chip instruction and data caches, the modified, exclusive, invalid (MEI) cache-coherency protocol,
cache-control instructions, various cache operations, and the interaction between the caches, the load/store
unit (LSU), the instruction unit, and the bus interface unit (BIU).
Note that in this chapter, the term ‘multiprocessor’ is used in the context of maintaining cache coherency.
These multiprocessor devices could be actual processors or other devices that can access system memory,
maintain their own caches, and function as bus masters requiring cache coherency. If the L2 cache is
enabled, read Chapter 9, L2 Cache, on page 323 before reading this chapter.
The 750GX L1 cache implementation has the following characteristics:
• There are two separate 32-KB instruction and data caches (Harvard architecture).
• Both instruction and data caches are 8-way set-associative.
• The caches implement a pseudo least-recently-used (PLRU) replacement algorithm within each set.
• The cache directories are physically addressed. The physical (real) address tag is stored in the cache
directory.
• Both the instruction and data caches have 32-byte cache blocks. A cache block, also referred to as a cache
line, is the block of memory that a coherency state describes.
• Two coherency state bits for each data-cache block allow encoding for three states:
– Exclusive Modified (M)
– Exclusive Unmodified (E)
– Invalid (I)
• A single coherency state bit for each instruction-cache block allows encoding for two possible states:
– Invalid (INV)
– Valid (VAL)
• Each cache can be invalidated or locked by setting the appropriate bits in the Hardware-Implementation-
Dependent Register 0 (HID0), a Special-Purpose Register (SPR) specific to the 750GX.
The 750GX supports a fully coherent 4-GB physical memory address space. Bus snooping is used to drive
the MEI 3-state cache-coherency protocol that ensures the coherency of global memory with respect to the
processor’s data cache. The MEI protocol is described in Section 3.3.2 on page 126.
On a cache miss, the 750GX’s cache blocks are filled in four beats of 64 bits each. The burst fill is performed
as a critical-double-word-first operation. The critical double word is simultaneously written to the cache and
forwarded to the requesting unit, thus minimizing stalls due to cache fill latency. The data-cache line is first
loaded into a 32-byte reload buffer; when the buffer is full, the line is written into the data cache in one cycle.
This minimizes contention between the load/store unit and the line reload function. See Figure 9-1 on page 327.
The instruction and data caches are integrated into the 750GX as shown in Figure 3-1.