User’s Manual
IBM PowerPC 750GX and 750GL RISC Microprocessor
March 27, 2006
PowerPC 750GX Overview
instruction-cache flash invalidate bit (HID0[ICFI]). The instruction cache can be locked by setting
HID0[ILOCK]. The instruction cache supports only the valid and invalid states, and requires software to
maintain coherency if the underlying program changes.
The 750GX also implements a 64-entry (16-set, 4-way set-associative) branch target instruction cache
(BTIC). The BTIC is a cache of branch instructions that have been encountered in branch/loop code
sequences. If the target instruction is in the BTIC, it is fetched into the instruction queue a cycle sooner than it
can be made available from the instruction cache. Typically, the BTIC contains the first two instructions in the
target stream. The BTIC can be disabled and invalidated through software.
Coherency of the BTIC is transparent to the running software and is coupled with various functions in the
750GX processor. When the BTIC is enabled and loaded with instruction pairs to support zero-cycle delay on
taken branches, the table must be invalidated if the underlying program changes. (This is also true for the
instruction cache.) The BTIC is invalidated on an instruction-cache flash invalidate, an icbi or rfi instruction,
and any exception.
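The invalidation rule above can be sketched as a small software model. This is only an illustration, assuming a plain array of valid bits for the 16-set, 4-way organization; the type and function names (btic_model, btic_event, btic_on_event, and so on) are hypothetical and do not correspond to any 750GX register or interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Geometry from the text: 64 entries = 16 sets x 4 ways. */
#define BTIC_SETS 16
#define BTIC_WAYS 4

/* Events named in the text as invalidating the BTIC.
 * The enumerator names are illustrative only. */
typedef enum {
    EV_ICACHE_FLASH_INVALIDATE, /* HID0[ICFI] flash invalidate */
    EV_ICBI,                    /* icbi instruction */
    EV_RFI,                     /* rfi instruction */
    EV_EXCEPTION,               /* any exception */
    EV_OTHER                    /* anything else: BTIC is left alone */
} btic_event;

/* Hypothetical model: one valid bit per BTIC entry. */
typedef struct {
    bool valid[BTIC_SETS][BTIC_WAYS];
} btic_model;

/* Clear every valid bit. */
void btic_invalidate_all(btic_model *b) {
    memset(b->valid, 0, sizeof b->valid);
}

/* Mark one entry valid, as if a taken branch had loaded it. */
void btic_fill(btic_model *b, unsigned set, unsigned way) {
    assert(set < BTIC_SETS && way < BTIC_WAYS);
    b->valid[set][way] = true;
}

/* Count the entries that are currently valid. */
unsigned btic_valid_count(const btic_model *b) {
    unsigned n = 0;
    for (unsigned s = 0; s < BTIC_SETS; s++)
        for (unsigned w = 0; w < BTIC_WAYS; w++)
            if (b->valid[s][w])
                n++;
    return n;
}

/* Apply one event: the four listed events invalidate the whole table. */
void btic_on_event(btic_model *b, btic_event ev) {
    switch (ev) {
    case EV_ICACHE_FLASH_INVALIDATE:
    case EV_ICBI:
    case EV_RFI:
    case EV_EXCEPTION:
        btic_invalidate_all(b);
        break;
    default:
        break; /* no effect on the BTIC in this model */
    }
}
```

In this model, filling two entries and then calling btic_on_event with EV_ICBI leaves no valid entries, while EV_OTHER leaves both in place.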
For more information and timing examples showing cache hit and cache miss latencies, see Section 6.3.2,
Instruction Fetch Timing, on page 216.
1.2.5 On-Chip Level 2 Cache Implementation
The L2 cache is a unified cache that receives memory requests from both the L1 instruction and data caches
independently. The L2 cache is implemented with an L2 Cache Control Register (L2CR), an on-chip, 4-way,
set-associative tag array, and a 1-MB, integrated SRAM for data storage. The L2 cache normally
operates in write-back mode and supports cache coherency through snooping. The access interface to the L2 is
64 bits for writes and requires four cycles to write a single cache block. The access interface to the L2 is 256
bits for reads and requires one cycle to read a single cache block. The L2 uses ECC on a double word,
corrects most single-bit errors, and detects the remaining single-bit errors and all double-bit errors. See
Figure 9-1, L2 Cache, on page 327.
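The cycle counts quoted above follow from dividing the block size by the interface width. The arithmetic can be checked with a minimal sketch, assuming the 32-byte cache block used by the L2; the constant and function names are hypothetical.

```c
#include <assert.h>

enum {
    L2_BLOCK_BYTES     = 32,  /* one cache block */
    L2_WRITE_PORT_BITS = 64,  /* write interface width, from the text */
    L2_READ_PORT_BITS  = 256  /* read interface width, from the text */
};

/* Cycles needed to move one block across a port of the given width,
 * rounding up to a whole number of cycles. */
unsigned l2_transfer_cycles(unsigned block_bytes, unsigned port_bits) {
    assert(port_bits > 0);
    unsigned block_bits = block_bytes * 8u;
    return (block_bits + port_bits - 1u) / port_bits;
}
```

With these values, a write takes 256 / 64 = 4 cycles and a read takes 256 / 256 = 1 cycle, matching the figures in the text.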
The L2 cache is organized with 64-byte lines, which in turn are subdivided into 32-byte blocks, the unit at
which cache coherency is maintained. This reduces the size of the tag array, and one tag supports two cache
blocks. Each 32-byte cache block has its own valid and modified status bits. When a cache line is removed,
the contents of both blocks and the tag are removed from the L2 cache. The cache block is only written to
system memory if the modified bit is set.
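The line organization described above can be sketched as a data structure. This is a simplified model, assuming the 1-MB, 4-way L2 described in this section; the structure layout and names are hypothetical and say nothing about how the hardware actually stores its tags. It shows that one tag per 64-byte line halves the tag count relative to one tag per 32-byte block, and that an eviction writes back only the blocks whose modified bit is set.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    L2_BYTES           = 1024 * 1024, /* 1-MB data array */
    L2_WAYS            = 4,           /* 4-way set-associative */
    L2_LINE_BYTES      = 64,          /* one tag per line */
    L2_BLOCK_BYTES     = 32,          /* coherency unit */
    L2_BLOCKS_PER_LINE = L2_LINE_BYTES / L2_BLOCK_BYTES,  /* 2 */
    L2_SETS            = L2_BYTES / (L2_WAYS * L2_LINE_BYTES), /* 4096 */
    L2_TAGS            = L2_SETS * L2_WAYS                /* 16384 */
};

/* Per-block status bits, as described in the text. */
typedef struct {
    bool valid;
    bool modified;
} l2_block_state;

/* Hypothetical model of one line: a single tag shared by two blocks. */
typedef struct {
    uint32_t       tag;
    l2_block_state block[L2_BLOCKS_PER_LINE];
} l2_line;

/* Evict a line: return how many blocks must be written to system memory
 * (those that are valid and modified), then clear the tag and both blocks. */
unsigned l2_evict_line(l2_line *line) {
    unsigned writebacks = 0;
    for (unsigned i = 0; i < L2_BLOCKS_PER_LINE; i++) {
        if (line->block[i].valid && line->block[i].modified)
            writebacks++;
        line->block[i].valid = false;
        line->block[i].modified = false;
    }
    line->tag = 0;
    return writebacks;
}
```

A tag per 32-byte block would need 1 MB / 32 B = 32768 tags; sharing one tag between two blocks needs 16384.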
Requests from the L1 cache generally result from instruction misses, data load or store misses, write-through
operations, or cache-management instructions. Misses from the L1 cache are looked up in the L2 tags and
serviced by the L2 cache if they hit; they are forwarded to the 60x bus interface if they miss.
The L2 cache can accept multiple, simultaneous accesses. However, they are serialized and processed one
per cycle. The L1 instruction cache can request an instruction at the same time that the L1 data cache
requests one load and two store operations. The L2 cache also services snoop requests from the bus. If there
are multiple pending requests to the L2 cache, snoop requests have highest priority. Load and store requests
from the L1 data cache have the next highest priority. The last priority consists of instruction fetch requests
from the L1 instruction cache.
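The priority order above can be sketched as a fixed-priority arbiter that services one request per cycle. This is a software illustration only; the pending-count structure and the names l2_pending and l2_arbitrate are hypothetical, and the text does not specify the ordering among the data-cache requests themselves, so the model does not distinguish them.

```c
/* Request classes in the priority order given in the text. */
typedef enum {
    L2_REQ_NONE = 0,
    L2_REQ_SNOOP,   /* bus snoop: highest priority */
    L2_REQ_DCACHE,  /* L1 data-cache load or store */
    L2_REQ_IFETCH   /* L1 instruction fetch: lowest priority */
} l2_request;

/* Hypothetical bookkeeping: number of pending requests per class. */
typedef struct {
    unsigned snoop;
    unsigned dcache;
    unsigned ifetch;
} l2_pending;

/* Select the request to service this cycle and remove it from the
 * pending set. Returns L2_REQ_NONE when nothing is pending. */
l2_request l2_arbitrate(l2_pending *p) {
    if (p->snoop)  { p->snoop--;  return L2_REQ_SNOOP;  }
    if (p->dcache) { p->dcache--; return L2_REQ_DCACHE; }
    if (p->ifetch) { p->ifetch--; return L2_REQ_IFETCH; }
    return L2_REQ_NONE;
}
```

For the case in the text (one instruction fetch, one load, and two stores arriving together, plus a snoop), repeated calls return the snoop first, then the three data-cache requests, then the instruction fetch.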
1.2.6 System Interface/Bus Interface Unit (BIU)
The PowerPC 750GX uses a reduced system signal set, which eliminates some optional 60x bus protocol
pins. The system designer needs to make note of these differences.