29 September 1997 – Subject To Change
Internal Architecture
21164PC Microarchitecture
Prefetching does not begin until there is a “true” miss. A true miss is a reference that
misses in the Icache and then also misses in the refill buffer. If an Icache miss results
in a refill buffer hit, prefetching is not started until all the data has been moved from
the refill buffer entry into the pipeline.
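The prefetch-start rule above can be written as a small predicate. The Python sketch below is one reading of the text, not the hardware's control logic: the function names and the `entry_drained` flag (meaning all data has moved from the refill buffer entry into the pipeline) are hypothetical.

```python
def is_true_miss(icache_hit: bool, refill_hit: bool) -> bool:
    """A true miss misses in the Icache and then also in the refill buffer."""
    return not icache_hit and not refill_hit


def prefetch_may_start(icache_hit: bool, refill_hit: bool, entry_drained: bool) -> bool:
    """Hypothetical model of when prefetching is allowed to begin."""
    if icache_hit:
        # No miss at all, so nothing triggers prefetching.
        return False
    if not refill_hit:
        # True miss: prefetching may begin.
        return True
    # Icache miss but refill buffer hit: wait until the entry has been drained.
    return entry_drained
```

For example, `prefetch_may_start(False, True, False)` is `False` under this model, because the refill buffer entry still holds data that has not reached the pipeline.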
Each fill of the Icache by the refill buffer occurs when the instruction buffer stage in
the IDU pipeline requires a new INT16. The INT16 is written into the Icache and the
instruction buffer simultaneously. This can occur at a maximum rate of one Icache
fill per cycle. The actual rate depends on how frequently the instruction buffer stage
requires a new INT16, and on availability of data in the refill buffer.
Once an Icache miss occurs, the Icache enters fill mode. When the Icache is in fill
mode, the refill buffer is checked each cycle to see if it contains the next INT16
required by the instruction buffer.
When the required data is not available in the refill buffer (also a miss), the Icache is
checked for a hit while it awaits the arrival of the data from the Bcache or main
memory. The IDU sends a read request to the CBU by means of the MTU. The CBU
checks the Bcache, and if the request misses, the CBU drives a main memory
request.
If there is an Icache hit at this time, the Icache returns to access mode and the
prefetcher stops sending fetches to the MTU. When a new program counter (PC) is
loaded (that is, on a taken branch), the Icache returns to access mode until the first miss.
The refill buffer receives and holds instruction data from fetches initiated before the
Icache returned to access mode.
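The transitions between access mode and fill mode described above can be read as a small state machine. The Python sketch below is a simplified interpretation, not the actual control logic: the names `Mode` and `next_mode` are hypothetical, each call models one cycle, and pipeline timing and refill buffer contents are ignored.

```python
from enum import Enum


class Mode(Enum):
    ACCESS = "access"
    FILL = "fill"


def next_mode(mode: Mode, icache_hit: bool, refill_hit: bool, new_pc_loaded: bool) -> Mode:
    """Return the Icache mode for the next cycle (hypothetical model)."""
    if new_pc_loaded:
        # Loading a new PC returns the Icache to access mode.
        return Mode.ACCESS
    if mode is Mode.ACCESS:
        # An Icache miss puts the Icache into fill mode.
        return Mode.ACCESS if icache_hit else Mode.FILL
    # Fill mode: the refill buffer is checked first each cycle.
    if refill_hit:
        return Mode.FILL
    # Refill buffer miss: an Icache hit ends fill mode and stops the prefetcher.
    return Mode.ACCESS if icache_hit else Mode.FILL
```

In this model the Icache stays in fill mode for as long as the refill buffer keeps supplying the next INT16, and leaves it only on an Icache hit after a refill buffer miss or when a new PC is loaded.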
The Icache has a 64-byte block size, whereas the refill buffer is able to load the
Icache with only one INT16 (16 bytes) per cycle. Therefore, each Icache block has
four valid bits, one for each 16-byte subblock.
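The arithmetic behind the four valid bits is 64 / 16 = 4 subblocks per block. The Python sketch below illustrates per-subblock valid tracking. It assumes subblocks are numbered by ascending address within the block, and the class and function names are hypothetical.

```python
BLOCK_BYTES = 64                          # Icache block size
INT16_BYTES = 16                          # one INT16 written per fill
SUBBLOCKS = BLOCK_BYTES // INT16_BYTES    # 4 valid bits per block


def subblock_index(addr: int) -> int:
    """Index (0..3) of the 16-byte subblock that contains addr."""
    return (addr % BLOCK_BYTES) // INT16_BYTES


class IcacheBlock:
    """Hypothetical model of one block's valid bits (tag and data omitted)."""

    def __init__(self) -> None:
        self.valid = 0  # bit i set means subblock i holds valid data

    def fill(self, addr: int) -> None:
        # One fill writes a single INT16, so only one valid bit is set.
        self.valid |= 1 << subblock_index(addr)

    def is_valid(self, addr: int) -> bool:
        return bool((self.valid >> subblock_index(addr)) & 1)
```

After a single `fill(0x20)`, only the subblock covering bytes 0x20 through 0x2F is valid. A fetch to 0x30 in the same block would still find its subblock invalid.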
2.1.1.3 Branch Execution
When a branch or jump instruction is fetched from the Icache by the prefetcher, the
IDU needs one cycle to calculate the target PC before it is ready to fetch the target
instruction stream. In the second cycle after the fetch, the Icache is accessed at the
target address. Branch and PC prediction are necessary to predict and begin fetching
the target instruction stream before the branch or jump instruction is issued.
The Icache records the outcome of branch instructions in a 2048-entry branch history
table with 2 bits per entry. The table is indexed by the instruction’s virtual address
bits <13:03>. This information is used as the prediction for the next execution of the
branch instruction. The 2-bit history state is a saturating counter that increments on
taken branches and decrements on not-taken branches. The branch is predicted taken