Because the TLB is shared by instruction and data translations, it is also called the Joint TLB, or JTLB. A 4-entry I-TLB and a 4-entry D-TLB serve as caches of the JTLB for fast lookup of instruction and data address translations.
The TLB in MIPS32 is fully programmable; the architecture provides four TLB instructions and a set of CP0 registers for a
system program to manage and retrieve the contents of the entire TLB.
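As an illustration, the sketch below shows one way system software might use these CP0 registers together with the TLBWI instruction to write a single JTLB entry. It assumes a bare-metal MIPS32 GCC toolchain, uses the architectural CP0 register numbers (Index = 0, EntryLo0 = 2, EntryLo1 = 3, PageMask = 5, EntryHi = 10), and omits CP0 hazard-barrier details for brevity; the function name is illustrative and is not taken from this document.

#include <stdint.h>

/* Write the value 'val' to CP0 register 'reg', select 'sel'. */
#define MTC0(val, reg, sel) \
    __asm__ volatile("mtc0 %0, $" #reg ", " #sel : : "r"(val))

/* Stage one JTLB entry in the CP0 registers, then commit it with
 * TLBWI (write indexed TLB entry).                               */
static void jtlb_write_indexed(uint32_t idx,       /* Index: which JTLB entry         */
                               uint32_t entryhi,   /* EntryHi: VPN2 and ASID          */
                               uint32_t entrylo0,  /* EntryLo0: even page PFN/C/D/V/G */
                               uint32_t entrylo1,  /* EntryLo1: odd page of the pair  */
                               uint32_t pagemask)  /* PageMask: page size             */
{
    MTC0(idx,       0, 0);   /* Index    */
    MTC0(entrylo0,  2, 0);   /* EntryLo0 */
    MTC0(entrylo1,  3, 0);   /* EntryLo1 */
    MTC0(pagemask,  5, 0);   /* PageMask */
    MTC0(entryhi,  10, 0);   /* EntryHi  */
    __asm__ volatile("tlbwi");   /* copy the staged values into the selected entry */
}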
The MMU determines the cache attributes of memory locations in a page (for TLB-based translation) or in a segment (for fixed mapping). A location can be cacheable or noncacheable, and cacheable locations can be write-through or write-back.
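For TLB-mapped pages, the cache attribute is carried in the C field (bits 5:3) of the EntryLo0/EntryLo1 registers. The sketch below shows how software might encode it; the values 2 (uncached) and 3 (cacheable, typically write-back) are required by the MIPS32 architecture, while the write-through encoding shown is a common implementation choice and is an assumption, not taken from this document.

#include <stdint.h>

/* Cache attribute values for the C field of EntryLo0/EntryLo1 (bits 5:3). */
enum cache_attr {
    CATTR_WRITE_THROUGH = 0,  /* assumed implementation-specific encoding */
    CATTR_UNCACHED      = 2,  /* required by MIPS32                       */
    CATTR_WRITE_BACK    = 3,  /* required by MIPS32 ("cacheable")         */
};

/* Return 'entrylo' with its C field replaced by the given attribute. */
static inline uint32_t entrylo_set_cattr(uint32_t entrylo, enum cache_attr c)
{
    return (entrylo & ~(7u << 3)) | ((uint32_t)c << 3);
}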
Each TP has its own I-TLB, D-TLB, JTLB, and the set of TLB-related CP0 registers.
System Control Coprocessor (CP0)
In the MIPS32 architecture, CP0 contains a set of registers and controls that manage and report the status of all the hardware resources in the CPU. In particular, it is responsible for all exception detection and generation, the processor's diagnostic capability, operating-mode selection (kernel versus user mode), processor identification, the timer, and the enabling and disabling of interrupts.
Configuration information such as cache size and set associativity, TLB sizes, and EJTAG debug features is provided in the Configuration register(s) in CP0.
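For example, the instruction-cache geometry can be derived from the MIPS32 Config1 register (CP0 register 16, select 1), whose IS, IL, and IA fields give the sets per way, line size, and associativity. The sketch below, assuming a MIPS32 GCC toolchain, is one way software might compute the instruction-cache size; the function names are illustrative.

#include <stdint.h>

/* Read CP0 Config1 (register 16, select 1). */
static inline uint32_t read_config1(void)
{
    uint32_t v;
    __asm__ volatile("mfc0 %0, $16, 1" : "=r"(v));
    return v;
}

/* Decode the instruction-cache geometry fields of Config1:
 * IS (bits 24:22): sets per way = 64 << IS
 * IL (bits 21:19): line size    = 2 << IL bytes (0 means no cache)
 * IA (bits 18:16): ways         = IA + 1                          */
static uint32_t icache_size_bytes(void)
{
    uint32_t c1   = read_config1();
    uint32_t sets = 64u << ((c1 >> 22) & 0x7);
    uint32_t il   = (c1 >> 19) & 0x7;
    uint32_t line = il ? (2u << il) : 0;
    uint32_t ways = ((c1 >> 16) & 0x7) + 1;
    return sets * line * ways;
}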
There are two sets of CP0 registers in a CMT CPU: local and shared. Each TP can access all of the shared CP0 registers, while the set of registers local to a TP allows that TP to handle all of its own execution exceptions.
Instruction Cache
The CPU has an on-core instruction cache. The cache is virtually indexed and physically tagged; this can minimize the cache latency by allowing the cache access and the address translation to take place in parallel. The cache is 2-way set associative and has a line size of 64 bytes. The LRU (least recently used) algorithm is used to select the cache line replaced by an incoming line.
The cache control supports cache locking, which allows critical code, such as an interrupt handler, to be locked in the cache on a per-line basis. Entries are marked as locked using the CACHE fetch-and-lock instruction. A locked line cannot be replaced by the LRU algorithm, but it can be removed by the execution of a CACHE invalidation instruction.
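The sketch below illustrates how software might lock and later unlock a range of critical code, assuming a MIPS32 GCC toolchain and the 64-byte line size described above. The CACHE operation codes used, 0x1C (Fetch and Lock, I) and 0x10 (Hit Invalidate, I), are the MIPS32 architectural encodings; the function names are illustrative.

#include <stddef.h>
#include <stdint.h>

#define ICACHE_LINE_SIZE 64u   /* instruction-cache line size, per this section */

/* Prefetch and lock every instruction-cache line covering [start, start+len). */
static void icache_lock_range(const void *start, size_t len)
{
    uintptr_t p   = (uintptr_t)start & ~(uintptr_t)(ICACHE_LINE_SIZE - 1);
    uintptr_t end = (uintptr_t)start + len;

    for (; p < end; p += ICACHE_LINE_SIZE)
        __asm__ volatile("cache 0x1c, 0(%0)" : : "r"(p));  /* Fetch and Lock (I) */
}

/* Invalidate the same lines, which also releases the lock. */
static void icache_unlock_range(const void *start, size_t len)
{
    uintptr_t p   = (uintptr_t)start & ~(uintptr_t)(ICACHE_LINE_SIZE - 1);
    uintptr_t end = (uintptr_t)start + len;

    for (; p < end; p += ICACHE_LINE_SIZE)
        __asm__ volatile("cache 0x10, 0(%0)" : : "r"(p));  /* Hit Invalidate (I) */
}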
The instruction cache is split into two to provide enough instruction bandwidth to feed each TP in the CPU.
Data Cache
The CPU has an on-core data cache. The cache is virtually indexed and physically tagged; this can minimize the cache latency by allowing the cache access and the address translation to take place in parallel. The cache is 4-way set associative and has a line size of 64 bytes. The LRU (least recently used) algorithm is used to select the cache line replaced by an incoming line.
The cache control supports cache locking, which allows critical data to be locked in the cache on a per-line basis. Entries are marked as locked using the CACHE fetch-and-lock instruction. A locked line cannot be replaced by the LRU algorithm, but it can be removed by the execution of a CACHE invalidation instruction.
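A corresponding sketch for the data cache is shown below; the CACHE operation codes 0x1D (Fetch and Lock, D) and 0x15 (Hit Writeback Invalidate, D) are the MIPS32 architectural encodings, and the function names are illustrative.

/* Prefetch and lock one data-cache line. */
static void dcache_lock_line(const void *addr)
{
    __asm__ volatile("cache 0x1d, 0(%0)" : : "r"(addr));  /* Fetch and Lock (D) */
}

/* Write the line back if dirty and invalidate it, releasing the lock. */
static void dcache_unlock_line(const void *addr)
{
    __asm__ volatile("cache 0x15, 0(%0)" : : "r"(addr));  /* Hit Writeback Invalidate (D) */
}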
The data cache is shared by the TPs in the CPU.