8.4 Performance counters

Programming the MIPS32® 74K™ Core Family, Revision 02.14

8.4.1 Reading the event table

There are a lot of events you can count. It’s relatively cheap to wire another signal from the internals of the core into
a counter. It’s time-consuming and expensive to formulate a signal which represents exactly what a software engineer
might want to count, and even more expensive to test it. Where the definitions in Table 8.8 are clear and simple,
they’re usually exactly right. Where they seem more obscure, tread carefully, and don’t just blame the author of this
manual (though sometimes it is my fault!). When you use a counter, use it first on a piece of code where you know the
answer, and check you’re really counting what you think you are.
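
The "try it on code where you know the answer" advice can be sketched as a small harness. This is only an illustration: `read_counter_fn`, `count_events` and `counter_is_plausible` are hypothetical names, not part of any MIPS library, and a 32-bit counter is assumed. On real hardware the callback would wrap a CP0 read; here a simulated counter stands in so the sketch compiles anywhere.

```c
#include <stdint.h>

/* Hypothetical callback returning the current value of one performance
   counter. Left abstract: on a real core it would wrap a CP0 read. */
typedef uint32_t (*read_counter_fn)(void);

/* Run `workload` between two counter reads and return the difference.
   Unsigned subtraction still gives the right delta if the (assumed
   32-bit) counter wraps once during the run. */
static uint32_t count_events(read_counter_fn read_counter,
                             void (*workload)(void))
{
    uint32_t before = read_counter();
    workload();
    uint32_t after = read_counter();
    return (uint32_t)(after - before);
}

/* Return 1 if `measured` is within `slack` events of `expected`.
   Some slack is needed because the reads themselves cost events. */
static int counter_is_plausible(uint32_t measured, uint32_t expected,
                                uint32_t slack)
{
    uint32_t diff = measured > expected ? measured - expected
                                        : expected - measured;
    return diff <= slack;
}

/* Simulated stand-ins for demonstration only: a fake counter and a
   fake workload that is "known" to cause exactly 1000 events. */
static uint32_t fake_count;
static uint32_t fake_read(void)     { return fake_count; }
static void     fake_workload(void) { fake_count += 1000; }
```

A typical use would be `count_events(fake_read, fake_workload)` followed by `counter_is_plausible(n, 1000, slack)`. If the check fails on code whose count you can predict by hand, the event probably does not mean what you assumed.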

When reading the table:

• IFU: the “instruction fetch unit” of the CPU pipeline. We can’t describe some events without referring to the
inside of the CPU. You might like to look back at Section 1.4 “A brief guide to the 74K™ core implementation”.

• LDQ, FSB, WBB: CPU queues, described in Section 3.3.1, “Read/write ordering and cache/memory data queues
in the 74K™ core”.

• Instruction fetch events: these include events in the I-cache, ITLB and main TLB (JTLB, for “joint TLB”, since it
serves both I-fetches and data loads/stores). When you count these, remember you are counting instructions at the
start of the pipeline, and there are many reasons why instructions are fetched but never executed (more
precisely, they never graduate):

  • 74K CPUs have a 128-bit wide interface to the I-cache and fetch four instructions at once, so you only get
    one cache fetch for that group of four instructions. But even then, an unconditional branch which is not at the
    end of a group of four instructions means the remaining instructions will not be used: you can’t just multiply
    I-cache fetches by four...

  • When you get an exception, all work started on instructions later in sequence than the exception victim is
    discarded: those instructions have been fetched and counted.

  • The IFU's branch predictors cause it to fetch speculatively from a predicted branch target. When that turns
    out to be wrong, those speculative instructions will be discarded.

If there's an exception-causing address error during I-fetch, that fetch won't be counted.

• Exceptions in a branch delay slot: these are handled by internally setting the exception-return register EPC to
point to the branch instruction. After the exception is handled and control returns, the branch instruction is
re-executed: all MIPS branch instructions are contrived so the re-execution does exactly the same thing as the first
time. But the branch instruction is “really” run twice, and any performance count will show that.

• Bubble: somewhat like a no-op, generated inside the execution unit. It travels down the pipeline like a real
instruction. When it reaches the pipeline position which is used by real instructions to access some resource, you
can be sure that resource will not be used for that cycle.

• Issue pool: this document’s informal name for the heap of instructions which are candidates for issue. These
instructions are kept in two 6-entry hardware queues which the implementation documents call DDQ0 and
DDQ1 (for ALU and AGEN type instructions respectively).
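
To make the instruction-fetch caveat above concrete: four times the I-cache fetch count is only an upper bound on instructions that could have run, so it can be more useful to compare it with a graduated-instruction count. The sketch below assumes you have already collected both counts from whichever events in the table correspond to them; the function names are hypothetical and the arithmetic is the only point.

```c
#include <stdint.h>

/* Four instructions per I-cache fetch, as the text states for the
   128-bit wide I-cache interface. */
#define INSNS_PER_FETCH 4u

/* Upper bound on the instructions delivered by a number of I-cache
   fetches. Not all of them will graduate. */
static uint64_t max_insns_fetched(uint32_t icache_fetches)
{
    return (uint64_t)icache_fetches * INSNS_PER_FETCH;
}

/* Fraction of fetched instruction slots that went on to graduate.
   Returns 0.0 when nothing was fetched. Branches, exceptions and
   mispredicted speculative fetches all push this ratio down. */
static double fetch_utilization(uint32_t icache_fetches,
                                uint32_t insns_graduated)
{
    uint64_t slots = max_insns_fetched(icache_fetches);
    if (slots == 0)
        return 0.0;
    return (double)insns_graduated / (double)slots;
}
```

For example, 100 fetches and 250 graduated instructions give `fetch_utilization(100, 250)` equal to 0.625: more than a third of the fetched slots were never used.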