1.4 A brief guide to the 74K™ core implementation
Programming the MIPS32® 74K™ Core Family, Revision 02.14
There are a few simple instructions where the ALU produces its results in one clock (they’re listed in
but most ALU instructions require two clocks: so, in the 74K core, dependent ALU instructions cannot usually
be run back-to-back. This would have a catastrophic effect on the performance of an in-order CPU, because
many instructions are dependent on their immediate predecessor. But an out-of-order CPU will run just fine,
because there are also a reasonable number of cases where an instruction is not dependent on its immediate predecessor,
so the pipeline can find something to run. The CPU will slow down if fed with a sequence of relatively
long-latency instructions each of which is dependent on its predecessor, of course. For example, in the AGEN
pipeline it takes four cycles to turn a load address into load data (assuming a cache hit). So chasing a chain of
pointers through memory will take at least four cycles per pointer.
• Optimistic issue: any instruction which is issued may yet not run to completion (there might be an exception on
an earlier-in-program instruction, for example). But some instructions are issued even though they are directly
dependent on something we’re not sure about — they’re issued optimistically. The most common example is that
instructions dependent on load data are issued as if we were confident the load will hit in the L1 cache.
Sometimes it turns out we were wrong. Notably, sometimes the load we’re dependent on suffers a cache miss. In
this case the hardware does the simplest thing: rather than attempt to single out the now unviable instruction, we
take a redirect on the load-value-consuming instruction we issued optimistically — that is, we discard all work
on that instruction and its successors, and ask the front end of the pipeline to start again from scratch, re-fetching
the instruction from the I-cache.
• In-order completion: at the end of the execution unit we take the oldest in-flight instruction (with luck, the
second-oldest too) and, if its results are ready, we graduate³ one or two instructions (“GRU” stands for “graduation
unit”). Before we do that, we make a last-minute check for exceptions: if one of the proposed graduates has
encountered a condition which should cause an exception, it will be carrying that information with it; we discard
that instruction and do a redirect to the start of the appropriate exception handler. On successful graduation the
instruction’s results are copied from its CB entry back to a real CPU register, and it’s finished.
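The latency figures in the first item above (two clocks for most ALU instructions, four cycles from load address to load data on a cache hit) can be explored with a toy model. The sketch below does not describe the 74K issue logic: it assumes a single pipe that issues at most one ready instruction per cycle, oldest first, and the name `last_result_cycle` and the `(latency, deps)` encoding are invented for illustration.

```python
LOAD_LATENCY = 4  # cycles from load address to load data on a cache hit (from the text)
ALU_LATENCY = 2   # most ALU instructions need two clocks (from the text)


def last_result_cycle(instrs):
    """Return the cycle at which the last result becomes available.

    instrs is a list of (latency, deps) tuples in program order, where deps
    lists the indexes of earlier instructions whose results are needed.
    Assumption: one pipe, at most one issue per cycle, oldest ready first.
    """
    ready_at = {}                      # index -> cycle its result is available
    pending = list(range(len(instrs)))
    cycle = 0
    while pending:
        for i in pending:              # scan oldest first
            latency, deps = instrs[i]
            if all(d in ready_at and ready_at[d] <= cycle for d in deps):
                ready_at[i] = cycle + latency
                pending.remove(i)
                break                  # only one issue per cycle
        cycle += 1
    return max(ready_at.values(), default=0)


n = 8
# Pointer chase: each load needs the data returned by the previous one.
pointer_chase = [(LOAD_LATENCY, [i - 1] if i else []) for i in range(n)]
# Independent loads: all addresses are known up front.
independent_loads = [(LOAD_LATENCY, []) for _ in range(n)]
# One chain of dependent ALU ops versus two interleaved chains.
one_alu_chain = [(ALU_LATENCY, [i - 1] if i else []) for i in range(n)]
two_alu_chains = [(ALU_LATENCY, [i - 2] if i >= 2 else []) for i in range(n)]

chase_cycles = last_result_cycle(pointer_chase)
independent_cycles = last_result_cycle(independent_loads)
print(chase_cycles, independent_cycles)
print(last_result_cycle(one_alu_chain), last_result_cycle(two_alu_chains))
```

Under these assumptions the eight-pointer chase needs 32 cycles, which is the four-cycles-per-pointer bound, while eight independent loads finish in 11. A single chain of eight dependent ALU operations takes 16 cycles, but two interleaved chains of four keep the pipe busy every cycle and finish in 9.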
Because instruction effects aren’t “publicly” visible until graduation, our out-of-order CPU appears to the
programmer to be running sequentially just like any other MIPS32-compliant CPU.
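The in-order completion rule can be sketched as a small model. This is a simplification under stated assumptions, not the GRU's actual logic: the `Entry` fields, the `graduate` function and the register names are hypothetical, and the completion buffer (CB) is modelled as a plain queue held in program order.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """One hypothetical CB entry: a result waiting to be committed."""
    dest: str                        # destination register name
    value: int                       # result held in the CB until graduation
    done: bool = False               # execution finished, result/status known
    exception: Optional[str] = None  # exception carried along, if any


def graduate(cb, regs, width=2):
    """Graduate up to `width` of the oldest entries, strictly in order.

    Returns the name of an exception if one forces a redirect, else None.
    """
    for _ in range(width):
        if not cb or not cb[0].done:
            break                    # oldest instruction not ready: stop
        head = cb.popleft()
        if head.exception is not None:
            cb.clear()               # discard it and everything younger
            return head.exception    # front end restarts at the handler
        regs[head.dest] = head.value # result becomes architecturally visible
    return None


regs = {}
cb = deque([
    Entry("r1", 10, done=True),
    Entry("r2", 20, done=False),     # still executing
    Entry("r3", 30, done=True),      # finished early, out of order
])
graduate(cb, regs)                   # only r1 graduates; r3 waits behind r2
first_pass = dict(regs)
cb[0].done = True
graduate(cb, regs)                   # now r2 and r3 graduate together
second_pass = dict(regs)

# An exception discards the faulting instruction and everything younger.
cb2 = deque([
    Entry("r4", 40, done=True),
    Entry("r5", 50, done=True, exception="overflow"),
    Entry("r6", 60, done=True),
])
redirect = graduate(cb2, regs)
```

In this model `r3` completes before `r2` but its value stays in the queue until `r2` is ready, so the register file only ever changes in program order. In the second queue `r4` commits, `r5` triggers the redirect, and `r6` never reaches a register.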
More details about out-of-order execution
That’s the basic flow. But the dual-issue, out-of-order design has some subtle points which can affect how programs
run:
• Mispredicted branches and redirects: because of the long pipeline, the 74K core relies very heavily on good
branch prediction. When the IFU guesses wrong about a conditional branch, or can’t compute the target for a
jump-register instruction, that’s detected somewhere down the AGEN pipeline (usually the “EC” stage). By then
we’ll have done a minimum of 12 cycles of work on the wrong path.
Whenever a branch is resolved the prediction result is sent back to the IFU to maintain its history table. For most
branches, the prediction result is sent back at the same time as we resolve the branch, which means that a few
branches which don’t graduate can affect the branch history. That’s OK, it was only a heuristic.
• Exceptions: can’t be resolved until we’re committed to running an instruction and have completed all its
predecessors. So they’re resolved only at graduation. That posts an exception handler address down to the front of a
pipe, clearing out all prefetched and speculatively-executed instructions in the process. There will be at least 19
3. Curiously, the alternative word to “graduation” (for an instruction being committed in an out-of-order design) is “retirement”:
a rather different stage of one’s career. I guess that from a software point of view we’re glad that the instruction is now grown
up and real, while the hardware is now ready to wave goodbye to it.