Programming the MIPS32® 74K™ Core Family, Revision 02.14
If SYNCI_Step returns zero, your hardware ensures that your caches are instruction/data coherent, and you don’t need to use synci at all.
• CC (2): user-mode read-only access to the CP0 Count register, for high-resolution counting. That wouldn’t be much good without the next register.
• CCRes (3): tells you how fast Count counts. It’s a divider from the pipeline clock: if the rdhwr instruction reads a value of “2”, then Count increments every 2 cycles, at half the pipeline clock rate. For 74K family cores that is precisely what you will read.
• UL (29): user-mode read-only access to the CP0 UserLocal register. This register can be used to provide a thread identifier to user-mode programs. See Section C.4.2 “The UserLocal register” for more details.
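As an illustration of how CC and CCRes combine, here is a minimal C sketch that converts two samples of Count into elapsed pipeline cycles. The function names (read_cc, read_ccres, count_delta_to_cycles) are hypothetical, the inline assembly assumes a GCC-compatible compiler targeting MIPS32 Release 2, and it assumes the operating system has enabled user-mode access to these registers. On any other host a stand-in replaces rdhwr so the arithmetic can still be exercised.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__mips__)
/* Read hardware register 2 (CC): the CP0 Count register. */
static inline uint32_t read_cc(void)
{
    uint32_t v;
    __asm__ __volatile__(".set push\n\t"
                         ".set mips32r2\n\t"
                         "rdhwr %0, $2\n\t"
                         ".set pop"
                         : "=r"(v));
    return v;
}

/* Read hardware register 3 (CCRes): pipeline cycles per Count tick. */
static inline uint32_t read_ccres(void)
{
    uint32_t v;
    __asm__ __volatile__(".set push\n\t"
                         ".set mips32r2\n\t"
                         "rdhwr %0, $3\n\t"
                         ".set pop"
                         : "=r"(v));
    return v;
}
#else
/* Assumption: host stand-ins so the sketch builds on non-MIPS machines.
   They do not model real hardware; CCRes is fixed at 2 as on 74K cores. */
static uint32_t fake_count;
static inline uint32_t read_cc(void) { return fake_count += 100; }
static inline uint32_t read_ccres(void) { return 2; }
#endif

/* Pipeline cycles elapsed between two Count samples.
   The unsigned 32-bit subtraction stays correct across one wraparound. */
static inline uint64_t count_delta_to_cycles(uint32_t start, uint32_t end,
                                             uint32_t ccres)
{
    assert(ccres != 0);
    return (uint64_t)(uint32_t)(end - start) * ccres;
}
```

A caller would sample read_cc() before and after the code being timed and pass both samples, together with read_ccres(), to count_delta_to_cycles().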
4.2 Prefetching data
MIPS32 CPUs are increasingly used for computations which feature loops accessing large arrays, and the run-time is often dominated by cache misses.
These are excellent candidates for using the pref instruction, which gets data into the cache without affecting the CPU’s other state. In a well-optimized loop with prefetch, data for the next iteration can be fetched into the cache in parallel with computation for the current iteration.
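The loop shape described above can be sketched in C as follows. This is an illustrative sketch, not code from the manual: it relies on the GCC/Clang builtin __builtin_prefetch, which is expected to compile to a prefetch instruction on targets that have one, and the function name sum_with_prefetch and the distance PREFETCH_AHEAD are hypothetical choices that would need tuning on real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning knob: how many elements ahead of the current
   one to request. The best value depends on memory latency. */
#define PREFETCH_AHEAD 16

/* Sum an array while asking the cache to fetch data for later iterations. */
int64_t sum_with_prefetch(const int32_t *a, size_t n)
{
    assert(a != NULL || n == 0);
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Stay inside the array so the address expression is always valid. */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0 /* read */, 3);
        total += a[i];  /* computation overlaps with the outstanding fetch */
    }
    return total;
}
```

Because a prefetch works on whole cache lines, a tuned loop would normally issue one prefetch per line rather than one per element; the per-element form is kept here only for brevity.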
It’s a pretty major principle that pref should have no software-visible effect other than to make things go faster. pref is logically a no-op [15].
The pref instruction comes with various possible “hints” which allow the program to express its best guess about the likely fate of the cache line. In 74K family cores the “load” and “store” variants of the hints do the same thing, but it makes good sense to use the hint which matches your program’s intention: you might one day port it to a CPU where it makes a difference, and it can’t do any harm.
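To show what matching the hint to the program’s intention might look like, here is a small C sketch of a copy loop that marks the source as “will be read” and the destination as “will be written”. It uses the GCC/Clang builtin __builtin_prefetch, whose second argument selects read (0) or write (1). How a compiler maps these arguments onto specific pref hints is not covered here, and the names copy_words_with_hints and COPY_PREFETCH_AHEAD are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefetch distance, in 32-bit words. */
#define COPY_PREFETCH_AHEAD 8

/* Copy n words, stating the intended use of each prefetched line. */
void copy_words_with_hints(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert((dst != NULL && src != NULL) || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i + COPY_PREFETCH_AHEAD < n) {
            /* Source data will only be loaded. */
            __builtin_prefetch(&src[i + COPY_PREFETCH_AHEAD], 0, 3);
            /* Destination data will be stored to. */
            __builtin_prefetch(&dst[i + COPY_PREFETCH_AHEAD], 1, 3);
        }
        dst[i] = src[i];
    }
}
```

The result of the copy is the same whichever hints are used; only the expected cache behaviour differs, and only on CPUs that distinguish the two.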
The 74K core acts on hints as summarized in
.
4.3 Using “synci” when writing instructions
The synci instruction (introduced with Revision 2 of the MIPS32 architecture specification) ensures that instructions written by a program (necessarily through the D-cache, if you’re running cached) get written back from the D-cache and the corresponding I-cache locations invalidated, so that any future execution at the address will reliably execute the new instructions. synci takes an address argument, and it takes effect on a whole enclosing cache-line-sized piece of memory. User-level programs can discover the cache line size because it’s available in one of the “hardware registers” accessed by rdhwr, as described in Section 4.1, “User-mode accessible ‘Hardware registers’”, above.
Since synci is modifying the program’s own instruction stream, it’s inherently an “instruction hazard”: so when you’ve finished writing your instructions and issued the last synci, you should then use a jr.hb or equivalent to call the new instructions; see Section 5.1 “Hazard barrier instructions”.
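The sequence described in this section can be sketched in C as a loop that issues one synci per cache line over the range just written. This is a sketch under assumptions: the names synci_line, sync_after_synci and synci_range are hypothetical, the step argument is taken to be the value read from SYNCI_Step (a power of two, or zero for coherent caches), the inline assembly assumes a GCC-compatible compiler for MIPS32 Release 2, and on other hosts the instructions are replaced by no-ops so only the address arithmetic runs. The final jr.hb is not shown and remains the caller’s job.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__mips__)
/* Write back and invalidate the cache line containing addr. */
static inline void synci_line(uintptr_t addr)
{
    __asm__ __volatile__(".set push\n\t"
                         ".set mips32r2\n\t"
                         "synci 0(%0)\n\t"
                         ".set pop"
                         : : "r"(addr) : "memory");
}
/* Wait for the preceding synci operations to complete. */
static inline void sync_after_synci(void)
{
    __asm__ __volatile__("sync" : : : "memory");
}
#else
/* Assumption: no-op stand-ins so the sketch builds on non-MIPS hosts. */
static inline void synci_line(uintptr_t addr) { (void)addr; }
static inline void sync_after_synci(void) {}
#endif

/* Issue synci for every cache line overlapping [start, start + len).
   Returns the number of lines touched. The caller must still reach the
   new code through jr.hb or an equivalent hazard barrier. */
size_t synci_range(uintptr_t start, size_t len, unsigned step)
{
    if (len == 0 || step == 0)      /* step 0: caches are coherent */
        return 0;
    assert((step & (step - 1)) == 0);   /* line size is a power of two */

    uintptr_t line = start & ~((uintptr_t)step - 1); /* align down */
    uintptr_t end = start + len;
    size_t lines = 0;
    for (; line < end; line += step) {
        synci_line(line);
        lines++;
    }
    sync_after_synci();
    return lines;
}
```

Aligning the start address down matters because synci acts on the whole enclosing line, so a range that begins mid-line still needs that first line synchronized.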
15. This isn’t quite true any more: pref with the “PrepareForStore” hint can zero out some data which wasn’t previously zero.
Programming the MIPS32 74K Core Family, Document Number MD00541, Revision 02.14, March 30, 2011