1.4 A brief guide to the 74K core implementation

Programming the MIPS32® 74K™ Core Family, Revision 02.14

The IFU’s branch predictor guesses whether conditional branches will be taken or not. It’s not magic: it uses a BHT (a “Branch History Table”) of what happened to branches in the past, indexed by the low bits of the location of the branch instruction. This particular hardware is an example of combined branch prediction (majority voting between three different algorithms, one of which is gshare; if you want to know more, there’s a good Wikipedia article whose topic name is “Branch Predictor”). The predictor is only making a good guess. It can seem surprising that it makes no attempt to discover whether the history stored in a BHT slot is really that of the current branch, or of another one which happened to share the same low address bits, so it will sometimes be wrong. But it guesses correctly most of the time.

In this way the IFU can predict the next-instruction address and continue to run ahead.
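To see why sharing a BHT slot matters, here is a minimal sketch of a single history table. It is not the 74K's predictor (which combines three algorithms): the 16-slot size, the 2-bit saturating counters, the starting counter value and all the names are assumptions made for illustration.

```python
# Toy branch history table: 2-bit saturating counters indexed by low PC bits.
# Table size and counter width are illustrative assumptions, not 74K values.

BHT_BITS = 4                      # assumed: 16 slots
BHT_SIZE = 1 << BHT_BITS


class BranchHistoryTable:
    def __init__(self):
        # Each counter starts at 1 ("weakly not taken"); the range is 0..3.
        self.counters = [1] * BHT_SIZE

    @staticmethod
    def index(pc):
        # MIPS32 instructions are 4 bytes, so drop the two always-zero bits.
        return (pc >> 2) & (BHT_SIZE - 1)

    def predict(self, pc):
        """Return True if the branch at pc is predicted taken."""
        return self.counters[self.index(pc)] >= 2

    def update(self, pc, taken):
        """Record the real outcome once the branch has been resolved."""
        i = self.index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredicts(bht, trace):
    """trace is a list of (pc, taken) pairs in program order."""
    wrong = 0
    for pc, taken in trace:
        if bht.predict(pc) != taken:
            wrong += 1
        bht.update(pc, taken)
    return wrong
```

In this toy model, a branch at 0x1000 that is always taken costs one mispredict in ten executions when it has its slot to itself. Interleave it with a never-taken branch at 0x1040, which maps to the same slot, and all twenty predictions come out wrong. Real code rarely behaves as badly as this, but the sketch shows where the occasional wrong guess comes from.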

When the IFU guesses wrong, it doesn’t know (the dog just rushes ahead until its owner reaches the fork). The branch mispredict will be noticed once the branch instruction has been issued and carried through to the AGEN “EC” stage, and is executed in its full context (“resolved”). On detecting a mispredict, the CPU must discard the instructions based on the bad guess (which will not have graduated yet, so will not have changed any vital machine state) and start fetching instructions from the correct target⁵. The exact penalty paid by a program which suffers a mispredict depends on how busy the execution unit is, and how early it resolves the branch; the minimum penalty is 12 cycles.
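As a rough feel for what that minimum means, the arithmetic can be sketched as below. The 12-cycle figure and the extra cycle for branch-likely instructions (footnote 5) come from the text; the function names and the idea of feeding in a measured mispredict rate are assumptions for illustration, and the result is only a lower bound.

```python
MIN_MISPREDICT_PENALTY = 12   # cycles, the minimum quoted in the text


def min_mispredict_penalty(branch_likely=False):
    """Minimum penalty for one mispredict; branch-likely costs at least one more."""
    return MIN_MISPREDICT_PENALTY + (1 if branch_likely else 0)


def min_cycles_lost(branches, mispredict_rate, branch_likely=False):
    """Lower bound on cycles lost to mispredicts over `branches` executed branches."""
    if not 0.0 <= mispredict_rate <= 1.0:
        raise ValueError("mispredict_rate must be between 0 and 1")
    return branches * mispredict_rate * min_mispredict_penalty(branch_likely)
```

For example, 1000 executed branches with a quarter of them mispredicted lose at least 3000 cycles in this model. The real figure depends on how busy the execution unit is and how early each branch resolves.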

Even when we guess right, the branch target calculation in the IFU takes a little while to operate. A rapid
sequence of correctly-predicted branches can empty the queues, causing a program to run slower.

Jump-register instruction targets are unpredictable: the IFU has no knowledge of register data and can’t in general anticipate it. But jump-register instructions are relatively rare, except for subroutine returns. In the MIPS ISA you return from subroutines using a jump-register instruction, jr $31 (register 31 is, by a strong convention, used to hold the return address). So on every call instruction, the IFU pushes the return address onto a small stack; and on every jr $31 it pops the value off the stack and uses that as its guess for the branch target⁶.

We have no way of knowing the target of a jr instruction which uses a register other than $31. When we find one of those, instruction fetch stops until the correct address is computed up in the AGEN pipeline, 12 or more clocks later.
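The return-address stack can be sketched in a few lines. The text does not give the stack's size or say what happens when it overflows, so the depth of 4 and the drop-the-oldest policy here are assumptions, as are the names and the made-up return addresses.

```python
class ReturnStack:
    """Toy model of the IFU's small return-address stack."""

    def __init__(self, depth=4):          # depth is a hypothetical value
        self.depth = depth
        self.entries = []

    def on_call(self, return_address):
        # Assumed overflow policy: the oldest entry is lost.
        if len(self.entries) == self.depth:
            self.entries.pop(0)
        self.entries.append(return_address)

    def on_return(self):
        """Predicted target for jr $31, or None if the stack is empty."""
        return self.entries.pop() if self.entries else None


def count_wrong_returns(depth, nesting):
    """Nest `nesting` calls, then return from all of them; count bad guesses."""
    stack = ReturnStack(depth)
    actual = []
    for level in range(nesting):
        ret = 0x1000 + 8 * level          # made-up return addresses
        stack.on_call(ret)
        actual.append(ret)
    wrong = 0
    while actual:
        if stack.on_return() != actual.pop():
            wrong += 1
    return wrong
```

With a depth of 4, three nested calls return with no wrong guesses, while six nested calls get the two outermost returns wrong. Those outermost returns are the ones highest up the call tree, which is the case the return-stack footnote describes as rarely executed.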

1.4.3 Loads and load-to-use delays

Even short-pipeline MIPS CPUs can’t deliver load data to the immediately following instruction without a delay,
even on a cache hit. Simple MIPS pipelines typically deliver the data one clock later: a one clock “load-to-use delay”.
Compilers and programmers try to put some useful and non-dependent operation between the load and its first use.

The 74K core’s long pipeline means that a full D-cache hit takes four clocks to return the data, not two: that would be
a three-clock “load-to-use delay”. A pair of loads dependent on each other (one fetches the other’s base address) must
be issued at least four cycles apart (that’s optimistic, hoping-for-a-hit timing).
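A small sketch makes the spacing concrete. The four-cycle gap between dependent loads is from the paragraph above, and the three-cycle gap before a dependent ALU operation is from the discussion of skewed pipelines that follows; both assume every load hits in the D-cache. The table and function names are made up for illustration.

```python
# Minimum issue spacing, in cycles, between a load and an instruction that
# depends on it, assuming the load hits in the D-cache.
MIN_GAP_AFTER_LOAD = {
    "load": 4,   # consumer uses the loaded value as its base address
    "alu": 3,    # consumer is an ALU operation (skewed pipelines)
}


def load_to_use_delay(consumer_kind):
    """Cycles between the load and its consumer that other work could fill."""
    return MIN_GAP_AFTER_LOAD[consumer_kind] - 1


def pointer_chase_cycles(n_loads):
    """Earliest issue cycle of the last load in a chain of dependent loads,
    counting the first load as issued in cycle 0."""
    if n_loads < 1:
        raise ValueError("need at least one load")
    return (n_loads - 1) * MIN_GAP_AFTER_LOAD["load"]
```

Walking an 8-node linked list, for example, cannot issue its last load before cycle 28 in this model, however fast the rest of the code runs. That is optimistic, hoping-for-a-hit timing.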

But the AGEN and ALU pipelines are “skewed”, with ALU results delivered a cycle later than AGEN results. That
means that when an ALU operation is dependent on a load, it can be issued only three cycles after the load. There’s a
price to pay: a load/store whose base address is computed by a preceding ALU instruction must be issued a clock

5. In “branch-likely” variants of conditional branch instructions a mispredict means we also did the wrong thing with the instruction in the branch delay slot. To fix that up, we need to refetch the branch itself, so the penalty is at least one cycle higher.

6. The return-stack guess will be wrong for subroutines containing nested calls deeper than the size of the return stack; but subroutines high up the call tree are much more rarely executed, so this isn’t so bad.

Summary of Contents for MIPS32 74K Series

Page 1: ...Document Number MD00541 Revision 02 14 March 30 2011 Programming the MIPS32 74K Core Family...

Page 2: ...endments or supplements thereto Should a conflict arise regarding the export reexport transfer or release of the information contained in this document the laws of the United States of America shall b...

Page 3: ...ads writes and synchronization 30 3 3 1 Read write ordering and cache memory data queues in the 74K core 30 3 3 2 The sync instruction in 74K family cores 31 3 3 3 Write gathering and write buffer flu...

Page 4: ...mings 65 Chapter 5 Kernel mode OS programming and Release 2 of the MIPS32 Architecture 67 5 1 Hazard barrier instructions 67 5 2 MIPS32 Architecture Release 2 enhanced interrupt system s 68 5 2 1 Trad...

Page 5: ...5 7 6 Almost Alphabetically ordered table of DSP ASE instructions 96 7 7 DSP ASE instruction timing 100 Chapter 8 74K core features for debug and profiling 102 8 1 EJTAG on chip debug unit 102 8 1 1 D...

Page 6: ...tics 148 B 3 1 Different views of ITagLo DTagLo 148 B 3 2 Dual virtual and physical tags in the 74K core D cache DTagHi register 149 B 3 3 Pre decode information in the I cache the ITagHi Register 149...

Page 7: ...e 3 16 Fields in the ContextConfig register 53 Figure 5 1 Fields in the IntCtl Register 69 Figure 5 2 Fields in the EBase Register 72 Figure 5 3 Fields in the SRSCtl Register 73 Figure 5 4 Fields in t...

Page 8: ...Figure 8 20 Fields in the TraceControl3 register 123 Figure 8 21 Fields in the TraceIBPC TraceDBPC registers 125 Figure 8 22 Fields in the WatchLo0 3 Register 128 Figure 8 23 Fields in the WatchHi0 3...

Page 9: ...2 Long latency FP instructions 84 Table 7 1 Mask bits for instructions accessing the DSPControl register 93 Table 7 2 DSP instructions in alphabetical order 96 Table 8 1 JTAG instructions for the EJT...

Page 10: ...Programming the MIPS32 74K Core Family Revision 02 14 10...

Page 11: ...rchitecture these either concern priv ileged operation or are timing related Behavior which was standardized only in the recent Release 2 of the MIPS32 specification and not in previous versions All R...

Page 12: ...gisters prefetching Chapter 5 Kernel mode OS programming and Release 2 of the MIPS32 Architecture on page 67 74K core specific information about privileged mode programming Chapter 6 Floating point un...

Page 13: ...onfigure your 74K core with MIPS Technologies L2 cache between 128Kbyte and 1Mbyte in size Full details are in MIPS PDtrace Interface and Trace Control Block Specification MIPS Technologies document M...

Page 14: ...data cannot be available for some number of instructions Earlier MIPS Technologies cores had no real trouble with dependencies dependent instructions in almost all cases can run in consecutive cycles...

Page 15: ...f there was a mispredicted branch or an earlier in program order instruction took an exception Instead each instruction is assigned a completion buffer CB entry to receive its result The CB entry also...

Page 16: ...th luck the sec ond oldest too and if it s results are ready we graduate3 one or two instructions GRU stands for graduation unit Before we do that we make a last minute check for exceptions if one of...

Page 17: ...ction reaches graduation and finds the load missed we must do a redirect re fetch ing the consuming instruction and everything later in program order Next time the consuming instruction is an issue ca...

Page 18: ...ata and can t in gen eral anticipate it But jump register instructions are relatively rare except for subroutine returns In the MIPS ISA you return from subroutines using a jump register instruction j...

Page 19: ...out standing load and which register the data is destined to return to Compiled code is unlikely to reach this limit If you write carefully optimized code where you try to fill load use delays perhaps...

Page 20: ...1 4 A brief guide to the 74K core implementation Programming the MIPS32 74K Core Family Revision 02 14 20...

Page 21: ...ocate your exception entry points see Figure 5 2 and the text round it Table 2 1 Roles of Config registers Config A mix of historical and CPU dependent information described in Figure 2 1 below Some f...

Page 22: ...see Section 3 6 Scratchpad memory SPRAM Don t confuse this with the MIPS DSP ASE whose presence is indicated by Config3 DDSP UDI reads 1 if your core implements user defined CorExtend instructions Co...

Page 23: ...nfig1 2 registers These two read only registers tell you the size of the TLB and the size and organization of L1 L2 and L3 caches a zero line size is used to indicate a cache which isn t there They re...

Page 24: ...ove Config2 SU implementation specific bits for secondary cache if fitted Can be writable Config2 L2B Set to disable L2 cache bypass mode Setting this bit also forces Config2 SL to 0 most OS code will...

Page 25: ...ture Release 2 enhanced interrupt system s VInt reads 1 when the 74K core can handle vectored interrupts SP reads 0 when the 74K core does not support sub 4Kbyte page sizes CDMM reads 0 when the 74K c...

Page 26: ...the SoC builder who synthesizes the core refer to your SoC manual It should be a number between 0 and 127 higher values are reserved by MIPS Technologies PRId CoID Company ID which in this case is 1...

Page 27: ...support 64KB alias free D cache option option to have up to 8 outstanding cache misses previous maximum 4 July 12 2006 3_7_ 3 7 0 0x7c Less interlocks round cache instructions relocatable reset excep...

Page 28: ...2 2 PRId register identifying your CPU type Programming the MIPS32 74K Core Family Revision 02 14 28...

Page 29: ...o a full 32 bit physical address on the system interface More information about the TLB in Section 3 8 The TLB and translation Table 3 1 Basic MIPS32 architecture memory map Segment Virtual range What...

Page 30: ...get access to the system interface and send it off Even writes which hit in the cache are posted occurring after the instruction graduates Cache refills are handled after the missing load has graduat...

Page 31: ...at far Core interface ordering at the core interface read operations may be split into an address phase and a later data phase with other bus operations in between The 74K core as is permitted by MIPS...

Page 32: ...ory may be gathered stored together in the WBB and then dealt with by a single wider OCP write than the one you originally coded Sometimes this is what you want When it isn t put a sync between your s...

Page 33: ...ee Section 3 4 2 Cacheability options for details The L2 cache can run synchronously to the CPU core but particularly for memory arrays larger than 256Kbytes would typically then be the critical path...

Page 34: ...software cache management The 74K core s caches are not fully coherent and require OS intervention at times The cache instruction is the building block of such OS interventions and is required for co...

Page 35: ...You have to know the size of your cache discoverable from the Config1 2 registers see to know exactly where the field boundaries are but your address is used something like this Beware the MIPS32 spec...

Page 36: ...not dirty Certain CPUs implement a special form of the I side hit invalidate where multiple searches are done to ensure that any line matching the effective physical address is invalidated even if it...

Page 37: ...cache instructions except for index load run through graduation without delay and in particular any stream of hit type operations which miss in the cache can run 1 per clock A younger instruction whic...

Page 38: ...d ing bits of the physical address and aliases are possible The value of the one or two critical virtual address bits is sometimes called the page color It s possible for software to avoid aliases if...

Page 39: ...gisters for the D cache11 Some other MIPS CPUs use the same staging register s for all caches and even simple initialization software written for such CPUs is not portable to the 74K core Before getti...

Page 40: ...ser This register in the 74K core is implemented to support access to external L2 cache tags via cache instructions The definition of the fields of this 32 bit register are defined by the SoC designer...

Page 41: ...data or control fields from the external interface so this section really is just about parity protection in the cache It s a build time option selected by your system integrator whether to include ch...

Page 42: ...time is recoverable Way the way number of the cache entry where the error occurred Caution for the L1 caches which are no more than 4 way set associative this is a two bit field But an L2 cache might...

Page 43: ...Scratchpad memory SPRAM PI PD parity bits being read written to caches I and D cache respectively LBE WABE field indicating whether a bus error the last one if there s been more than one was triggered...

Page 44: ...ovide a reference design for both ISPRAM andDSPRAM which is what is described here If you keep the programming interface the same as the reference design you re more likely to be able to find software...

Page 45: ...base address of this chunk of SPRAM En enable the SPRAM From power up this bit is zero and until you set it to 1 the SPRAM is invisible The En bit is also visible in the second size configuration wor...

Page 46: ...evice ID version and size and also contains control bits that can enable user and supervisor read and or write access to the device This register is shown in Figure 3 10 CDMM devices are packed into t...

Page 47: ...er with the address space identifier from EntryHi ASID The table also stores a physical address plus cacheability attributes which becomes the output of the translation lookup The hardware TLB is rela...

Page 48: ...re whether it s on the input or output side there s only one but it can be read and written through either of EntryLo0 1 When set it causes addresses to match regardless of their ASID value thus defin...

Page 49: ...number of TLB misses in most cases Certain workloads particularly those accessing data sequentially where the working set just exceeds the mappable capacity of the non wired TLB entries may benefit f...

Page 50: ...you can t do a store using addresses translated here you ll get an exception instead However software can use it to track pages which have been written to when you first map a page you leave this bit...

Page 51: ...only Address error AdEL or AdES TLB XTLB Refill TLB Invalid TLBL TLBS and TLB Modified for more on exception codes in Cause ExcCode see the notes to Table B 5 Context contains the useful mix of pre pr...

Page 52: ...are and are unaffected by the exception Bits Y 1 0 will always read as 0 If X 23 and Y 4 i e bits 22 4 are set in ContextConfig the behavior is identical to the standard MIPS32 Context register bits 2...

Page 53: ...guous 1 bits are written into the register field It is permissible to implement a subset of the ContextConfig register in which some number of bits are read only and set to one or zero as appropriate...

Page 54: ...3 8 The TLB and translation Programming the MIPS32 74K Core Family Revision 02 14 54...

Page 55: ...rs which are readable by unprivileged user space programs usually to share information which is worth making accessible to programs without the overhead of a system call The hardware registers provide...

Page 56: ...logically a no op15 The pref instruction comes with various possible hints which allow the program to express its best guess about the likely fate of the cache line In 74K family cores the load and st...

Page 57: ...m the cache For data you expect to use more than once and which may be subject to com petition from streamed data 7 store_retained 25 writeback_invalidate nudge If the line is in the cache invalidate...

Page 58: ...l always get more insight from running code on a real CPU or a cycle accurate simulator 4 5 1 Cache delays and mitigating their effect In a typical 74K CPU implementation a cache miss which has to be...

Page 59: ...ationale for this is that it s extremely difficult to fetch the branch target quickly enough to avoid a delay so the extra instruction runs for free Most of the time the compiler deals well with this...

Page 60: ...ndard timing just so long as they hit in the cache When a load misses or handled the same way turns out to be uncached then a dependent oper ation which has already been issued will have to be replaye...

Page 61: ...n run just two clocks apart Each register has a standard place in the pipeline where the producer should deliver its value and another place in the pipeline where the consumer picks it up where those...

Page 62: ...store 1 the GPR value is an address operand Store data is not needed early ACC multiply instructions 3 the ACC value came from any multiply instruction which saturates the accumulator value ACC DSP in...

Page 63: ...e because they implicitly have three register operands the no move case is handled by reading the orig inal value of the destination register and writing it back but in 74K cores an instruction may on...

Page 64: ...cause of the late delivery of load data in t1 load box of Table 4 3 plus another because that data is required to form the address load store address box of Table 4 2 Delays caused by dependencies on...

Page 65: ...s at once is dependent on multiple fields and that can t be tracked through the CB system Such a rddsp is not issued until all predecessors have graduated and such a wrdsp must graduate before its suc...

Page 66: ...or But because that requires a relatively long pipeline multiply divide unit instructions which produce a result in a GP register are relatively slow for example an instruction consuming the register...

Page 67: ...e you can get unexpected behavior if an effect is deferred out of its normal instruction sequence But that can happen because the relevant control register only gets written some way down the pipeline...

Page 68: ...now required between an MTC0 and a MFC0 instruction type only when there is a CP0 register dependency This optimization reduces the stall cycles incurred by software TLB refill exception handlers when...

Page 69: ...ected to any input legal values for IntCtl IPTI IntCtl IPPCI and IntCtl IPFDCI are between 2 and 7 The timer performance counter and fast debug channel interrupt signals are taken out to the core inte...

Page 70: ...l interrupt entry point already an offset of 0x200 from the value defined in EBase to produce the entry point to be used If multiple interrupts are active and enabled the entry point will be the one a...

Page 71: ...nusable until initialized so MIPS CPUs start up in uncached ROM memory space and the exception entry points are all there for a while in fact for so long as Status BEV is set these ROM entry points ar...

Page 72: ...and the results of that are undefined EBase CPUNum On single threaded CPUs this is just a single CPU number field set by the core interface bus SI_CPUNum which the SoC designer will tie to some suitab...

Page 73: ...et number determines the next set and is made visible here in SRSCtl EICSS until the next interrupt The CPU is in EIC mode if Config3 VEIC indicating the hardware is EIC compliant and software has set...

Page 74: ...the result is unpredictable You can get at the values of registers in the previous set using rdpgpr and wrpgpr Just a note SRSCtl PSS and SRSCtl CSS are not updated by all exceptions but only those wh...

Page 75: ...ng Config7 WII set to 1 a wait condition will be terminated by an active interrupt signal even if that signal is prevented from causing an interrupt by Status IE being clear It s not immediately obvio...

Page 76: ...lue of the Count register HWREna SYNCI_Step Set this bit 1 so a user mode rdhwr 1 can read out the cache line size actually the smaller of the L1 I cache line size and D cache line size That line size...

Page 77: ...on your CPU Can run without an exception handler the FPU offers a range of options to handle very large and very small numbers in hardware With the 74K core full IEEE754 compliance does require that s...

Page 78: ...integer data is the higher bit num bered bytes shown in Figure 6 1 will be at the lowest memory location when the core is configured big endian and the highest memory location when the core is little...

Page 79: ...way the FPU works this is controlled by fields in the FPU control registers described here 6 4 1 IEEE options IEEE754 defines five classes of exceptional result For each class the programmer can sele...

Page 80: ...plement the MIPS 3D ASE PS does not implement the paired single instructions described in MIPS64V2 Processor ID Revision major and minor revisions of the FPU as is usual with revisions it s very usefu...

Page 81: ...ly and add The FN bit flush to nearest bit causes all result values to be replaced with somewhat better accuracy than you usually get with FS the result is either zero or a smallest normalized number...

Page 82: ...it was last written to zero by software RM is the rounding mode as required by IEEE 6 5 FPU pipeline and instruction timing This is not so simple The floating point unit FPU has its own pipeline More...

Page 83: ......

Page 84: ...r instruction reads the target cache line the program will probably not see much delay FP load instructions in the main pipeline are treated like integer loads an FP load which hits in the cache can b...

Page 85: ...ger AGEN pipeline s version of the same mfc1 instruction The timing is awkward because you have to find a free completion buffer write port Once the data is in the CB the mfc1 is a candidate for gradu...

Page 86: ...6 5 FPU pipeline and instruction timing Programming the MIPS32 74K Core Family Revision 02 14 86...

Page 87: ...ces use 16 bits for audio 8 bit data processing of printer images JPEG still images and video data 7 1 Features provided by the MIPS DSP ASE Those target applications can benefit from unconventional a...

Page 88: ...quences are made more usable by having four 64 bit result accumulator registers the old MIPS multiply divide unit has just one accessible as the hi lo registers The new ac0 is the old hi lo for backwa...

Page 89: ...es the size of the bit field to be inserted while pos specifies the insert position Caution in all inserts following the lead of the standard MIPS32 insert extract instructions pos is set to the lowes...

Page 90: ...with 32 bit paired half or quad byte values respectively Where there are two of these as in macq_s w phl the first one suggests the type of the result and the second the type of the operand s v in a s...

Page 91: ...pre adding a half to the least significant surviving bit Paired half and quad byte SIMD shifts shll ph shllv ph shll_s ph shllv_s are as above For PH only there s a shift right arithmetic instruction...

Page 92: ...ults get their low bits set 2 Q31 to a paired half both operands and result are assumed to be signed fractions so precrq ph w just takes the high halves of the two source operands and packs them into...

Page 93: ...accumulate maq_s w phl maq_s w phr picks either the left high or right low Q15 value from each operand multiplies them to Q31 and accumulates to a Q32 31 result The multiply is saturated only when it...

Page 94: ...eft The v version as usual takes the shift value from a register The right shift is a logical type so the result is zero extended Fill accumulator pushing low half to high mthlip moves the low half of...

Page 95: ...produce a Q63 result which is added to the accumu lator and saturated again dpsq_sa l w does the same except that the multiply result is subtracted from the accumulator again useful for the real comp...

Page 96: ...types are specified by relative bit position but C definitions are in memory order so these definitions need to be endianness dependent ifdef BIG_ENDIAN typedef struct q15 h1 h0 ph typedef struct u8 b...

Page 97: ...ively used as a Q32 31 fraction dpaq_sa l w ac rs rt Q31 saturated multiply accumulate dpau h qbl qb rs rt ac rs b3 rt b3 rs b2 rt b2 Dot product and accumulate of quad byte values l for left because...

Page 98: ...ach of the operand registers In all versions the Q15 multiplication is saturated to a Q31 results The _sa variants saturates the add result in the accumulator to a Q31 too maq_s w phr ac rs rt maq_sa...

Page 99: ...t precrq ph w makes a paired Q15 value by taking the MS bits of the Q31 values in rs and rt like this rd rs 0xFFFF0000 rt 16 0xFFFF precrq_rs ph w is the same but rounds and Q15 saturates both half re...

Page 100: ...tic because the vacated high bits of the value are replaced by copies of the input bit 16 the sign bit thus performing a cor rect division by a power of two of a signed number As usual the shra_v vari...

Page 101: ...The MIPS32 DSP ASE 101 Programming the MIPS32 74K Core Family Revision 02 14...

Page 102: ...JTAG pins already included in every SoC for chip test24 So the debug unit requires Physical communications with some kind of probe device which is itself controlled by the debug host achieved through...

Page 103: ...normal interrupts The address map changes in debug mode to give you access to the dseg region described below Quite a lot of exceptions just won t happen in debug mode those which do run peculiarly s...

Page 104: ...fter entering debug mode but it probably did that To return from a nested debug exception like this you don t use deret which would inappropriately take you out of debug mode you grab the address out...

Page 105: ...1100 0xFF30 1108 IBM10 0xFF30 1108 0xFF30 1110 IBASID0 0xFF30 1110 0xFF30 1118 IBC0 0xFF30 1118 I breakpoint 1 regs 0xFF30 1200 IBA1 0xFF30 1200 0xFF30 1208 IBM1 0xFF30 1208 0xFF30 1210 IBASID21 0xFF...

Page 106: ...to debug a system which has no physical memory reserved for debug TCB Registers These are the PDtrace EJTag Registers They are physically located in the PDtrace unit and managed by the PDtrace unit F...

Page 107: ...t to choose On some other implementations it s read only and just tells you what the CPU does IEXI set to 1 to defer imprecise exceptions Set by default on entry to debug mode cleared on exit but writ...

Page 108: ...e but which have not happened yet because they are imprecise and Debug IEXI is set They remain set until Debug IEXI is cleared explicitly or implicitly by a deret when the exception is delivered and t...

Page 109: ...he PC of instructions that missed in the instruction cache See Section 8 1 14 PC Sampling with EJTAG for details DAS DASQ DASE DAS reads 1 if the Data Address Sampling feature is available If supporte...

Page 110: ...are costs for no real loss in functionality ISA In cores with the microMIPS ISA this bit can specify which ISA the exception handler is built in This is tied to 0 on this core as the MIPS16 ASE does n...

Page 111: ...indicates what type of entity is associated with this TAP and if the TypeInfo field is used TypeInfo identifier information specific to the entity associated with this TAP Rocc reset occurred reads 1...

Page 112: ...eset signal which is more reliable ProbEn ProbTrap EjtagBrk ProbEn must be set before CPU accesses to dmseg will be sent to the probe It can be written by the probe directly ProbTrap relocates the deb...

Page 113: ...he FDC registers within the device block Each device within the CDMM begins with an Access Control and Status Register which gives information about the device and also provides a means for giving use...

Page 114: ...earlier to avoid wasting transfers of null transmit data or non accepted receive data or minimum latency to be interrupted as soon as data is available This register is shown in Figure 8 10 Figure 8 1...

Page 115: ...ID and written into the FIFO with the data Results are undefined if FDSTAT TxF 1 so that register should be checked prior to writing data Figure 8 13 Fields in the FDC Transmit FDTXn Registers 8 1 11...

Page 116: ...ns and allows you to determine whether an EJTAG I breakpoint may apply only in MIPS16 or non MIPS16 mode IBASIDn DBASIDn specifies an 8 bit ASID which may be compared against the current EntryHi ASID...

Page 117: ...nore its value Set this field all ones to disable the data match TE set 1 to use as trigger for PDtrace instruction tracing BE set 1 to activate breakpoint This fields resets to zero to avoid spurious...

Page 118: ...and data breakpoints filtering only on address conditions are precise that means that 1 DEPC will point at the fetched or load store instruction itself except if it s in a branch delay slot will poin...

Page 119: ...sure to read it back and see if the write stuck so that you know how many bits to scan and how to interpret them EJTAG revision 5 0 adds a new optional mechanism for triggering PC sampling when an in...

Page 120: ...are that comes up after a hard or soft reset to know the last known good value of TCBRDP before system crash and potentially read the trace mem ory from or to the appropriate trace memory location 0x3...

Page 121: ...od probes have generous amounts of high speed memory to store long traces TraceControl2 ValidModes TBI TBU described below at Figure 7 10 and following tell you whether you have such a connection avai...

Page 122: ...trace format is five bits to sup port 32 outstanding load and stores The outstanding loads and stores is with respect to the PDtrace unit not the Load Store unit Figure 8 16 Fields in the TCBCONTROLE...

Page 123: ...dual EJTAG breakpoint trace triggers take effect Figure 8 18 Fields in the TraceControl Register Figure 8 19 Fields in the TraceControl2 Register Figure 8 20 Fields in the TraceControl3 register TS se...

Page 124: ...lly including the miss address TIM switch on to trace all I cache misses On master trace on off switch set 0 to do no tracing at all The read only fields in TraceControl2 provide information about the...

Page 125: ...ied and if the trace unit is idle then it is safe to change the trace control settings After changing the settings trace can be turned back on and tracing resumes cleanly with the new control The rest...

Page 126: ...s this register CP0 access rules apply when writing to this user register 8 2 5 Summary of when trace happens The many different enable bits which control trace add up to or strictly and up to a whole...

Page 127: ...d of on trigger and if this trigger is conditional on arm there must have been an arm event since system reset or any disarm event or the trigger unconditionally turns trace on And since the on trigge...

Page 128: ...control fields 8 3 1 The WatchLo0 3 registers Used in conjunction with WatchHi0 3 respectively each of these registers carries the virtual address and what to match fields for a CP0 watchpoint Figure...

Page 129: ...out is shown in Figure 8 24 Figure 8 24 Fields in the PerfCtl0 3 Register There are usually four counters but software should check using the PerfCtl M bit which indicates at least one more Then the f...

Page 130: ...he I cache and fetch four instructions at once so you only get one cache fetch for that group of four instructions But even then an unconditional branch which is not at the end of a group of four inst...

Page 131: ...uction buffer is full Number of valid fetch slots killed in the IFU due to branches jumps or other stalling instructions 10 Reserved Reserved 11 Reserved Reserved 12 Reserved 13 Cycles when no instruc...

Page 132: ...74K core s D cache has an auxiliary virtual tag used to help pick the right line early When occa sionally the physical tag check shows some mis match it is treated as a cache miss in processing the m...

Page 133: ...ed Includes Floating Point Loads 54 Cycles where one instruction graduated Cycles where two instructions graduated 55 GFifo blocked cycles Floating point stores graduated 56 Number of cycles 0 instruc...

Page 134: ...8 4 Performance counters Programming the MIPS32 74K Core Family Revision 02 14 134...

Page 135: ...ic architectural documents MIPS32 The MIPS32 architecture definition series in three volumes MIPS32V1 Introduction to the MIPS32 Architecture MIPS Technologies document MD00080 MIPS32V2 The MIPS32 Ins...

Page 136: ...tion to the MIPS architecture, updated in 2006 to reflect the current version of MIPS32. MIPSPROG: The MIPS Programmer's Handbook, Erin Farquhar & Philip Bunce, Morgan Kaufmann, ISBN 1-55860-297-6. Restricted to the...

Page 137: ...ers Unused fields in registers are marked either with a digit 0 or an X A field marked zero should always be written with zero and subject to that is guaranteed to read zero on cores in the 74K family...

Page 138: ...Free running counter at pipeline or sub multiple speed B 1 5 p 145 10 0 EntryHi High order portion of the TLB entry 3 12 p 49 11 0 Compare Timer interrupt control B 1 5 p 145 12 0 Status Processor sta...

Page 139: ...s 3 4 17 p 42 27 0 CacheErr Cache parity exception status 3 4 16 p 41 28 0 ITagLo Read write interface for load store tag cacheops but when used for scratchpad RAM configuration see Section 3 8 p 45 3...

Page 140: ...1 Debug 23 0 EPC 14 0 EntryHi 10 0 PDtrace TraceControl 23 1 Timer Compare 11 0 EntryLo0 1 2 0 3 0 TraceControl2 23 2 Count 9 0 Index 0 0 TraceControl3 24 2 CPU Configuration Config 16 0 PageMask 5 0...

Page 141: ...y to allow often MX is set to 1 to enable instructions in either the MIPS DSP extension to the MIPS architecture or the MDMX extension The two may not be used together and MDMX is unlikely to ever be...

Page 142: ...rupt bits programmable at Cause IP1 0 Status UM SM execution privilege level, basically user or kernel. The intermediate supervisor privilege level is rarely used, but that's why this is a 2-bit field. Re...

Page 143: ...aused the exception perhaps to emulate it Cause TI last interrupt was from the on core timer see section below for Count Compare Cause CE if that was a co processor unusable exception this is the co p...

Page 144: ...ligned or a privilege violation 5 AdES 6 IBE Bus error signaled on instruction fetch 7 DBE Bus error signaled on load store imprecise 8 Sys System call i.e. syscall instruction executed 9 Bp Breakpoint...

Page 145: ...s handy For a periodic interrupt simply advance Compare by a fixed amount each time and check for the possibility that Count has overrun it To set a timer for some point in the future just set Compare...
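The periodic-interrupt recipe in this snippet (advance Compare by a fixed amount, then check whether Count has already overrun it) can be sketched as host-runnable C. This is an illustrative sketch only: `count_reached` and `next_compare` are hypothetical helper names, Count and Compare are modeled as plain 32-bit values, and the recovery policy on overrun (reschedule one interval from the current Count) is an assumption, not something the manual prescribes.

```c
#include <stdint.h>
#include <stdbool.h>

/* True if time `a` is at or after time `b` on a free-running 32-bit
 * counter. The subtraction wraps modulo 2^32, so the answer stays correct
 * across counter wrap-around provided the two values are less than 2^31
 * ticks apart. (The cast relies on two's-complement conversion, which
 * holds on the usual compilers.) */
static bool count_reached(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) >= 0;
}

/* Compute the next Compare value for a periodic tick.
 *   compare  - the Compare value that has just fired
 *   count    - the current value of Count
 *   interval - the tick period, in Count increments
 * Normally the deadline simply advances by `interval`, which keeps the
 * tick free of drift. If Count has already reached the new deadline (the
 * handler ran late), this sketch reschedules one interval from now. */
static uint32_t next_compare(uint32_t compare, uint32_t count, uint32_t interval)
{
    uint32_t next = compare + interval;
    if (count_reached(count, next))
        next = count + interval;   /* overrun: assumed recovery policy */
    return next;
}
```

On real hardware Count keeps running while the handler executes, so a handler would typically re-read Count after writing Compare and repeat the check if the deadline has already passed.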

Page 146: ...ics of ehb found on older CPUs By default ehb will check whether any instructions in flight are directly writing CP0 registers if such instructions exist it will block issue of instructions from the i...

Page 147: ...Section 3 4 9 Cache aliases All the remaining fields are read write and control various functions Only one of them is likely to find real system use Config7 PREF defaults to 2 b01 These two bits contr...

Page 148: ...s non blocking loads Normally the 74K core will keep running after a load instruction even if it misses in the D cache until the data is used With this disable bit set the CPU will stall on any load D...

Page 149: ...provides a very fast way of predicting whether there's a cache hit, and if so which way of the cache will contain the right data. But the virtual tag check is heuristic: in some cases it will turn out o...

Page 150: ...data instruction Which word of the cache line is transferred depends on the low address fed to the cache instruction D cache load stores transfer one word in DDataLo but I cache load stores transfer t...

Page 151: ...set gains some useful extra features, shown below. User-level programs also get limited access to hardware registers, useful for user-privilege software which wants to adapt portably to get the best...

Page 152: ...b execution hazards side effects of old instructions which affect how an instruction executes but excluding those which affect the instruction fetch process jalr hb jr hb hazards of all kinds Note tha...

Page 153: ...ue such as a thread ID or a pointer to thread specific storage to the underlying Cop0 register and user mode programs can read it via rdhwr C 3 FPU changes in Release 2 of the MIPS32 Architecture The...

Page 154: ...C 3 FPU changes in Release 2 of the MIPS32 Architecture Programming the MIPS32 74K Core Family Revision 02 14 154...

Page 155: ...etc Miscellaneous fixes Change bars are vs 2 00 2 11 15th December 2007 For 2 11 release of the 74K core Changes include Update the number of pipeline stages Include Instruction Cache prefetch option...

Page 156: ...re Family Revision 02 14 156 2 14 March 30 2011 Add Type and TypeInfo fields in implementation register Add Cache miss PC Sampling feature Revision Date Description Copyright Wave Computing Inc All ri...
