
 Memory map, caching, reads, writes and translation


Programming the MIPS32® 74K™ Core Family, Revision 02.14

and in C header files. I hope Table 3.1 helps. In the rest of this document we’ll either use the full software name or (quite often) just talk of TagLo without qualification.

3.4.6 L1 Cache instruction timing

Most CP0 instructions are used rarely, in code which is not timing-critical. But an OS which has to manage caches around I/O operations or otherwise may have to sit in a tight loop issuing hundreds of cache operations at a time, so performance can be important. Firstly, any D-side cache instruction will check the FSB queue (as described in Section 3.3 “Reads, writes and synchronization”) for potentially matching entries. The “potential match” check uses the cache index, and avoids taking any action for most irrelevant FSB activity. But on a potential match the cacheop waits (stalling the memory pipeline) while any pending cache refills happen, and while any dirty lines evicted from the cache are sent out at least to the CPU’s write buffer. Typically, this will not take more than a few clocks, and will only need to be done once for a stream of cacheops.

In the 74K core, the whole cacheop is executed in the memory pipeline, after the cache instruction graduates. All cache instructions except for “index load...” run through graduation without delay; in particular, any stream of hit-type operations which miss in the cache can run 1-per-clock.

A younger instruction which has run ahead of the cacheop is checked while it waits for graduation; if it might run
incorrectly because of an incomplete cacheop, the younger instruction is cancelled and the whole execution unit
backed off so it can be re-issued from scratch (an EU “replay” — expensive but infrequent).
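The kind of tight cacheop loop this section has in mind can be sketched as follows. This is a minimal illustration, not code from the manual: the function names, the `cacheops_issued` counter and the choice of Hit_Writeback_Inv_D (cache op 0x15 in the MIPS32 encoding) are assumptions made for the example, and the line size is passed in by the caller because it has to be discovered from the Config1 register. The `cache` instruction needs kernel privilege, so on a non-MIPS host the sketch only counts how many operations would have been issued.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter: makes the loop observable when built on a host. */
static unsigned long cacheops_issued;

/* Issue one Hit_Writeback_Inv_D (op 0x15) on the line containing addr. */
static inline void hit_wb_inv_d(uintptr_t addr)
{
#if defined(__mips__)
    __asm__ volatile("cache 0x15, 0(%0)" : : "r"(addr) : "memory");
#else
    (void)addr;                 /* host build: no cache to manage */
#endif
    cacheops_issued++;
}

/*
 * Write back and invalidate every D-cache line that overlaps
 * [start, start + len). 'line' is the L1 D-cache line size in bytes
 * and must be a power of two (assumed to come from Config1).
 */
static void dcache_wb_inv_range(uintptr_t start, size_t len, size_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0);
    if (len == 0)
        return;

    uintptr_t addr = start & ~(uintptr_t)(line - 1);  /* round down to a line */
    uintptr_t end  = start + len;

    for (; addr < end; addr += line)
        hit_wb_inv_d(addr);     /* one hit-type cacheop per line */

#if defined(__mips__)
    __asm__ volatile("sync" ::: "memory");  /* order the write-backs before later accesses */
#endif
}
```

A driver would typically call something like `dcache_wb_inv_range((uintptr_t)buf, buf_len, 32)` before handing a buffer to a DMA device; the value 32 is only a placeholder for whatever line size your core reports.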

3.4.7 L2 cache instruction timing

The L2 cache runs synchronously with the CPU but at a configurable clock ratio. The L2 operations will be significantly slower than L1 versions even at the same clock ratio. Exactly how slow depends on the performance of the memory blocks used to build your L2 cache and on the L2 clock ratio.

3.4.8 Cache management when writing instructions - the “synci” instruction

The synci instruction (new to the MIPS32 Release 2 update) provides a clean mechanism - available to user-level code, not just at kernel privilege level - for ensuring that instructions you’ve just written are correctly presented for

Table 3.1 Caches and their CP0 cache tag/data registers

Cache        CP0 Registers   CP0 number
L1 I-cache   ITagLo          28.0
             ITagHi          29.0
             IDataLo         28.1
             IDataHi         29.1
L1 D-cache   DTagLo          28.2
             DTagHi          29.2
             DDataLo         28.3
L2 cache     L23TagLo (1)    28.4
             L23DataLo       28.5
             L23DataHi       29.5

1. In past versions of this manual L23TagLo was known as “STagLo”, and so on. But this name is more mnemonic.
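Since these names end up in C header files, one way to carry Table 3.1 into software is a small lookup table. The sketch below is illustrative only: the struct and function names are invented for the example, and it reads each “N.S” entry in the table as CP0 register N, select S.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of Table 3.1: software name plus CP0 register and select. */
struct cp0_cache_reg {
    const char   *name;
    unsigned char reg;   /* CP0 register number */
    unsigned char sel;   /* CP0 select field */
};

static const struct cp0_cache_reg cache_regs[] = {
    /* L1 I-cache */
    { "ITagLo",    28, 0 }, { "ITagHi",    29, 0 },
    { "IDataLo",   28, 1 }, { "IDataHi",   29, 1 },
    /* L1 D-cache */
    { "DTagLo",    28, 2 }, { "DTagHi",    29, 2 },
    { "DDataLo",   28, 3 },
    /* L2 cache */
    { "L23TagLo",  28, 4 },
    { "L23DataLo", 28, 5 }, { "L23DataHi", 29, 5 },
};

/* Return the table row for 'name', or NULL if the name is not listed. */
static const struct cp0_cache_reg *cp0_cache_reg_find(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof cache_regs / sizeof cache_regs[0]; i++) {
        if (strcmp(cache_regs[i].name, name) == 0)
            return &cache_regs[i];
    }
    return NULL;
}
```

A lookup such as `cp0_cache_reg_find("DTagLo")` then yields register 28, select 2, which is the pair you would hand to an `mfc0`/`mtc0` wrapper.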

Содержание MIPS32 74K Series

Страница 1: ...Document Number MD00541 Revision 02 14 March 30 2011 Programming the MIPS32 74K Core Family...

Страница 2: ...endments or supplements thereto Should a conflict arise regarding the export reexport transfer or release of the information contained in this document the laws of the United States of America shall b...

Страница 3: ...ads writes and synchronization 30 3 3 1 Read write ordering and cache memory data queues in the 74K core 30 3 3 2 The sync instruction in 74K family cores 31 3 3 3 Write gathering and write buffer flu...

Страница 4: ...mings 65 Chapter 5 Kernel mode OS programming and Release 2 of the MIPS32 Architecture 67 5 1 Hazard barrier instructions 67 5 2 MIPS32 Architecture Release 2 enhanced interrupt system s 68 5 2 1 Trad...

Страница 5: ...5 7 6 Almost Alphabetically ordered table of DSP ASE instructions 96 7 7 DSP ASE instruction timing 100 Chapter 8 74K core features for debug and profiling 102 8 1 EJTAG on chip debug unit 102 8 1 1 D...

Страница 6: ...tics 148 B 3 1 Different views of ITagLo DTagLo 148 B 3 2 Dual virtual and physical tags in the 74K core D cache DTagHi register 149 B 3 3 Pre decode information in the I cache the ITagHi Register 149...

Страница 7: ...e 3 16 Fields in the ContextConfig register 53 Figure 5 1 Fields in the IntCtl Register 69 Figure 5 2 Fields in the EBase Register 72 Figure 5 3 Fields in the SRSCtl Register 73 Figure 5 4 Fields in t...

Страница 8: ...Figure 8 20 Fields in the TraceControl3 register 123 Figure 8 21 Fields in the TraceIBPC TraceDBPC registers 125 Figure 8 22 Fields in the WatchLo0 3 Register 128 Figure 8 23 Fields in the WatchHi0 3...

Страница 9: ...2 Long latency FP instructions 84 Table 7 1 Mask bits for instructions accessing the DSPControl register 93 Table 7 2 DSP instructions in alphabetical order 96 Table 8 1 JTAG instructions for the EJT...

Страница 10: ...Programming the MIPS32 74K Core Family Revision 02 14 10...

Страница 11: ...rchitecture these either concern priv ileged operation or are timing related Behavior which was standardized only in the recent Release 2 of the MIPS32 specification and not in previous versions All R...

Страница 12: ...gisters prefetching Chapter 5 Kernel mode OS programming and Release 2 of the MIPS32 Architecture on page 67 74K core specific information about privileged mode programming Chapter 6 Floating point un...

Страница 13: ...onfigure your 74K core with MIPS Technologies L2 cache between 128Kbyte and 1Mbyte in size Full details are in MIPS PDtrace Interface and Trace Control Block Specification MIPS Technologies document M...

Страница 14: ...data cannot be available for some number of instructions Earlier MIPS Technologies cores had no real trouble with dependencies dependent instructions in almost all cases can run in consecutive cycles...

Страница 15: ...f there was a mispredicted branch or an earlier in program order instruction took an exception Instead each instruction is assigned a completion buffer CB entry to receive its result The CB entry also...

Страница 16: ...th luck the sec ond oldest too and if it s results are ready we graduate3 one or two instructions GRU stands for graduation unit Before we do that we make a last minute check for exceptions if one of...

Страница 17: ...ction reaches graduation and finds the load missed we must do a redirect re fetch ing the consuming instruction and everything later in program order Next time the consuming instruction is an issue ca...

Страница 18: ...ata and can t in gen eral anticipate it But jump register instructions are relatively rare except for subroutine returns In the MIPS ISA you return from subroutines using a jump register instruction j...

Страница 19: ...out standing load and which register the data is destined to return to Compiled code is unlikely to reach this limit If you write carefully optimized code where you try to fill load use delays perhaps...

Страница 20: ...1 4 A brief guide to the 74K core implementation Programming the MIPS32 74K Core Family Revision 02 14 20...

Страница 21: ...ocate your exception entry points see Figure 5 2 and the text round it Table 2 1 Roles of Config registers Config A mix of historical and CPU dependent information described in Figure 2 1 below Some f...

Страница 22: ...see Section 3 6 Scratchpad memory SPRAM Don t confuse this with the MIPS DSP ASE whose presence is indicated by Config3 DDSP UDI reads 1 if your core implements user defined CorExtend instructions Co...

Страница 23: ...nfig1 2 registers These two read only registers tell you the size of the TLB and the size and organization of L1 L2 and L3 caches a zero line size is used to indicate a cache which isn t there They re...

Страница 24: ...ove Config2 SU implementation specific bits for secondary cache if fitted Can be writable Config2 L2B Set to disable L2 cache bypass mode Setting this bit also forces Config2 SL to 0 most OS code will...

Страница 25: ...ture Release 2 enhanced interrupt system s VInt reads 1 when the 74K core can handle vectored interrupts SP reads 0 when the 74K core does not support sub 4Kbyte page sizes CDMM reads 0 when the 74K c...

Страница 26: ...the SoC builder who synthesizes the core refer to your SoC manual It should be a number between 0 and 127 higher values are reserved by MIPS Technologies PRId CoID Company ID which in this case is 1...

Страница 27: ...support 64KB alias free D cache option option to have up to 8 outstanding cache misses previous maximum 4 July 12 2006 3_7_ 3 7 0 0x7c Less interlocks round cache instructions relocatable reset excep...

Страница 28: ...2 2 PRId register identifying your CPU type Programming the MIPS32 74K Core Family Revision 02 14 28...

Страница 29: ...o a full 32 bit physical address on the system interface More information about the TLB in Section 3 8 The TLB and translation Table 3 1 Basic MIPS32 architecture memory map Segment Virtual range What...

Страница 30: ...get access to the system interface and send it off Even writes which hit in the cache are posted occurring after the instruction graduates Cache refills are handled after the missing load has graduat...

Страница 31: ...at far Core interface ordering at the core interface read operations may be split into an address phase and a later data phase with other bus operations in between The 74K core as is permitted by MIPS...

Страница 32: ...ory may be gathered stored together in the WBB and then dealt with by a single wider OCP write than the one you originally coded Sometimes this is what you want When it isn t put a sync between your s...

Страница 33: ...ee Section 3 4 2 Cacheability options for details The L2 cache can run synchronously to the CPU core but particularly for memory arrays larger than 256Kbytes would typically then be the critical path...

Страница 34: ...software cache management The 74K core s caches are not fully coherent and require OS intervention at times The cache instruction is the building block of such OS interventions and is required for co...

Страница 35: ...You have to know the size of your cache discoverable from the Config1 2 registers see to know exactly where the field boundaries are but your address is used something like this Beware the MIPS32 spec...

Страница 36: ...not dirty Certain CPUs implement a special form of the I side hit invalidate where multiple searches are done to ensure that any line matching the effective physical address is invalidated even if it...

Страница 37: ...cache instructions except for index load run through graduation without delay and in particular any stream of hit type operations which miss in the cache can run 1 per clock A younger instruction whic...

Страница 38: ...d ing bits of the physical address and aliases are possible The value of the one or two critical virtual address bits is sometimes called the page color It s possible for software to avoid aliases if...

Страница 39: ...gisters for the D cache11 Some other MIPS CPUs use the same staging register s for all caches and even simple initialization software written for such CPUs is not portable to the 74K core Before getti...

Страница 40: ...ser This register in the 74K core is implemented to support access to external L2 cache tags via cache instructions The definition of the fields of this 32 bit register are defined by the SoC designer...

Страница 41: ...data or control fields from the external interface so this section really is just about parity protection in the cache It s a build time option selected by your system integrator whether to include ch...

Страница 42: ...time is recoverable Way the way number of the cache entry where the error occurred Caution for the L1 caches which are no more than 4 way set associative this is a two bit field But an L2 cache might...

Страница 43: ...Scratchpad memory SPRAM PI PD parity bits being read written to caches I and D cache respectively LBE WABE field indicating whether a bus error the last one if there s been more than one was triggered...

Страница 44: ...ovide a reference design for both ISPRAM andDSPRAM which is what is described here If you keep the programming interface the same as the reference design you re more likely to be able to find software...

Страница 45: ...base address of this chunk of SPRAM En enable the SPRAM From power up this bit is zero and until you set it to 1 the SPRAM is invisible The En bit is also visible in the second size configuration wor...

Страница 46: ...evice ID version and size and also contains control bits that can enable user and supervisor read and or write access to the device This register is shown in Figure 3 10 CDMM devices are packed into t...

Страница 47: ...er with the address space identifier from EntryHi ASID The table also stores a physical address plus cacheability attributes which becomes the output of the translation lookup The hardware TLB is rela...

Страница 48: ...re whether it s on the input or output side there s only one but it can be read and written through either of EntryLo0 1 When set it causes addresses to match regardless of their ASID value thus defin...

Страница 49: ...number of TLB misses in most cases Certain workloads particularly those accessing data sequentially where the working set just exceeds the mappable capacity of the non wired TLB entries may benefit f...

Страница 50: ...you can t do a store using addresses translated here you ll get an exception instead However software can use it to track pages which have been written to when you first map a page you leave this bit...

Страница 51: ...only Address error AdEL or AdES TLB XTLB Refill TLB Invalid TLBL TLBS and TLB Modified for more on exception codes in Cause ExcCode see the notes to Table B 5 Context contains the useful mix of pre pr...

Страница 52: ...are and are unaffected by the exception Bits Y 1 0 will always read as 0 If X 23 and Y 4 i e bits 22 4 are set in ContextConfig the behavior is identical to the standard MIPS32 Context register bits 2...

Страница 53: ...guous 1 bits are written into the register field It is permissible to implement a subset of the ContextConfig register in which some number of bits are read only and set to one or zero as appropriate...

Страница 54: ...3 8 The TLB and translation Programming the MIPS32 74K Core Family Revision 02 14 54...

Страница 55: ...rs which are readable by unprivileged user space programs usually to share information which is worth making accessible to programs without the overhead of a system call The hardware registers provide...

Страница 56: ...logically a no op15 The pref instruction comes with various possible hints which allow the program to express its best guess about the likely fate of the cache line In 74K family cores the load and st...

Страница 57: ...m the cache For data you expect to use more than once and which may be subject to com petition from streamed data 7 store_retained 25 writeback_invalidate nudge If the line is in the cache invalidate...

Страница 58: ...l always get more insight from running code on a real CPU or a cycle accurate simulator 4 5 1 Cache delays and mitigating their effect In a typical 74K CPU implementation a cache miss which has to be...

Страница 59: ...ationale for this is that it s extremely difficult to fetch the branch target quickly enough to avoid a delay so the extra instruction runs for free Most of the time the compiler deals well with this...

Страница 60: ...ndard timing just so long as they hit in the cache When a load misses or handled the same way turns out to be uncached then a dependent oper ation which has already been issued will have to be replaye...

Страница 61: ...n run just two clocks apart Each register has a standard place in the pipeline where the producer should deliver its value and another place in the pipeline where the consumer picks it up where those...

Страница 62: ...store 1 the GPR value is an address operand Store data is not needed early ACC multiply instructions 3 the ACC value came from any multiply instruction which saturates the accumulator value ACC DSP in...

Страница 63: ...e because they implicitly have three register operands the no move case is handled by reading the orig inal value of the destination register and writing it back but in 74K cores an instruction may on...

Страница 64: ...cause of the late delivery of load data in t1 load box of Table 4 3 plus another because that data is required to form the address load store address box of Table 4 2 Delays caused by dependencies on...

Страница 65: ...s at once is dependent on multiple fields and that can t be tracked through the CB system Such a rddsp is not issued until all predecessors have graduated and such a wrdsp must graduate before its suc...

Страница 66: ...or But because that requires a relatively long pipeline multiply divide unit instructions which produce a result in a GP register are relatively slow for example an instruction consuming the register...

Страница 67: ...e you can get unexpected behavior if an effect is deferred out of its normal instruction sequence But that can happen because the relevant control register only gets written some way down the pipeline...

Страница 68: ...now required between an MTC0 and a MFC0 instruction type only when there is a CP0 register dependency This optimization reduces the stall cycles incurred by software TLB refill exception handlers when...

Страница 69: ...ected to any input legal values for IntCtl IPTI IntCtl IPPCI and IntCtl IPFDCI are between 2 and 7 The timer performance counter and fast debug channel interrupt signals are taken out to the core inte...

Страница 70: ...l interrupt entry point already an offset of 0x200 from the value defined in EBase to produce the entry point to be used If multiple interrupts are active and enabled the entry point will be the one a...

Страница 71: ...nusable until initialized so MIPS CPUs start up in uncached ROM memory space and the exception entry points are all there for a while in fact for so long as Status BEV is set these ROM entry points ar...

Страница 72: ...and the results of that are undefined EBase CPUNum On single threaded CPUs this is just a single CPU number field set by the core interface bus SI_CPUNum which the SoC designer will tie to some suitab...

Страница 73: ...et number determines the next set and is made visible here in SRSCtl EICSS until the next interrupt The CPU is in EIC mode if Config3 VEIC indicating the hardware is EIC compliant and software has set...

Страница 74: ...the result is unpredictable You can get at the values of registers in the previous set using rdpgpr and wrpgpr Just a note SRSCtl PSS and SRSCtl CSS are not updated by all exceptions but only those wh...

Страница 75: ...ng Config7 WII set to 1 a wait condition will be terminated by an active interrupt signal even if that signal is prevented from causing an interrupt by Status IE being clear It s not immediately obvio...

Страница 76: ...lue of the Count register HWREna SYNCI_Step Set this bit 1 so a user mode rdhwr 1 can read out the cache line size actually the smaller of the L1 I cache line size and D cache line size That line size...

Страница 77: ...on your CPU Can run without an exception handler the FPU offers a range of options to handle very large and very small numbers in hardware With the 74K core full IEEE754 compliance does require that s...

Страница 78: ...integer data is the higher bit num bered bytes shown in Figure 6 1 will be at the lowest memory location when the core is configured big endian and the highest memory location when the core is little...

Страница 79: ...way the FPU works this is controlled by fields in the FPU control registers described here 6 4 1 IEEE options IEEE754 defines five classes of exceptional result For each class the programmer can sele...

Страница 80: ...plement the MIPS 3D ASE PS does not implement the paired single instructions described in MIPS64V2 Processor ID Revision major and minor revisions of the FPU as is usual with revisions it s very usefu...

Страница 81: ...ly and add The FN bit flush to nearest bit causes all result values to be replaced with somewhat better accuracy than you usually get with FS the result is either zero or a smallest normalized number...

Страница 82: ...it was last written to zero by software RM is the rounding mode as required by IEEE 6 5 FPU pipeline and instruction timing This is not so simple The floating point unit FPU has its own pipeline More...

Страница 83: ......

Страница 84: ...r instruction reads the target cache line the program will probably not see much delay FP load instructions in the main pipeline are treated like integer loads an FP load which hits in the cache can b...

Страница 85: ...ger AGEN pipeline s version of the same mfc1 instruction The timing is awkward because you have to find a free completion buffer write port Once the data is in the CB the mfc1 is a candidate for gradu...

Страница 86: ...6 5 FPU pipeline and instruction timing Programming the MIPS32 74K Core Family Revision 02 14 86...

Страница 87: ...ces use 16 bits for audio 8 bit data processing of printer images JPEG still images and video data 7 1 Features provided by the MIPS DSP ASE Those target applications can benefit from unconventional a...

Страница 88: ...quences are made more usable by having four 64 bit result accumulator registers the old MIPS multiply divide unit has just one accessible as the hi lo registers The new ac0 is the old hi lo for backwa...

Страница 89: ...es the size of the bit field to be inserted while pos specifies the insert position Caution in all inserts following the lead of the standard MIPS32 insert extract instructions pos is set to the lowes...

Страница 90: ...with 32 bit paired half or quad byte values respectively Where there are two of these as in macq_s w phl the first one suggests the type of the result and the second the type of the operand s v in a s...

Страница 91: ...pre adding a half to the least significant surviving bit Paired half and quad byte SIMD shifts shll ph shllv ph shll_s ph shllv_s are as above For PH only there s a shift right arithmetic instruction...

Страница 92: ...ults get their low bits set 2 Q31 to a paired half both operands and result are assumed to be signed fractions so precrq ph w just takes the high halves of the two source operands and packs them into...

Страница 93: ...accumulate maq_s w phl maq_s w phr picks either the left high or right low Q15 value from each operand multiplies them to Q31 and accumulates to a Q32 31 result The multiply is saturated only when it...

Страница 94: ...eft The v version as usual takes the shift value from a register The right shift is a logical type so the result is zero extended Fill accumulator pushing low half to high mthlip moves the low half of...

Страница 95: ...produce a Q63 result which is added to the accumu lator and saturated again dpsq_sa l w does the same except that the multiply result is subtracted from the accumulator again useful for the real comp...

Страница 96: ...types are specified by relative bit position but C definitions are in memory order so these definitions need to be endianness dependent ifdef BIG_ENDIAN typedef struct q15 h1 h0 ph typedef struct u8 b...

Страница 97: ...ively used as a Q32 31 fraction dpaq_sa l w ac rs rt Q31 saturated multiply accumulate dpau h qbl qb rs rt ac rs b3 rt b3 rs b2 rt b2 Dot product and accumulate of quad byte values l for left because...

Страница 98: ...ach of the operand registers In all versions the Q15 multiplication is saturated to a Q31 results The _sa variants saturates the add result in the accumulator to a Q31 too maq_s w phr ac rs rt maq_sa...

Страница 99: ...t precrq ph w makes a paired Q15 value by taking the MS bits of the Q31 values in rs and rt like this rd rs 0xFFFF0000 rt 16 0xFFFF precrq_rs ph w is the same but rounds and Q15 saturates both half re...

Страница 100: ...tic because the vacated high bits of the value are replaced by copies of the input bit 16 the sign bit thus performing a cor rect division by a power of two of a signed number As usual the shra_v vari...

Страница 101: ...The MIPS32 DSP ASE 101 Programming the MIPS32 74K Core Family Revision 02 14...

Страница 102: ...JTAG pins already included in every SoC for chip test24 So the debug unit requires Physical communications with some kind of probe device which is itself controlled by the debug host achieved through...

Страница 103: ...normal interrupts The address map changes in debug mode to give you access to the dseg region described below Quite a lot of exceptions just won t happen in debug mode those which do run peculiarly s...

Страница 104: ...fter entering debug mode but it probably did that To return from a nested debug exception like this you don t use deret which would inappropriately take you out of debug mode you grab the address out...

Страница 105: ...1100 0xFF30 1108 IBM10 0xFF30 1108 0xFF30 1110 IBASID0 0xFF30 1110 0xFF30 1118 IBC0 0xFF30 1118 I breakpoint 1 regs 0xFF30 1200 IBA1 0xFF30 1200 0xFF30 1208 IBM1 0xFF30 1208 0xFF30 1210 IBASID21 0xFF...

Страница 106: ...to debug a system which has no physical memory reserved for debug TCB Registers These are the PDtrace EJTag Registers They are physically located in the PDtrace unit and managed by the PDtrace unit F...

Страница 107: ...t to choose On some other implementations it s read only and just tells you what the CPU does IEXI set to 1 to defer imprecise exceptions Set by default on entry to debug mode cleared on exit but writ...

Страница 108: ...e but which have not happened yet because they are imprecise and Debug IEXI is set They remain set until Debug IEXI is cleared explicitly or implicitly by a deret when the exception is delivered and t...

Страница 109: ...he PC of instructions that missed in the instruction cache See Section 8 1 14 PC Sampling with EJTAG for details DAS DASQ DASE DAS reads 1 if the Data Address Sampling feature is available If supporte...

Страница 110: ...are costs for no real loss in functionality ISA In cores with the microMIPS ISA this bit can specify which ISA the exception handler is built in This is tied to 0 on this core as the MIPS16 ASE does n...

Страница 111: ...indicates what type of entity is associated with this TAP and if the TypeInfo field is used TypeInfo identifier information specific to the entity associated with this TAP Rocc reset occurred reads 1...

Страница 112: ...eset signal which is more reliable ProbEn ProbTrap EjtagBrk ProbEn must be set before CPU accesses to dmseg will be sent to the probe It can be written by the probe directly ProbTrap relocates the deb...

Страница 113: ...he FDC registers within the device block Each device within the CDMM begins with an Access Control and Status Register which gives information about the device and also provides a means for giving use...

Страница 114: ...earlier to avoid wasting transfers of null transmit data or non accepted receive data or minimum latency to be interrupted as soon as data is available This register is shown in Figure 8 10 Figure 8 1...

Страница 115: ...ID and written into the FIFO with the data Results are undefined if FDSTAT TxF 1 so that register should be checked prior to writing data Figure 8 13 Fields in the FDC Transmit FDTXn Registers 8 1 11...

Страница 116: ...ns and allows you to determine whether an EJTAG I breakpoint may apply only in MIPS16 or non MIPS16 mode IBASIDn DBASIDn specifies an 8 bit ASID which may be compared against the current EntryHi ASID...

Страница 117: ...nore its value Set this field all ones to disable the data match TE set 1 to use as trigger for PDtrace instruction tracing BE set 1 to activate breakpoint This fields resets to zero to avoid spurious...

Страница 118: ...and data breakpoints filtering only on address conditions are precise that means that 1 DEPC will point at the fetched or load store instruction itself except if it s in a branch delay slot will poin...

Страница 119: ...sure to read it back and see if the write stuck so that you know how many bits to scan and how to interpret them EJTAG revision 5 0 adds a new optional mechanism for triggering PC sampling when an in...

Страница 120: ...are that comes up after a hard or soft reset to know the last known good value of TCBRDP before system crash and potentially read the trace mem ory from or to the appropriate trace memory location 0x3...

Страница 121: ...od probes have generous amounts of high speed memory to store long traces TraceControl2 ValidModes TBI TBU described below at Figure 7 10 and following tell you whether you have such a connection avai...

Страница 122: ...trace format is five bits to sup port 32 outstanding load and stores The outstanding loads and stores is with respect to the PDtrace unit not the Load Store unit Figure 8 16 Fields in the TCBCONTROLE...

Страница 123: ...dual EJTAG breakpoint trace triggers take effect Figure 8 18 Fields in the TraceControl Register Figure 8 19 Fields in the TraceControl2 Register Figure 8 20 Fields in the TraceControl3 register TS se...

Страница 124: ...lly including the miss address TIM switch on to trace all I cache misses On master trace on off switch set 0 to do no tracing at all The read only fields in TraceControl2 provide information about the...

Страница 125: ...ied and if the trace unit is idle then it is safe to change the trace control settings After changing the settings trace can be turned back on and tracing resumes cleanly with the new control The rest...

Страница 126: ...s this register CP0 access rules apply when writing to this user register 8 2 5 Summary of when trace happens The many different enable bits which control trace add up to or strictly and up to a whole...

Страница 127: ...d of on trigger and if this trigger is conditional on arm there must have been an arm event since system reset or any disarm event or the trigger unconditionally turns trace on And since the on trigge...

Страница 128: ...control fields 8 3 1 The WatchLo0 3 registers Used in conjunction with WatchHi0 3 respectively each of these registers carries the virtual address and what to match fields for a CP0 watchpoint Figure...

Страница 129: ...out is shown in Figure 8 24 Figure 8 24 Fields in the PerfCtl0 3 Register There are usually four counters but software should check using the PerfCtl M bit which indicates at least one more Then the f...

Страница 130: ...he I cache and fetch four instructions at once so you only get one cache fetch for that group of four instructions But even then an unconditional branch which is not at the end of a group of four inst...

Страница 131: ...uction buffer is full Number of valid fetch slots killed in the IFU due to branches jumps or other stalling instructions 10 Reserved Reserved 11 Reserved Reserved 12 Reserved 13 Cycles when no instruc...

Страница 132: ...74K core s D cache has an auxiliary virtual tag used to help pick the right line early When occa sionally the physical tag check shows some mis match it is treated as a cache miss in processing the m...

Страница 133: ...ed Includes Floating Point Loads 54 Cycles where one instruction graduated Cycles where two instructions graduated 55 GFifo blocked cycles Floating point stores graduated 56 Number of cycles 0 instruc...

Страница 134: ...8 4 Performance counters Programming the MIPS32 74K Core Family Revision 02 14 134...

Страница 135: ...ic architectural documents MIPS32 The MIPS32 architecture definition series in three volumes MIPS32V1 Introduction to the MIPS32 Architecture MIPS Technologies document MD00080 MIPS32V2 The MIPS32 Ins...

Страница 136: ...tion to the MIPS architecture updated in 2006 to reflect the current version of MIPS32 MIPSPROG MIPS Programmers Handbook Erin Farquar Philip Bunce Morgan Kaufmann ISBN 1 55860 297 6 Restricted to the...

Страница 137: ...ers Unused fields in registers are marked either with a digit 0 or an X A field marked zero should always be written with zero and subject to that is guaranteed to read zero on cores in the 74K family...

Page 138: ...Free running counter at pipeline or sub-multiple speed B.1.5 p 145 10 0 EntryHi High-order portion of the TLB entry 3.12 p 49 11 0 Compare Timer interrupt control B.1.5 p 145 12 0 Status Processor sta...

Page 139: ...s 3.4.17 p 42 27 0 CacheErr Cache parity exception status 3.4.16 p 41 28 0 ITagLo Read/write interface for load/store tag cacheops, but when used for scratchpad RAM configuration see Section 3.8 p 45 3...

Page 140: ...1 Debug 23 0 EPC 14 0 EntryHi 10 0 PDtrace TraceControl 23 1 Timer Compare 11 0 EntryLo0 1 2 0 3 0 TraceControl2 23 2 Count 9 0 Index 0 0 TraceControl3 24 2 CPU Configuration Config 16 0 PageMask 5 0...

Page 141: ...y to allow often MX is set to 1 to enable instructions in either the MIPS DSP extension to the MIPS architecture or the MDMX extension. The two may not be used together, and MDMX is unlikely to ever be...

Page 142: ...rupt bits programmable at Cause IP1 0 Status UM SM execution privilege level: basically user or kernel. The intermediate supervisor privilege level is rarely used, but that's why this is a 2-bit field. Re...

Page 143: ...aused the exception, perhaps to emulate it. Cause TI: last interrupt was from the on-core timer (see section below for Count/Compare). Cause CE: if that was a co-processor unusable exception, this is the co p...

Page 144: ...ligned or a privilege violation 5 AdES 6 IBE Bus error signaled on instruction fetch 7 DBE Bus error signaled on load/store (imprecise) 8 Sys System call, i.e. syscall instruction executed 9 Bp Breakpoint...

Page 145: ...s handy. For a periodic interrupt, simply advance Compare by a fixed amount each time (and check for the possibility that Count has overrun it). To set a timer for some point in the future, just set Compare...

Page 146: ...ics of ehb found on older CPUs. By default ehb will check whether any instructions in flight are directly writing CP0 registers; if such instructions exist, it will block issue of instructions from the i...

Page 147: ...Section 3.4.9 "Cache aliases". All the remaining fields are read/write and control various functions. Only one of them is likely to find real system use. Config7 PREF defaults to 2'b01. These two bits contr...

Page 148: ...s non-blocking loads. Normally the 74K core will keep running after a load instruction, even if it misses in the D-cache, until the data is used. With this disable bit set, the CPU will stall on any load D...

Page 149: ...provides a very fast way of predicting whether there's a cache hit, and if so which way of the cache will contain the right data. But the virtual tag check is heuristic: in some cases it will turn out o...

Page 150: ...data instruction. Which word of the cache line is transferred depends on the low address fed to the cache instruction. D-cache load/stores transfer one word in DDataLo, but I-cache load/stores transfer t...

Page 151: ...set gains some useful extra features, shown below. User-level programs also get limited access to hardware registers, useful for user-privilege software but which wants to adapt portably to get the best...

Page 152: ...b execution hazards: side-effects of old instructions which affect how an instruction executes, but excluding those which affect the instruction fetch process. jalr.hb, jr.hb: hazards of all kinds. Note tha...

Page 153: ...ue such as a thread ID or a pointer to thread-specific storage to the underlying Cop0 register, and user-mode programs can read it via rdhwr. C.3 FPU changes in Release 2 of the MIPS32 Architecture The...

Page 154: ...C.3 FPU changes in Release 2 of the MIPS32 Architecture. Programming the MIPS32 74K Core Family, Revision 02.14 154...

Page 155: ...etc. Miscellaneous fixes. Change bars are vs. 2.00. 2.11, 15th December 2007: For 2.11 release of the 74K core. Changes include: Update the number of pipeline stages. Include Instruction Cache prefetch option...

Page 156: ...re Family, Revision 02.14 156 2.14, March 30, 2011: Add Type and TypeInfo fields in implementation register. Add Cache miss PC Sampling feature. Revision Date Description Copyright Wave Computing Inc. All ri...
