
PowerQUICC III Performance Monitors, Rev. 2


Freescale Semiconductor

 


The device-level performance monitors are similar in many respects to the performance monitors
implemented on the e500 core. However, they can count only events outside the e500 core, for example,
PCI, DDR, and L2 cache events. Device-level performance monitors are memory-mapped, which allows
configuration accesses from user space.

Developers can use these two sets of performance monitor registers together to improve system
performance, characterize and benchmark processors, and help debug their systems.

2 e500 Core Performance Monitors

The e500 core performance monitors are described in detail in Chapter 7 of the PowerPC e500 Core
Family Reference Manual. Performance monitor registers are grouped into supervisor-level registers,
accessed with mtpmr and mfpmr, and user-level performance monitor registers, which are read-only and
accessed with the mfpmr instruction. The supervisor-level registers consist of the four performance
monitor counters (PMC0-PMC3), each used to count up to 128 events; associated performance monitor
local control registers (PMLCa0-PMLCa3); and the performance monitor global control register (PMGC0).
The user mode registers are read-only copies of the supervisor-level registers. These consist of the
same four counters (UPMC0-UPMC3), associated local control registers (UPMLCa0-UPMLCa3), and global
control register (UPMGC0).
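
The register grouping above can be sketched in C. This is a minimal sketch, not taken from this note: the PMR numbers and the fixed offset between a supervisor-level register and its user-level copy are assumptions based on the commonly published e500 numbering, and should be verified against the PowerPC e500 Core Family Reference Manual. The helper names are hypothetical. The code only computes register numbers; on hardware, each number would be the immediate operand of an mfpmr or mtpmr instruction.

```c
#include <stdint.h>

/* Assumed PMR numbers of the supervisor-level registers.
 * They are not stated in this note; verify before use. */
enum {
    PMRN_PMC0   = 16,   /* PMC0-PMC3     assumed at 16..19   */
    PMRN_PMLCA0 = 144,  /* PMLCa0-PMLCa3 assumed at 144..147 */
    PMRN_PMGC0  = 400   /* global control register           */
};

/* Assumed distance between a supervisor-level PMR and its
 * read-only user-level copy (UPMCn, UPMLCan, UPMGC0). */
#define PMR_USER_ALIAS_OFFSET 16

#define CORE_PM_COUNTERS 4u

/* Hypothetical helper: PMR number of PMCn, or -1 if n is not 0..3. */
int pmrn_pmc(unsigned n)
{
    return n < CORE_PM_COUNTERS ? (int)(PMRN_PMC0 + n) : -1;
}

/* Hypothetical helper: PMR number of PMLCan, or -1 if n is not 0..3. */
int pmrn_pmlca(unsigned n)
{
    return n < CORE_PM_COUNTERS ? (int)(PMRN_PMLCA0 + n) : -1;
}

/* Hypothetical helper: user-level (mfpmr-only) copy of a
 * supervisor-level PMR number, or -1 for an invalid input. */
int pmrn_user_alias(int supervisor_pmrn)
{
    if (supervisor_pmrn < PMR_USER_ALIAS_OFFSET)
        return -1;
    return supervisor_pmrn - PMR_USER_ALIAS_OFFSET;
}
```

Under these assumptions, a user-mode profiler would read UPMC2 through the number returned by pmrn_user_alias(pmrn_pmc(2)), while only supervisor code could write PMC2 itself.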

Additionally, the core performance monitor may use the external core input, pm_event, as well as the
performance monitor mark bit in the MSR (MSR[PMM]) to control which processes are monitored.
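
How MSR[PMM] can restrict counting to marked processes can be illustrated with a small model of the per-counter freeze conditions. This is a hedged sketch: the PMLCa bit names (FC, FCS, FCU, FCM1, FCM0), their positions, and the MSR bit masks are assumptions not given in this note and should be checked against the core reference manual. The function name is hypothetical, and neither the pm_event input nor any global freeze in PMGC0 is modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed MSR bit masks (verify against the core reference manual). */
#define MSR_PR      0x00004000u  /* 1 = user mode                 */
#define MSR_PMM     0x00000004u  /* performance monitor mark bit  */

/* Assumed PMLCa freeze-control bits. */
#define PMLCA_FC    0x80000000u  /* freeze counter unconditionally */
#define PMLCA_FCS   0x40000000u  /* freeze in supervisor state     */
#define PMLCA_FCU   0x20000000u  /* freeze in user state           */
#define PMLCA_FCM1  0x10000000u  /* freeze while MSR[PMM] = 1      */
#define PMLCA_FCM0  0x08000000u  /* freeze while MSR[PMM] = 0      */

/* Hypothetical model: returns true if a counter configured with
 * the given PMLCa value would not count in the given MSR state. */
bool counter_is_frozen(uint32_t pmlca, uint32_t msr)
{
    bool user   = (msr & MSR_PR) != 0;
    bool marked = (msr & MSR_PMM) != 0;

    if (pmlca & PMLCA_FC)
        return true;
    if ((pmlca & PMLCA_FCS) && !user)
        return true;
    if ((pmlca & PMLCA_FCU) && user)
        return true;
    if ((pmlca & PMLCA_FCM1) && marked)
        return true;
    if ((pmlca & PMLCA_FCM0) && !marked)
        return true;
    return false;
}
```

In this model, setting FCM0 in a counter's local control register and having the operating system set MSR[PMM] only while the process of interest runs would limit counting to that process.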

2.1 Counter Events

Counter events are listed in the PowerPC e500 Core Family Reference Manual. These are subdivided into
three groups:

Reference (Ref:#) - Possible to count these events on any of the four counters (PMC0-PMC3). These
events are applicable to most Power Architecture® microprocessors.

Common (Com:#) - Possible to count these events on any of the four counters (PMC0-PMC3). These
events are specific to the e500 microarchitecture.

Counter-Specific (C[0-3]:#) - Can only be counted on the specific counter noted. For example, an
event assigned to counter PMC2 is shown as C2:#.
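
The three event groups translate into a simple rule for which counter may hold an event. The following C sketch encodes that rule; the type and function names are hypothetical and only illustrate the grouping described above.

```c
#include <stdbool.h>

#define CORE_NUM_COUNTERS 4u  /* PMC0-PMC3 */

/* The three event groups used in the event tables. */
enum event_group {
    EVENT_REF,              /* Ref:#      any counter             */
    EVENT_COM,              /* Com:#      any counter             */
    EVENT_COUNTER_SPECIFIC  /* C[0-3]:#   one fixed counter only  */
};

/* Hypothetical descriptor for one core event. */
struct core_event {
    enum event_group group;
    unsigned counter;  /* fixed counter; used only for C[0-3]:# events */
    unsigned number;   /* the '#' part of the event name               */
};

/* Returns true if the event may be counted on counter PMC<pmc>. */
bool event_fits_counter(const struct core_event *ev, unsigned pmc)
{
    if (pmc >= CORE_NUM_COUNTERS)
        return false;
    if (ev->group == EVENT_COUNTER_SPECIFIC)
        return ev->counter == pmc;
    return true;  /* Ref and Com events fit any of the four counters */
}
```

A tool that schedules more events than there are counters could use such a check to place counter-specific events first and then fill the remaining counters with Ref and Com events.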

3 Device Performance Monitors

The device performance monitors are described in detail in the corresponding product reference manual. 
These performance monitor counters operate separately from the core performance monitors and are 
intended to monitor and record device-level events. 

The device performance monitor consists of ten counters (PMC0-PMC9), capable of monitoring 576
events, as well as the associated local control registers (PMLCA0-PMLCA9) and the global control
register (PMGC0). These registers are all memory-mapped and can be accessed in supervisor or user
mode.
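
Because the device registers are memory-mapped, configuring them reduces to ordinary 32-bit stores relative to the configuration register base. The C sketch below illustrates this under stated assumptions. The initialization sequence later in this application note uses offset 0xE1000 for PMGC0 and 0xE1060 for PMLCA5; the PMLCA0 offset and the 0x10 spacing between counters are assumptions inferred from those two values and must be checked against the product reference manual. The helper names are hypothetical.

```c
#include <stdint.h>

#define PM_NUM_COUNTERS    10u        /* PMC0-PMC9                      */
#define PM_PMGC0_OFFSET    0xE1000u   /* global control register        */
#define PM_PMLCA0_OFFSET   0xE1010u   /* assumed, not stated in the note */
#define PM_COUNTER_STRIDE  0x10u      /* assumed per-counter spacing     */

/* Hypothetical helper: offset of PMLCAn from the register base,
 * or 0 if n is not a valid counter index. */
uint32_t pm_pmlca_offset(unsigned n)
{
    if (n >= PM_NUM_COUNTERS)
        return 0u;
    return PM_PMLCA0_OFFSET + PM_COUNTER_STRIDE * n;
}

/* Hypothetical helper: 32-bit store to a memory-mapped register.
 * ccsr_base must be 4-byte aligned and, on real hardware, must point
 * to the mapped configuration register space. */
void pm_write32(volatile uint8_t *ccsr_base, uint32_t offset, uint32_t value)
{
    *(volatile uint32_t *)(ccsr_base + offset) = value;
}
```

In user space, ccsr_base would typically come from mapping the register region into the process; that mapping step is platform-specific and is not shown here.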
