Freescale Semiconductor

Application Note

© 2008-2014 Freescale Semiconductor, Inc. All rights reserved.

 

This application note describes aspects of utilizing the core and device-level performance monitors on PowerQUICC III (PQ3). Included are example calculations to aid in interpreting data collected.

1 Performance Monitors

PowerQUICC III processors are the first family of PowerQUICC processors to include performance monitors on-chip. These include both core performance monitors, described in detail in the PowerPC® e500 Core Family Reference Manual, as well as device-level performance monitors, described in detail in the product-specific reference manual.

The e500 core-level performance monitors enable the 
counting of e500-specific events, for example, cache misses, 
mispredicted branches, or the number of cycles an execution 
unit stalls. These are configured by a set of special purpose 
registers that can only be written through supervisor-level 
accesses. The core-level event counters are also available 
through a read-only set of user-level registers.

The device-level performance monitors can be used to 
monitor and record selected events on a device level. These 

Document Number: AN3636

Rev. 2, 03/2014

Contents

1. Performance Monitors   . . . . . . . . . . . . . . . . . . . . . . . .  1
2. e500 Core Performance Monitors . . . . . . . . . . . . . . . .  2
3. Device Performance Monitors   . . . . . . . . . . . . . . . . . .  2
4. Performance Metrics . . . . . . . . . . . . . . . . . . . . . . . . . .  3
5. Data Collection  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  5
6. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  9
7. Data Presentation   . . . . . . . . . . . . . . . . . . . . . . . . . . .  13
8. Revision History  . . . . . . . . . . . . . . . . . . . . . . . . . . . .  15

PowerQUICC III Performance Monitors

Using the Core and System Performance Monitors

Summary of Contents for PowerQUICC III

Page 2: ...or level registers. These consist of the same four counters (UPMC0-UPMC3), associated local control registers (UPMLCa0-UPMLCa3), and global control register (UPMGC0). Additionally, the core performance monitor may use the external core input, pm_event, as well as the performance monitor mark bit in the MSR (MSR[PMM]) to control which processes are monitored. 2.1 Counter Events. Counter events are listed in the Powe...

Page 3: ...ance analysis and characterization. These include: instructions per cycle (IPC); instructions per packet (IPP); packets per second (PPS); branch misses per total branches; branches per 1000 instructions; L1 instruction cache miss rate; L1 data cache miss rate; L2 cache core miss rate; L2 cache non-core miss rate; memory system page hit ratio. Note that because these calculations make use of both the core events and...

Page 4: ...1
Instructions per packet (IPP) = instructions completed / accepted frames on TSEC1. Events: SE Ref 36, CE Ref 2. Calculation: CE Ref 2 / SE Ref 36.
Packets per second (PPS) = accepted frames on TSEC1 / time. Events: SE Ref 36, CE Ref 1. Calculation: SE Ref 36 / (CE Ref 1 / processor frequency).
Branch miss ratio = branches mispredicted / branches finished. Events: CE Com 12, CE Com 17. Calculation: CE Com 17 / CE Com 12.
Branches per 1000 instructions: 1000 branches finished kilo instruc...
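The metric definitions on this page reduce to simple ratios of raw counter values. The following is a minimal sketch in C, assuming the counts have already been read into a structure. The struct, field, and function names are hypothetical; the event labels in the comments follow the references used in the table, with CE Com 17 taken to be branches mispredicted and CE Com 12 branches finished.

```c
#include <stdint.h>

/* Raw counts sampled over one measurement interval.
 * Struct and field names are hypothetical. */
struct pm_sample {
    uint64_t core_cycles;       /* CE Ref 1: core clock cycles */
    uint64_t instr_completed;   /* CE Ref 2: instructions completed */
    uint64_t branches_finished; /* CE Com 12 */
    uint64_t branches_mispred;  /* CE Com 17 */
    uint64_t tsec1_frames;      /* SE Ref 36: accepted frames on TSEC1 */
};

/* Guarded division: yields 0.0 instead of dividing by zero. */
static double pm_ratio(uint64_t num, uint64_t den)
{
    return den != 0 ? (double)num / (double)den : 0.0;
}

/* Instructions per cycle. */
static double pm_ipc(const struct pm_sample *s)
{
    return pm_ratio(s->instr_completed, s->core_cycles);
}

/* Instructions per packet. */
static double pm_ipp(const struct pm_sample *s)
{
    return pm_ratio(s->instr_completed, s->tsec1_frames);
}

/* Packets per second: frames / (core cycles / core frequency in Hz). */
static double pm_pps(const struct pm_sample *s, double core_hz)
{
    if (s->core_cycles == 0 || core_hz <= 0.0)
        return 0.0;
    double seconds = (double)s->core_cycles / core_hz;
    return (double)s->tsec1_frames / seconds;
}

/* Fraction of finished branches that were mispredicted. */
static double pm_branch_miss_ratio(const struct pm_sample *s)
{
    return pm_ratio(s->branches_mispred, s->branches_finished);
}

/* Branches finished per 1000 instructions completed. */
static double pm_branches_per_kilo_instr(const struct pm_sample *s)
{
    return pm_ratio(1000ULL * s->branches_finished, s->instr_completed);
}
```

Because each function takes the whole sample, core and system counts collected over the same interval stay paired, which matters when a metric mixes both kinds of event.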

Page 5: ... int)CCSB + 0xE1060) = 0x00170000; /* PMLCa5 */
/* Start: Global Control Register */
*(unsigned int *)((unsigned int)CCSB + 0xE1000) = 0x00000000; /* PMGC0 */
The above code shows a sequence for initializing counters PMC2-PMC5 to zero, then setting up the local control registers to count the events required for the metric previously mentioned. The global control register is then set to 0x0, which will start the counting. Note that becaus...
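The last two register writes of that sequence can be sketched as a small helper. This is an illustration under stated assumptions, not the note's full listing: only the two offsets and values visible in the excerpt (0xE1060 for PMLCa5, 0xE1000 for PMGC0) are used, the function names are hypothetical, and the `ccsr_base` parameter stands in for the note's CCSB base address.

```c
#include <stdint.h>

/* Offsets from the base address, as they appear in the excerpt. */
#define PMGC0_OFFSET  0xE1000u  /* global control register */
#define PMLCA5_OFFSET 0xE1060u  /* local control register A for PMC5 */

/* Hypothetical helper: 32-bit store to a memory-mapped register.
 * Assumes ccsr_base is 4-byte aligned and offset is a multiple of 4. */
static void pm_reg_write32(volatile uint8_t *ccsr_base,
                           uint32_t offset, uint32_t value)
{
    *(volatile uint32_t *)(ccsr_base + offset) = value;
}

/* Program PMLCa5 with the value shown in the excerpt, then clear
 * PMGC0, which the note describes as the step that starts counting. */
static void pm_start_counting(volatile uint8_t *ccsr_base)
{
    pm_reg_write32(ccsr_base, PMLCA5_OFFSET, 0x00170000u);
    pm_reg_write32(ccsr_base, PMGC0_OFFSET, 0x00000000u);
}
```

Passing the base address as a parameter lets the same helper be exercised against an ordinary buffer in a host-side test. On real hardware, the platform's own I/O accessors and any required ordering barriers would likely replace the plain volatile store.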

Page 6: ...han there are PMCs. For example, Table 2 lists all the events necessary to calculate the full list of metrics from Table 1.
Table 2. Events Necessary for Data Collection of Common Metrics
Core Event    System Event
CE Ref 2      SE C0
CE Com 12     SE Ref 36
CE Com 17     SE Ref 22
CE Com 68     SE Ref 23
CE Com 9      SE Ref 24
CE Com 10     SE C1 54
CE Com 41     SE C2 59
              SE C4 57
SE C2 SE C4 SE C6 SE C8 ...

Page 7: ...ication. It is important to understand the system impact of turning on the performance counters to ensure that they do not have adverse effects on the system. 5.1 Core Clock Cycles. There are two available methods for obtaining the core clock cycles. They may be measured directly using the core event CE Ref 1, or calculated using the system event SE C0 (CCB clock cycles). Multiplying the number of CCB clo...
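The second method can be sketched as follows, assuming the truncated sentence refers to scaling the CCB cycle count by the core-to-CCB clock ratio. The function name and the choice to pass the ratio as a fraction are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: scale CCB (platform) clock cycles to core
 * clock cycles. The core:CCB ratio is passed as a fraction so that
 * ratios such as 5:2 stay exact in integer arithmetic.
 * Returns 0 if ratio_den is 0. The multiplication can overflow for
 * counts close to 2^64 / ratio_num. */
static uint64_t ccb_to_core_cycles(uint64_t ccb_cycles,
                                   uint32_t ratio_num,
                                   uint32_t ratio_den)
{
    if (ratio_den == 0)
        return 0;
    return ccb_cycles * ratio_num / ratio_den;
}
```

For instance, with an assumed 5:2 ratio, `ccb_to_core_cycles(1000, 5, 2)` gives 2500 core clock cycles.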

Page 8: ... chaining is carried out by configuring PMLCa0[EVENT] = 83, PMLCa1[EVENT] = 2. In this manner, a 64-bit counter for CE Ref 2 is created. The total number of instructions completed can be interpreted as in Eqn. 1. 5.3 Burstiness. The system performance monitor counters include a burstiness counting feature to aid in characterizing events that occur in rapid succession followed by a relatively long pause. Event burs...
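One way to read the chained pair is sketched below. It assumes that event 83 increments PMC0 once each time the 32-bit PMC1 wraps, so that PMC0 holds the upper half and PMC1 the lower half of the 64-bit total. Eqn. 1 itself is not reproduced in this excerpt, and the function name is hypothetical.

```c
#include <stdint.h>

/* Combine two chained 32-bit counters into one 64-bit count.
 * Assumption: pmc0_overflows counts wraps of the 32-bit PMC1,
 * so total = PMC0 * 2^32 + PMC1. */
static uint64_t pm_chained_count(uint32_t pmc0_overflows,
                                 uint32_t pmc1_count)
{
    return ((uint64_t)pmc0_overflows << 32) | pmc1_count;
}
```

If the two registers are read while counting is still running, PMC1 may wrap between the two reads; freezing the counters first, or re-reading PMC0 to check that it did not change, avoids an inconsistent pair.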

Page 9: ... The number of CCB cycles may be used as a counter in order to periodically sample data every x CCB cycles x 1 MHz seconds. 5.4.3 Debugger. The CCB platform counter, SE C0, will continue to increment even if the core is halted by a debugger. Carefully consider the implications of this when sampling counters during debugging. 6 Examples. The metrics listed in Section 4, "Performance Metrics," are generic appl...

Page 10: ...s is the BPU. By collecting data on branches finished and branch hits, it is possible to calculate a branch miss rate for this particular application. 6.3 Example: DDR Performance. It may be desirable to determine the performance of the DDR controller and possibly optimize parameters. This example illustrates the impact of tweaking the BSTOPRE field. [Figure legend: L2 BPU enabled, L2 enabled, L2 Disabled, Caches Disabled ...]

Page 11: ...e 4, or 224 core clock cycles. 6.4 I/O and Compute Bound Systems. Analysis of the performance monitors may be useful in determining if a system is I/O or compute bound. I/O bound may imply I/O to the core and force a stall while waiting for data. However, interfaces may also become saturated without core involvement. The MPC8560, for example, may act as a RapidIO-to-PCI-X bridge without core intervention. I...

Page 12: ...centage of cycles spent reading/writing DDR or the LBC is so low (1.03%), the system in this scenario is compute bound. To ensure accuracy of these claims, cycles writing to LBC SDRAM (counter SE C6 55) may be added to the metrics sampled. However, this would require two samples per run, since both this metric and ECM dispatches to LBC are counter-specific to C6. The code executed in these examples is previou...

Page 13: ... number of metrics is used. Half of the metrics should be higher-bound metrics, meaning a higher value of the metric is considered better, and the other half should be lower-bound metrics, where a lower value is considered better. These higher-bound and lower-bound metrics are plotted along alternate radial lines in the Kiviat graph. For example, the following might be plotted: CPU efficiency (higher bound)...
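The alternating layout described here amounts to interleaving the two groups of metrics around the graph. A minimal sketch, assuming the metric labels are held in two equal-length arrays; the function and parameter names are hypothetical.

```c
#include <stddef.h>

/* Interleave higher-bound and lower-bound metric labels so that they
 * fall on alternate radial axes: hb[0], lb[0], hb[1], lb[1], ...
 * The caller provides axes[] with room for 2 * n_each entries.
 * Returns the number of axes written. */
static size_t kiviat_axis_order(const char *const *hb,
                                const char *const *lb,
                                size_t n_each,
                                const char **axes)
{
    for (size_t i = 0; i < n_each; i++) {
        axes[2 * i]     = hb[i];  /* even axes: higher is better */
        axes[2 * i + 1] = lb[i];  /* odd axes: lower is better */
    }
    return 2 * n_each;
}
```

With the axes ordered this way, a system that scores high on the even axes and low on the odd ones produces the alternating in-and-out outline that makes the plot easy to judge at a glance.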

Page 14: ... shows a Kiviat graph for an unbalanced system. Figure 8. Kiviat Graph for Unbalanced System (radial scale 0.000 to 100.000; axes: CPU Efficiency, Cycles Read/Wr DDR, Cache Hit Ratio, Branch Miss Rate, Overall DDR Page Hit Rate, Cycles Reading LBC SDRAM, Packets Per Second TSEC1, L2 Non-Core Miss Rate) ...

Page 15: ...r the LBC, making it easy to identify that problem. Figure 9 is an example correlating to the corrected system with cache enabled for the LBC. This system appears balanced, as its plot looks more star-shaped. 8 Revision History. Table 3 provides a revision history for this application note.
Table 3. Document Revision History
Rev. Number    Date          Substantive Change(s)
2              03/2014       Added new Figure 4.
1              08/06/2008    In...

