PowerQUICC III Performance Monitors, Rev. 2

Freescale Semiconductor

7 Data Presentation

The presentation of the data obtained is an important step in the performance evaluation of a system.
Graphical charts, such as Kiviat charts (known as radar plots in Excel) and Gantt charts, aid in
understanding performance evaluation results. Such charts enable readers to quickly grasp
details and compare the performance of one system against another.

Kiviat graphs are visual devices that allow for quick identification of performance problems. Typically, an
even number of metrics is used. Half of the metrics should be higher-bound metrics, where a higher
value is considered better, and the other half should be lower-bound metrics, where a lower
value is considered better. The higher-bound and lower-bound metrics are plotted along alternate radial
lines in the Kiviat graph.
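The alternating layout described above can be sketched in a few lines of Python. This is only an illustration, not part of the application note: the helper name `interleave_axes` and the metric names passed to it are assumptions chosen for the example.

```python
from itertools import chain


def interleave_axes(higher_bound, lower_bound):
    """Return axis labels that alternate higher-bound and lower-bound metrics."""
    if len(higher_bound) != len(lower_bound):
        raise ValueError("expected equal numbers of higher- and lower-bound metrics")
    # zip pairs the i-th higher-bound metric with the i-th lower-bound metric;
    # chaining the pairs flattens them into one alternating sequence.
    return list(chain.from_iterable(zip(higher_bound, lower_bound)))


# Hypothetical metric names, used here only for illustration.
axes = interleave_axes(
    ["cache hit ratio", "DDR page hit rate"],
    ["branch miss rate", "cycles read/write to DDR"],
)
print(axes)
```

Rejecting unequal list lengths keeps the total number of axes even, which matches the half-and-half split the text calls for.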

For example, the following might be plotted:

CPU efficiency (higher bound). Since the e500 is capable of completing 2 instructions per cycle (plus one branch), this metric would be the equivalent of 2 minus IPC.
Cycles read/write to DDR (lower bound)
Cache hit ratio (higher bound)
Branch miss rate (lower bound)
Overall DDR page hit rate (higher bound)
Cycles reading LBC SDRAM (lower bound)
Packets per second TSEC1 (higher bound)
L2 non-core miss rate (lower bound)
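
To show how such a list becomes a graph, the following minimal Python sketch computes the polygon vertices a plotting tool would draw. It assumes every metric has already been scaled to a percentage of its full-scale value. The function name `kiviat_vertices`, the axis angle convention, and all of the sample values are assumptions for illustration, not measurements from the application note.

```python
import math


def kiviat_vertices(values, full_scale=100.0):
    """Map each metric value to an (x, y) point on its own radial line.

    Axis i lies at angle 2*pi*i/n, measured counterclockwise from the
    positive x-axis (an assumed convention); the radius is value / full_scale.
    """
    n = len(values)
    points = []
    for i, value in enumerate(values):
        angle = 2.0 * math.pi * i / n
        radius = value / full_scale
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points


# Hypothetical percentages, listed in the same order as the metrics above.
metrics = [
    ("CPU efficiency", 60.0),
    ("Cycles read/write to DDR", 20.0),
    ("Cache hit ratio", 90.0),
    ("Branch miss rate", 10.0),
    ("Overall DDR page hit rate", 80.0),
    ("Cycles reading LBC SDRAM", 15.0),
    ("Packets per second TSEC1", 70.0),
    ("L2 non-core miss rate", 25.0),
]

points = kiviat_vertices([value for _, value in metrics])
for (name, _), (x, y) in zip(metrics, points):
    print(f"{name:28s} x={x:+.3f} y={y:+.3f}")
```

Joining the points in order and closing the polygon gives the Kiviat outline. The same coordinates can be handed to any charting tool, including the radar plot type mentioned earlier.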
