Freescale Semiconductor PowerQUICC III User Manual, Page 5

PowerQUICC III Performance Monitors, Rev. 2



SE:Ref:22 - core instruction accesses to L2 that hit

SE:C2:59 - core instruction accesses to L2 that miss

SE:Ref:23 - core data accesses to L2 that hit

SE:C4:57 - core data accesses to L2 that miss

Note that these are all device-level performance monitor events, so all four can be counted
simultaneously. This example uses counters PMC2-PMC5.

// Initialize Counters
*(volatile unsigned int *) ((unsigned int) CCSB + 0xE1038) = 0x0; /* PMC2 */
*(volatile unsigned int *) ((unsigned int) CCSB + 0xE1048) = 0x0; /* PMC3 */
*(volatile unsigned int *) ((unsigned int) CCSB + 0xE1058) = 0x0; /* PMC4 */
*(volatile unsigned int *) ((unsigned int) CCSB + 0xE1068) = 0x0; /* PMC5 */

// Initialize Global Control Register (counters stay halted during setup)
*(volatile unsigned int *) ((unsigned int) CCSB + 0xE1000) = 0x80000000; /* PMGC0 */

// Initialize Local Control Registers (event number in bits 16 and up)
*(volatile unsigned int *) ((unsigned int) CCSB + 0xE1030) = 0x007B0000; /* PMLCa2: SE:C2:59 */
*(volatile unsigned int *) ((unsigned int) CCSB + 0xE1040) = 0x00160000; /* PMLCa3: SE:Ref:22 */
*(volatile unsigned int *) ((unsigned int) CCSB + 0xE1050) = 0x00790000; /* PMLCa4: SE:C4:57 */
*(volatile unsigned int *) ((unsigned int) CCSB + 0xE1060) = 0x00170000; /* PMLCa5: SE:Ref:23 */

// Clear the Global Control Register to start counting
*(volatile unsigned int *) ((unsigned int) CCSB + 0xE1000) = 0x00000000; /* PMGC0 */

The code above initializes counters PMC2-PMC5 to zero, then sets up the local control registers
to count the events required for the metric previously mentioned. The global control register is
then set to 0x0, which starts the counting.

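Note that because the events counted by C2 and C4 are counter-specific, their event numbers are offset by 64 when written to the local control registers: C2:59 is programmed as 123 (0x7B) and C4:57 as 121 (0x79).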
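As a minimal sketch of this encoding, the hypothetical helper below builds a local control register value from an event number. The helper name, the position of the event field (inferred from the values in the listing), and the limit of 64 reference events are assumptions, not definitions taken from the reference manual.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the event number starts at bit 16 of PMLCa, as implied by
 * the values 0x007B0000, 0x00160000, 0x00790000 and 0x00170000 above. */
#define PMLCA_EVENT_SHIFT        16u
/* Counter-specific events are offset by 64, per the note above. */
#define COUNTER_SPECIFIC_OFFSET  64u

/* Hypothetical helper: return the PMLCa value that selects one event.
 * counter_specific is nonzero for SE:Cn:xx events, zero for SE:Ref:xx. */
static uint32_t pmlca_event(unsigned event, int counter_specific)
{
    /* Assumed range: the written-down event number is below 64. */
    assert(event < COUNTER_SPECIFIC_OFFSET);
    if (counter_specific)
        event += COUNTER_SPECIFIC_OFFSET;
    return (uint32_t)event << PMLCA_EVENT_SHIFT;
}
```

For example, pmlca_event(59, 1) yields 0x007B0000, the value written to PMLCa2 in the listing, and pmlca_event(22, 0) yields 0x00160000, the value written to PMLCa3.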

When the software task is finished, the counters can be halted by the global control register, and results 
may be read from the relevant counters. 
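The halt-and-read step might look like the sketch below. It is illustrative only: it assumes that writing 0x80000000 to PMGC0 (the value used during setup) halts the counters again, it takes a pointer to the performance monitor block (CCSB + 0xE1000) as a parameter so it can also be exercised against an ordinary array, and the structure, function names, and miss-rate formula (misses divided by all L2 accesses from the core) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offsets from PMGC0 (CCSB + 0xE1000), taken from the listing. */
#define PMGC0_OFF  0x00u
#define PMC2_OFF   0x38u
#define PMC3_OFF   0x48u
#define PMC4_OFF   0x58u
#define PMC5_OFF   0x68u
/* Assumption: the setup value also halts counters that are running. */
#define PMGC0_HALT 0x80000000u

struct l2_counts {
    uint32_t instr_miss;  /* PMC2: SE:C2:59  */
    uint32_t instr_hit;   /* PMC3: SE:Ref:22 */
    uint32_t data_miss;   /* PMC4: SE:C4:57  */
    uint32_t data_hit;    /* PMC5: SE:Ref:23 */
};

/* Halt the counters, then copy out PMC2-PMC5.
 * pm_base points at PMGC0, i.e. CCSB + 0xE1000 on the device. */
static struct l2_counts pm_stop_and_read(volatile uint32_t *pm_base)
{
    struct l2_counts c;
    assert(pm_base != 0);
    pm_base[PMGC0_OFF / 4] = PMGC0_HALT;
    c.instr_miss = pm_base[PMC2_OFF / 4];
    c.instr_hit  = pm_base[PMC3_OFF / 4];
    c.data_miss  = pm_base[PMC4_OFF / 4];
    c.data_hit   = pm_base[PMC5_OFF / 4];
    return c;
}

/* Assumed metric: fraction of core accesses to L2 that miss. */
static double l2_core_miss_rate(const struct l2_counts *c)
{
    double misses = (double)c->instr_miss + (double)c->data_miss;
    double total  = misses + (double)c->instr_hit + (double)c->data_hit;
    return total > 0.0 ? misses / total : 0.0;
}
```

On the device, the argument would be (volatile uint32_t *)((unsigned int) CCSB + 0xE1000). On a host, a zero-initialized uint32_t array of 0x70 / 4 words can stand in for the register block.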

5 Data Collection

The core performance monitor has four 32-bit PMCs for capturing core events. The system performance
monitor has eight 32-bit PMCs for capturing system events and one 64-bit PMC dedicated exclusively to
capturing CCB clock cycles. Together, these counters can capture four core events, eight
system events, and the CCB clock cycles simultaneously. Collecting data from various events
simultaneously makes the captured events almost perfectly correlated, as they are collected under the
