PowerQUICC III Performance Monitors, Rev. 2
Freescale Semiconductor
Examples
Figure 2. Cache Example
The code used for this example is compiled with -O2 optimization. As shown in Figure 2, it executes a constant number of instructions, 0x0147_3F03, in every configuration. Because the instruction count is fixed, the total execution time (core cycles) gives a direct comparison between caches disabled (at the far right), only L1 caches enabled, L1 and L2 caches enabled, and L1 and L2 caches with the Branch Prediction Unit (BPU) enabled.
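The comparison described above can be reproduced from the Figure 2 counter values. The following is a minimal sketch, not part of the original application note: the function and variable names are hypothetical, and the counter values are copied from Figure 2 using its column labels.

```python
# Counter values copied from Figure 2 (hexadecimal in the figure).
# All names below are illustrative and do not come from the application note.
INSTRUCTIONS_COMPLETED = 0x1473F03  # identical in all four configurations

CORE_CYCLES = {
    "L2 & BPU enabled": 0x11E73D8,
    "L2 enabled": 0x1425A5E,
    "L2 disabled": 0x145D091,
    "Caches disabled": 0xC236A2D,
}


def ipc(instructions, cycles):
    """Instructions per cycle: instructions completed divided by core cycles."""
    return instructions / cycles


def speedup(baseline_cycles, cycles):
    """How many times faster a configuration is than the baseline."""
    return baseline_cycles / cycles


if __name__ == "__main__":
    baseline = CORE_CYCLES["Caches disabled"]
    for name, cycles in CORE_CYCLES.items():
        print(
            f"{name:18s} cycles={cycles:>11,d} "
            f"IPC={ipc(INSTRUCTIONS_COMPLETED, cycles):.4f} "
            f"speedup={speedup(baseline, cycles):.2f}x"
        )
```

Since the instruction count is the same in every run, IPC and cycle count carry the same information here; the speedup column simply expresses each configuration relative to the caches-disabled run.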
Due to the limited number of core performance counters, it is necessary to run the application twice for each scenario, for a total of eight runs, to collect all the data represented in Figure 2.
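The run count can be sketched as a simple grouping problem. This is an illustration under a stated assumption: the number of core counters (four) is assumed here and is not given in this section, and the event list is limited to the core events (Ce/CE prefix) that appear in Figure 2. All names are hypothetical.

```python
# Core events listed in Figure 2, by event ID.
CORE_EVENTS = [
    "Ce:Ref:2",   # instructions completed
    "Ce:Ref:1",   # core cycles
    "CE:Com:60",  # instruction L1 cache reloads
    "CE:Com:41",  # data L1 cache reloads
    "CE:Com:9",   # loads completed
    "CE:Com:10",  # stores completed
]

SCENARIOS = [
    "L2 & BPU enabled",
    "L2 enabled",
    "L2 disabled",
    "Caches disabled",
]

# ASSUMPTION: four core performance counters are available per run.
NUM_CORE_COUNTERS = 4


def plan_runs(events, counters):
    """Split the event list into groups that each fit in one run."""
    return [events[i:i + counters] for i in range(0, len(events), counters)]


RUNS_PER_SCENARIO = plan_runs(CORE_EVENTS, NUM_CORE_COUNTERS)
TOTAL_RUNS = len(RUNS_PER_SCENARIO) * len(SCENARIOS)

if __name__ == "__main__":
    for index, group in enumerate(RUNS_PER_SCENARIO, start=1):
        print(f"run {index}: {', '.join(group)}")
    print(f"total runs: {TOTAL_RUNS}")
```

Under that assumption, six core events need two runs per scenario, which is consistent with the eight runs stated above.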
Through analysis such as this, it is possible to tune L2 cache usage as a data-only cache, an instruction-only cache, or a unified cache.
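The miss ratios reported in Figure 2 can be recomputed from the raw counters. The sketch below is not from the application note: the formulas are inferred because they reproduce the L1 and L2 ratios printed in the figure, and all names are hypothetical. A zero denominator returns None, which corresponds to the #DIV/0! cells.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


# Counter values copied from Figure 2 (hexadecimal in the figure).
FIGURE_2 = {
    "L2 & BPU enabled": dict(
        instructions=0x1473F03, l1i_reloads=0x3F, l1d_reloads=0x351A,
        loads=0x2F5588, stores=0x17A750,
        l2_instr_hits=0x4, l2_instr_misses=0x3F,
        l2_data_hits=0x2ED1, l2_data_misses=0x3732,
    ),
    "L2 enabled": dict(
        instructions=0x1473F03, l1i_reloads=0x4AA, l1d_reloads=0x3523,
        loads=0x2F5588, stores=0x17A750,
        l2_instr_hits=0x440, l2_instr_misses=0x3D,
        l2_data_hits=0x2ED3, l2_data_misses=0x372D,
    ),
    "L2 disabled": dict(
        instructions=0x1473F03, l1i_reloads=0x4AA, l1d_reloads=0x3522,
        loads=0x2F5588, stores=0x17A750,
        l2_instr_hits=0, l2_instr_misses=0,
        l2_data_hits=0, l2_data_misses=0,
    ),
}


def miss_ratios(c):
    """Compute L1 and L2 miss ratios from one column of counters."""
    l2_misses = c["l2_instr_misses"] + c["l2_data_misses"]
    l2_hits = c["l2_instr_hits"] + c["l2_data_hits"]
    return {
        # Instruction cache reloads per instruction completed.
        "l1_instruction": ratio(c["l1i_reloads"], c["instructions"]),
        # Data cache reloads per load or store completed.
        "l1_data": ratio(c["l1d_reloads"], c["loads"] + c["stores"]),
        # L2 misses per L2 access (hits plus misses).
        "l2": ratio(l2_misses, l2_hits + l2_misses),
    }


if __name__ == "__main__":
    for name, counters in FIGURE_2.items():
        print(name, miss_ratios(counters))
```

Comparing the instruction and data contributions to the L2 miss count is one way to judge whether the L2 is better used for data, instructions, or both.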
6.2 Example: Branch Prediction
As seen in Figure 2, enabling branch prediction can significantly increase performance. In this application, it reduced the total number of cycles and increased IPC. The performance monitors can also collect more detailed information about the BTB.
Figure 3. BPU Miss Rate
This example uses the same code as the example in Section 6.1, “Example: Cache Performance.” L1 and L2 caches are enabled, as is the BPU. By collecting data on branches finished and branch hits, it is possible to calculate a branch miss rate for this particular application.
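The branch miss rate calculation can be sketched as follows, using the counter values shown in Figure 3. The function name is hypothetical; the formula (branches that did not hit, divided by branches finished) is the one that reproduces the figure's 0.0048% result.

```python
# Counter values copied from Figure 3 (hexadecimal in the figure).
BRANCHES_FINISHED = 0x73CD6  # Ce:Com:12
BRANCH_HITS = 0x73CBF        # Ce:COM:17


def branch_miss_rate(finished, hits):
    """Fraction of finished branches that were not counted as hits."""
    return (finished - hits) / finished


if __name__ == "__main__":
    rate = branch_miss_rate(BRANCHES_FINISHED, BRANCH_HITS)
    print(f"branch misses: {BRANCHES_FINISHED - BRANCH_HITS}")
    print(f"miss rate: {rate:.4%}")
```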
6.3 Example: DDR Performance
It may be desirable to determine the performance of the DDR controller and possibly optimize parameters.
This example illustrates the impact of tweaking the BSTOPRE field.
Figure 2 data (counter values are hexadecimal unless noted):

| Event | Event ID | L2 & BPU enabled | L2 enabled | L2 Disabled | Caches Disabled |
|---|---|---|---|---|---|
| Instructions Completed | Ce:Ref:2 | 1473f03 | 1473f03 | 1473f03 | 1473f03 |
| Core Cycles | Ce:Ref:1 | 11e73d8 | 1425a5e | 145d091 | c236a2d |
| Instruction L1 cache reloads | CE:Com:60 | 3f | 4aa | 4aa | 0 |
| Data L1 cache reloads | CE:Com:41 | 351a | 3523 | 3522 | 0 |
| Loads Completed | CE:Com:9 | 2f5588 | 2f5588 | 2f5588 | 0 |
| Stores Completed | CE:Com:10 | 17a750 | 17a750 | 17a750 | 0 |
| Instr Accesses to L2 that hit | SE:ref:22 (0x1 | 4 | 440 | 0 | 0 |
| Instr to L2 that miss | SE:2:59 (0x7b | 3f | 3d | 0 | 0 |
| Data Accesses to L2 that hit | SE:ref:23 (0x1 | 2ed1 | 2ed3 | 0 | 0 |
| Data to L2 that miss | SE:4:57 (0x79 | 3732 | 372d | 0 | 0 |
| L1 Instruction Miss Ratio | | 2.93756E-06 | 5.56737E-05 | 5.56737E-05 | 0 |
| L1 Data Miss Ratio | | 0.0029 | 0.0029 | 0.0029 | #DIV/0! |
| L2 Miss Ratio | | 0.542089985 | 0.520377095 | #DIV/0! | #DIV/0! |
| Total Miss Ratio | | 0.001584798 | 0.001536049 | #DIV/0! | #DIV/0! |
| Total Hit Ratio | | 99.8415% | 99.8464% | #DIV/0! | #DIV/0! |
| Core Cycles (decimal) | | 18,772,952 | 21,125,726 | 21,352,593 | 203,647,533 |
| IPC | | 1.1424E+00 | 1.0152E+00 | 1.0044E+00 | 1.0531E-01 |
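Figure 3 data (counter values are hexadecimal):

| Event | Event ID | Value |
|---|---|---|
| Instructions Completed | Ce:Ref:2 | 1473f09 |
| Branches finished | Ce:Com:12 | 73cd6 |
| Branch Hits | Ce:COM:17 | 73cbf |
| Branch Miss Rate | | 0.0048% |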