Intel XScale® Core Developer’s Manual
January, 2004
Performance Monitoring
8.4.1 Instruction Cache Efficiency Mode
PMN0 totals the number of instructions that were executed. This count excludes instructions
that were fetched from the instruction cache but never executed, which can happen when a branch
instruction changes the program flow: the instruction cache may retrieve the next sequential
instructions after the branch before it receives the target address of the branch.
PMN1 counts the number of instruction fetch requests to external memory. Each of these requests
loads 32 bytes at a time.
Statistics derived from these two events:
• Instruction cache miss-rate. This is derived by dividing PMN1 by PMN0.
• The average number of cycles it took to execute an instruction, commonly referred to as
cycles-per-instruction (CPI). CPI is derived by dividing CCNT by PMN0, where CCNT
was used to measure total execution time.
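The two statistics above can be sketched in C as follows. This is a minimal illustration, not code from the manual: the `icache_sample` struct and the function names are hypothetical, the counter values are assumed to have been read already without overflow, and CCNT is assumed to count every core clock cycle over the same interval as PMN0 and PMN1.

```c
#include <stdint.h>

/* Hypothetical container for counter values sampled after a run
 * in instruction cache efficiency mode. */
typedef struct {
    uint32_t pmn0; /* instructions executed */
    uint32_t pmn1; /* instruction fetch requests to external memory */
    uint32_t ccnt; /* clock count covering the same interval (assumed undivided) */
} icache_sample;

/* Instruction cache miss-rate = PMN1 / PMN0.
 * Returns -1.0 when no instruction was counted, to avoid dividing by zero. */
double icache_miss_rate(const icache_sample *s)
{
    if (s->pmn0 == 0) {
        return -1.0;
    }
    return (double)s->pmn1 / (double)s->pmn0;
}

/* Cycles-per-instruction = CCNT / PMN0.
 * Returns -1.0 when no instruction was counted. */
double cycles_per_instruction(const icache_sample *s)
{
    if (s->pmn0 == 0) {
        return -1.0;
    }
    return (double)s->ccnt / (double)s->pmn0;
}

/* Each fetch request loads 32 bytes, so PMN1 * 32 gives the number of
 * bytes brought in from external memory for instruction fetches. */
uint64_t icache_bytes_fetched(const icache_sample *s)
{
    return (uint64_t)s->pmn1 * 32u;
}
```

For example, a sample with PMN0 = 1000, PMN1 = 250 and CCNT = 2000 gives a miss-rate of 0.25, a CPI of 2.0 and 8000 bytes fetched.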
8.4.2 Data Cache Efficiency Mode
PMN0 totals the number of data cache accesses, which includes cacheable and non-cacheable
accesses, mini-data cache accesses, and accesses made to locations configured as data RAM.
Note that STM and LDM each count as several accesses to the data cache, depending on the
number of registers specified in the register list. LDRD registers two accesses.
PMN1 counts the number of data cache and mini-data cache misses. Cache operations do not
contribute to this count. See Section 7.2.8 for a description of these operations.
The statistic derived from these two events is:
• Data cache miss-rate. This is derived by dividing PMN1 by PMN0.
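As a sketch of how this ratio might be computed over a region of code, the example below takes counter snapshots before and after the region and divides the differences. The function names are hypothetical, and the sketch assumes 32-bit counters that wrap at most once between the two snapshots; reading the counters themselves is outside its scope.

```c
#include <stdint.h>

/* Difference between two snapshots of a 32-bit counter.
 * Unsigned subtraction is modular, so the result is still correct
 * if the counter wrapped once between the two reads (assumption:
 * it did not wrap more than once). */
uint32_t counter_delta(uint32_t start, uint32_t end)
{
    return (uint32_t)(end - start);
}

/* Data cache miss-rate = PMN1 / PMN0, where
 *   accesses = PMN0 delta (data cache accesses)
 *   misses   = PMN1 delta (data cache and mini-data cache misses)
 * Returns -1.0 when no access was counted. */
double dcache_miss_rate(uint32_t accesses, uint32_t misses)
{
    if (accesses == 0) {
        return -1.0;
    }
    return (double)misses / (double)accesses;
}
```

With 800 accesses and 100 misses over the measured region, `dcache_miss_rate(800, 100)` evaluates to 0.125.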
8.4.3 Instruction Fetch Latency Mode
PMN0 accumulates the number of cycles when the instruction-cache is not able to deliver an
instruction to the core due to an instruction-cache miss or instruction-TLB miss. This event means
that the processor core is stalled.
PMN1 counts the number of instruction fetch requests to external memory. Each of these requests
loads 32 bytes at a time. This is the same event as measured in instruction cache efficiency mode.
Statistics derived from these two events:
• The average number of cycles the processor stalled waiting for an instruction fetch from
external memory to return. This is calculated by dividing PMN0 by PMN1. If the average is
high, the core may be starved of access to the external bus.
• The percentage of total execution cycles the processor stalled waiting for an instruction fetch
from external memory to return. This is calculated by dividing PMN0 by CCNT, where CCNT was
used to measure total execution time.
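The two latency statistics above can be sketched as below. The function names are hypothetical, and the sketch assumes the three counter values cover the same interval, did not overflow, and that CCNT counts every core clock cycle. The second function returns a fraction between 0 and 1; multiply by 100 for a percentage.

```c
#include <stdint.h>

/* Average stall cycles per external instruction fetch = PMN0 / PMN1.
 * stall_cycles   = PMN0 (cycles the instruction cache could not deliver)
 * fetch_requests = PMN1 (instruction fetch requests to external memory)
 * Returns -1.0 when no fetch request was counted. */
double avg_fetch_stall_cycles(uint32_t stall_cycles, uint32_t fetch_requests)
{
    if (fetch_requests == 0) {
        return -1.0;
    }
    return (double)stall_cycles / (double)fetch_requests;
}

/* Fraction of execution time spent stalled on instruction fetch
 * = PMN0 / CCNT. Returns -1.0 when CCNT is zero. */
double fetch_stall_fraction(uint32_t stall_cycles, uint32_t ccnt)
{
    if (ccnt == 0) {
        return -1.0;
    }
    return (double)stall_cycles / (double)ccnt;
}
```

For instance, PMN0 = 4000, PMN1 = 100 and CCNT = 16000 would indicate an average of 40 stall cycles per fetch and 25% of execution cycles spent stalled.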