Intel® IXP45X and Intel® IXP46X Product Line of Network Processors
Developer’s Manual, August 2006
Order Number: 306262-004US
Page 165
Intel XScale® Processor: Intel® IXP45X and Intel® IXP46X Product Line of Network Processors
3.7.4.5 Stall/Write-Back Statistics
When an instruction requires the result of a previous instruction and that result is not
yet available, the IXP45X/IXP46X network processors stall in order to preserve the
correct data dependencies. PMN0 counts the number of stall cycles due to data
dependencies. Not all data dependencies cause a stall; only the following dependencies
incur a stall penalty:
• Load-use penalty: attempting to use the result of a load before the load completes.
To avoid the penalty, software should delay using the result of a load until it is
available. This penalty shows the latency effect of data-cache access.
• Multiply/Accumulate-use penalty: attempting to use the result of a multiply or
multiply-accumulate operation before the operation completes. Again, to avoid the
penalty, software should delay using the result until it is available.
• ALU use penalty: there are a few isolated cases where back-to-back ALU operations
may result in a one-cycle delay in execution. These cases are defined in
Table 3.9, “Performance Considerations” on page 181.
PMN1 counts the number of write-back operations emitted by the data cache. These
write-backs occur when the data cache evicts a dirty line of data to make room for a
newly requested line or as the result of a clean operation (CP15, register 7).
Statistics derived from these two events:
• The percentage of total execution cycles the processor stalled because of a data
dependency. This is calculated by dividing PMN0 by CCNT, which was used to
measure total execution time. Often a compiler can reschedule code to avoid these
penalties when given the right optimization switches.
• The total number of data write-back requests to external memory. This is given
directly by PMN1; no other counter is needed.
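The two statistics above can be sketched in C as follows. This is only an illustration of the arithmetic: the `stall_wb_sample_t` structure and the function names are hypothetical, the counter values are assumed to have been read and saved by the caller, the counters are assumed to be 32 bits wide, and CCNT is assumed to count every core cycle rather than a divided clock.

```c
#include <stdint.h>

/* Hypothetical snapshot of the counters taken at the end of a
 * measurement run in Stall/Write-Back Statistics mode.
 * How the values are read from the hardware is not shown here. */
typedef struct {
    uint32_t ccnt;  /* CCNT: total execution cycles              */
    uint32_t pmn0;  /* PMN0: stall cycles due to data dependency */
    uint32_t pmn1;  /* PMN1: data-cache write-back operations    */
} stall_wb_sample_t;

/* Percentage of total execution cycles spent stalled on a data
 * dependency: 100 * PMN0 / CCNT. Returns 0.0 when CCNT is zero
 * so that an empty measurement does not divide by zero. */
double stall_percent(const stall_wb_sample_t *s)
{
    if (s->ccnt == 0) {
        return 0.0;
    }
    return 100.0 * (double)s->pmn0 / (double)s->ccnt;
}

/* Total number of data write-back requests: PMN1 as-is. */
uint32_t writeback_count(const stall_wb_sample_t *s)
{
    return s->pmn1;
}
```

For example, a run with CCNT = 1000 and PMN0 = 250 spent 25 percent of its cycles stalled on data dependencies.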
3.7.4.6 Instruction TLB Efficiency Mode
PMN0 totals the number of instructions that were executed, which does not include
instructions that were translated by the instruction TLB and never executed. This can
happen if a branch instruction changes the program flow; the instruction TLB may
translate the next sequential instructions after the branch, before it receives the target
address of the branch.
PMN1 counts the number of instruction TLB table-walks, which occur when there is a
TLB miss. If the instruction TLB is disabled, PMN1 will not increment.
Statistics derived from these two events:
• Instruction TLB miss-rate. This is derived by dividing PMN1 by PMN0.
• The average number of cycles it took to execute an instruction, commonly
referred to as cycles-per-instruction (CPI). CPI can be derived by dividing CCNT by
PMN0, where CCNT was used to measure total execution time.
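As a minimal sketch of these two ratios, the following C fragment computes the instruction TLB miss rate and CPI from saved counter values. The `itlb_sample_t` structure and the function names are hypothetical and introduced only for illustration; the counters are assumed to be 32 bits wide, and CCNT is assumed to count every core cycle.

```c
#include <stdint.h>

/* Hypothetical snapshot of the counters taken at the end of a
 * measurement run in Instruction TLB Efficiency mode. */
typedef struct {
    uint32_t ccnt;  /* CCNT: total execution cycles        */
    uint32_t pmn0;  /* PMN0: instructions executed         */
    uint32_t pmn1;  /* PMN1: instruction TLB table-walks   */
} itlb_sample_t;

/* Instruction TLB miss rate: PMN1 / PMN0.
 * Returns 0.0 when no instructions were counted. */
double itlb_miss_rate(const itlb_sample_t *s)
{
    if (s->pmn0 == 0) {
        return 0.0;
    }
    return (double)s->pmn1 / (double)s->pmn0;
}

/* Cycles-per-instruction: CCNT / PMN0.
 * Returns 0.0 when no instructions were counted. */
double cycles_per_instruction(const itlb_sample_t *s)
{
    if (s->pmn0 == 0) {
        return 0.0;
    }
    return (double)s->ccnt / (double)s->pmn0;
}
```

For example, CCNT = 2000 with PMN0 = 1000 gives a CPI of 2.0.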
3.7.4.7 Data TLB Efficiency Mode
PMN0 totals the number of data cache accesses, which includes cacheable and
non-cacheable accesses, mini-data cache accesses, and accesses made to locations
configured as data RAM.
Note that STM and LDM will each count as several accesses to the data TLB depending
on the number of registers specified in the register list. LDRD will register two
accesses.