Intel® PXA27x Processor Family
Optimization Guide
Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization
Move the ADD instruction after the ORR instruction to prevent this stall.
4.3.1.11 Scheduling Coprocessor 15 Instructions
The MRC instruction has an issue latency of one cycle and a result latency of three cycles. The
MCR instruction has an issue latency of one cycle. In the following example, the MOV instruction
stalls for 2 cycles: it is ready to issue one cycle after the MRC, but r7 is not available until three
cycles after the MRC issues.
add r1, r2, r3
mrc p15, 0, r7, C1, C0, 0   ; r7 is not available for 3 cycles
mov r0, r7                  ; stalls 2 cycles waiting for r7
add r1, r1, #1
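Rearrange the code to avoid this stall:
mrc p15, 0, r7, C1, C0, 0   ; issue the MRC as early as possible
add r1, r2, r3              ; independent work hides the MRC result latency
add r1, r1, #1
mov r0, r7                  ; r7 is ready, no stall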
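The stall arithmetic above can be checked with a small timing model. The sketch below is a simplified, hypothetical model rather than a description of the real pipeline: it assumes an in-order core that issues at most one instruction per cycle, takes the 3-cycle MRC result latency from the text, and assumes a 1-cycle result latency for ADD and MOV. The names `Insn`, `count_stalls`, `original_seq` and `rearranged_seq` are illustrative only.

```c
#include <assert.h>

/* Hypothetical timing model (not Intel's): an in-order core that issues at
 * most one instruction per cycle (issue latency of one cycle). An
 * instruction waits until each source register is ready; its destination
 * becomes ready result_latency cycles after it issues. */

#define NREGS 16
#define NONE  (-1)

typedef struct {
    const char *text;    /* assembly text, for reference only */
    int dst;             /* destination register number, or NONE */
    int src[2];          /* source register numbers, or NONE */
    int result_latency;  /* cycles until dst can be consumed */
} Insn;

/* Returns the total number of stall cycles for code[0..n-1]. */
int count_stalls(const Insn *code, int n)
{
    int ready[NREGS] = {0};  /* cycle in which each register becomes ready */
    int cycle = 0;           /* earliest cycle for the next issue */
    int stalls = 0;

    for (int i = 0; i < n; i++) {
        int issue = cycle;
        for (int s = 0; s < 2; s++) {
            int r = code[i].src[s];
            if (r == NONE)
                continue;
            assert(r >= 0 && r < NREGS);
            if (ready[r] > issue)
                issue = ready[r];  /* stall until the operand is ready */
        }
        stalls += issue - cycle;
        if (code[i].dst != NONE) {
            assert(code[i].dst >= 0 && code[i].dst < NREGS);
            ready[code[i].dst] = issue + code[i].result_latency;
        }
        cycle = issue + 1;
    }
    return stalls;
}

/* The two sequences from the text. MRC uses the 3-cycle result latency
 * stated above; the 1-cycle latency for ADD and MOV is an assumption. */
const Insn original_seq[] = {
    {"add r1, r2, r3",            1, {2, 3},       1},
    {"mrc p15, 0, r7, C1, C0, 0", 7, {NONE, NONE}, 3},
    {"mov r0, r7",                0, {7, NONE},    1},
    {"add r1, r1, #1",            1, {1, NONE},    1},
};

const Insn rearranged_seq[] = {
    {"mrc p15, 0, r7, C1, C0, 0", 7, {NONE, NONE}, 3},
    {"add r1, r2, r3",            1, {2, 3},       1},
    {"add r1, r1, #1",            1, {1, NONE},    1},
    {"mov r0, r7",                0, {7, NONE},    1},
};
```

Under these assumptions, `count_stalls(original_seq, 4)` returns 2 and `count_stalls(rearranged_seq, 4)` returns 0, matching the 2-cycle stall described above and its removal by rescheduling.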
4.3.2 Instruction Scheduling for Intel® Wireless MMX™ Technology
The Intel® Wireless MMX™ Technology provides an instruction set that offers the same
functionality as the Intel® MMX™ Technology and Streaming SIMD Extensions (SSE)
integer instructions.
4.3.2.1 Increasing Load Throughput on Intel® Wireless MMX™ Technology
The constraints on issuing load transactions with Intel XScale® Microarchitecture also apply to
Intel® Wireless MMX™ Technology. The considerations discussed earlier for Intel XScale®
Microarchitecture instructions are illustrated again in this section using the Intel® Wireless MMX™
Technology instruction set. The primary observations about load transactions are:
• The buffering in the memory pipeline allows two load double transactions to be outstanding
  without incurring a penalty (stall).
• Back-to-back WLDRD instructions incur a stall; back-to-back WLDRB/WLDRH/WLDRW
  instructions do not.
• Assuming a cache hit, WLDRD requires 4 cycles to return the DWORD, while back-to-back
  WLDRB/WLDRH/WLDRW instructions require 3 cycles to return their data.
• Use prefetching together with the suggestions above.
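The load observations above can be explored with a similar simplified model. This is a hypothetical sketch, not a description of the actual memory pipeline: it uses the cache-hit result latencies from the list (4 cycles for WLDRD, 3 cycles for WLDRB/WLDRH/WLDRW), ignores the two-transaction buffering limit, and takes the ALU result latency and the back-to-back WLDRD penalty as caller-supplied parameters because this section does not quantify them. `WOp`, `wmmx_stalls` and the sample sequences are illustrative names.

```c
#include <assert.h>

/* Hypothetical model. Load result latencies follow the list above (cache
 * hit): WLDRD 4 cycles, WLDRB/WLDRH/WLDRW 3 cycles. The ALU latency and the
 * back-to-back WLDRD penalty are assumed values supplied by the caller.
 * The two-outstanding-load buffering limit is not modeled. */

#define NUM_WREGS 16
#define NO_REG (-1)

typedef enum { OP_ALU, OP_WLDR_BHW, OP_WLDRD } OpKind;

typedef struct {
    OpKind kind;
    int dst;     /* destination wRn, or NO_REG */
    int src;     /* source wRn read by this op, or NO_REG */
} WOp;

static int result_latency(OpKind k, int alu_latency)
{
    switch (k) {
    case OP_WLDRD:    return 4;
    case OP_WLDR_BHW: return 3;
    default:          return alu_latency;
    }
}

/* Returns total stall cycles for ops[0..n-1], issuing at most one op per cycle. */
int wmmx_stalls(const WOp *ops, int n, int alu_latency, int b2b_penalty)
{
    int ready[NUM_WREGS] = {0};  /* cycle in which each wRn becomes ready */
    int cycle = 0, stalls = 0;

    for (int i = 0; i < n; i++) {
        int issue = cycle;
        /* A WLDRD directly after another WLDRD waits b2b_penalty extra cycles. */
        if (i > 0 && ops[i].kind == OP_WLDRD && ops[i - 1].kind == OP_WLDRD)
            issue += b2b_penalty;
        if (ops[i].src != NO_REG) {
            assert(ops[i].src >= 0 && ops[i].src < NUM_WREGS);
            if (ready[ops[i].src] > issue)
                issue = ready[ops[i].src];  /* wait for the loaded data */
        }
        stalls += issue - cycle;
        if (ops[i].dst != NO_REG) {
            assert(ops[i].dst >= 0 && ops[i].dst < NUM_WREGS);
            ready[ops[i].dst] = issue + result_latency(ops[i].kind, alu_latency);
        }
        cycle = issue + 1;
    }
    return stalls;
}

/* Two WLDRDs back-to-back, then an independent ALU op. */
const WOp b2b_seq[] = { {OP_WLDRD, 3, NO_REG}, {OP_WLDRD, 5, NO_REG}, {OP_ALU, 0, 1} };
/* Same ops with the independent ALU op interleaved between the loads. */
const WOp interleaved_seq[] = { {OP_WLDRD, 3, NO_REG}, {OP_ALU, 0, 1}, {OP_WLDRD, 5, NO_REG} };
/* A load followed immediately by a consumer of its result. */
const WOp wldrd_then_use[] = { {OP_WLDRD, 3, NO_REG}, {OP_ALU, 0, 3} };
const WOp wldrw_then_use[] = { {OP_WLDR_BHW, 3, NO_REG}, {OP_ALU, 0, 3} };
```

With an assumed ALU latency of 1 and an assumed back-to-back penalty of 1 cycle, `wmmx_stalls(b2b_seq, 3, 1, 1)` returns 1 while `wmmx_stalls(interleaved_seq, 3, 1, 1)` returns 0, which is why interleaving an independent operation between WLDRD instructions helps. A consumer placed right after the load stalls 3 cycles behind WLDRD (`wldrd_then_use`) but 2 cycles behind a WLDRW (`wldrw_then_use`).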
The overhead of issuing load transactions can be minimized by instruction scheduling and load
pipelining. In most cases, it is straightforward to interleave other operations to avoid the penalty of
back-to-back WLDRD instructions. In the following code sequence, three WLDRD instructions are
issued back-to-back, incurring a stall on the second and third instructions.
WLDRD wR3, r4, #8
WLDRD wR5, r4, #8           ; STALL