Intel® PXA27x Processor Family
Optimization Guide
Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization
ldr r5, [r0], r4 ; r0 = pSrc, post-incremented by r4
ldr r11, [r0], r4
ldr r8, [r0], r4
ldr r12, [r0], r4
; These loads are scheduled to distinct destination registers
and r6, r5, r9 ; r6->tmp = tmp0 & 0xffff;
orr r6, r6, r11, lsl #16 ; r6->tmp |= tmp1 << 16;
and r11, r11, r9, lsl #16 ; r11->tmp1 &= 0xffff0000;
and r7, r8, r9 ; r7->tmp = tmp0 & 0xffff;
orr r11, r11, r5, lsr #16 ; r11->tmp1 |= tmp0 >> 16;
orr r7, r7, r12, lsl #16 ; r7->tmp |= tmp1 << 16;
str r6, [r1], #4 ; Write Coalesce the two stores
str r7, [r1], #4
and r12, r12, r9, lsl #16 ; r12->tmp1 &= 0xffff0000;
orr r12, r12, r8, lsr #16 ; r12->tmp1 |= tmp0 >> 16;
str r11, [r10], #4 ; Write Coalesce the two stores
str r12, [r10], #4
subs r14, r14, #1
bgt LOOP
In the following example, the stores are scheduled so that multiple store instructions to the
same line are write-coalesced: the two stores are combined into a single write-buffer entry and
issued as a single write request.
str r11, [r10], #4 ; Write Coalesce the two stores
str r12, [r10], #4
Write coalescing can be exploited either by unrolling the C loop or by explicitly inlining multiple
stores that can be combined.
The register rotation technique also allows multiple loads to be outstanding.
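The unrolling described above can be sketched in C. This is a minimal illustration, not the guide's own source: the function name pack_halfwords_x2, its parameters, and the stride expressed in 32-bit words are assumptions made for the example. Each iteration loads four words into four distinct temporaries, so the loads can be outstanding together, and writes two adjacent words to each destination, so the stores are candidates for coalescing. Whether a given compiler keeps the stores adjacent is not guaranteed and should be checked in the generated assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: regroups the 16-bit halves of word pairs read with a
 * fixed stride. Low halves go to dst_lo, high halves go to dst_hi.
 * The loop body is unrolled by two so that each destination receives two
 * adjacent word stores per iteration. */
static void pack_halfwords_x2(const uint32_t *src, size_t stride_words,
                              uint32_t *dst_lo, uint32_t *dst_hi,
                              size_t iterations)
{
    for (size_t i = 0; i < iterations; ++i) {
        /* Four loads into four distinct temporaries (register rotation). */
        uint32_t a0 = src[0];
        uint32_t a1 = src[stride_words];
        uint32_t b0 = src[2 * stride_words];
        uint32_t b1 = src[3 * stride_words];
        src += 4 * stride_words;

        /* Same masking and shifting as the assembly listing. */
        uint32_t lo0 = (a0 & 0xffffu) | (a1 << 16);
        uint32_t hi0 = (a1 & 0xffff0000u) | (a0 >> 16);
        uint32_t lo1 = (b0 & 0xffffu) | (b1 << 16);
        uint32_t hi1 = (b1 & 0xffff0000u) | (b0 >> 16);

        /* Adjacent stores to the same destination stream. */
        dst_lo[0] = lo0;
        dst_lo[1] = lo1;
        dst_hi[0] = hi0;
        dst_hi[1] = hi1;
        dst_lo += 2;
        dst_hi += 2;
    }
}
```

Grouping the two stores to each destination, rather than alternating between destinations, mirrors the paired str instructions in the listing.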
4.6.5 Case Study 5: 8x8 Block 1/2X Motion Compensation
Bi-linear interpolation is a typical operation in image and video processing applications. For
example, motion compensation in video decoding uses the 1/2X interpolation operation. Intel®
Wireless MMX™ Technology features can help to accelerate these key applications. The
following code demonstrates how to attain this acceleration. These are the key points for
optimizing the 1/2X motion compensation:
• Use the WALIGNR instruction to align the packed byte array.
• Use the WAVG2BR instruction to calculate the average of bytes.
• Schedule around the load-to-use latency.
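The first two points can be modeled in portable C to show the arithmetic involved. This is a scalar sketch under stated assumptions, not the Wireless MMX code itself: the function names align_extract, avg_bytes_round, and half_pel_row are hypothetical, the data is treated as little-endian packed bytes, and the average is assumed to round as (a + b + 1) >> 1 per byte, which is what the rounding variant of a packed byte average is expected to compute.

```c
#include <stdint.h>

/* Model of an align-and-extract step: return the 8 bytes that start at byte
 * `offset` (0..7) of the 16-byte little-endian value formed by hi:lo. */
static uint64_t align_extract(uint64_t lo, uint64_t hi, unsigned offset)
{
    offset &= 7u;
    if (offset == 0)
        return lo; /* avoids an undefined 64-bit shift by 64 */
    return (lo >> (8 * offset)) | (hi << (64 - 8 * offset));
}

/* Model of a rounded average of eight packed unsigned bytes. */
static uint64_t avg_bytes_round(uint64_t a, uint64_t b)
{
    uint64_t out = 0;
    for (unsigned i = 0; i < 8; ++i) {
        unsigned x = (unsigned)(a >> (8 * i)) & 0xffu;
        unsigned y = (unsigned)(b >> (8 * i)) & 0xffu;
        out |= (uint64_t)((x + y + 1u) >> 1) << (8 * i);
    }
    return out;
}

/* Horizontal 1/2X interpolation of one 8-pixel row whose first pixel is the
 * low byte of `lo`: average pixels 0..7 with pixels 1..8. */
static uint64_t half_pel_row(uint64_t lo, uint64_t hi)
{
    uint64_t shifted = align_extract(lo, hi, 1);
    return avg_bytes_round(lo, shifted);
}
```

On the target, each of these helpers corresponds to a single packed instruction operating on a 64-bit register, which is where the speedup over byte-at-a-time code comes from.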
This example code is for the 1/2X interpolation:
; Test for special case of aligned ( LSBs = 110b and 000b)
; r0 -> pointer to misaligned array.
MOV r5,#7 ; r5 =0x7
AND r7,r0,r5 ; r7 = 3 LSBs of psrc (the address in r0)
MOV r12,#4 ; counter
Intel PXA27x Processor Family Optimization Guide, Order Number 280004 001, April 2004