Intel® PXA27x Processor Family
Optimization Guide
Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization
4.5.3 Signed Unpack Example
The signed unpack replaces the Intel® MMX™ Technology sequence:
Intel® Wireless MMX™ Technology Instructions    Intel® MMX™ Technology Instructions
Input: wR0 : Source Value                       Input: mm0 : Source Value
WUNPCKELS wR1, wR0                              PUNPCKHWD mm1, mm0
WUNPCKEHS wR2, wR0                              PUNPCKLWD mm0, mm0
                                                PSRAD     mm0, 16
                                                PSRAD     mm1, 16
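As a rough illustration of why one signed unpack can stand in for an unpack-plus-shift pair, the scalar C sketch below models a single 16-bit element of a 64-bit register. The function names are hypothetical and the code is a model of the arithmetic only, not of either instruction set: it assumes the source elements are halfwords widened to words, as the PUNPCKxWD/PSRAD 16 sequence suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic shift right by 16 on one 32-bit lane, written without relying on
   implementation-defined signed shifts (models one lane of PSRAD mm, 16). */
static int32_t shift_right_arithmetic_16(uint32_t lane)
{
    int32_t upper = (int32_t)(lane >> 16);              /* 0 .. 65535 */
    return (lane & 0x80000000u) ? upper - 65536 : upper;
}

/* Two-step idiom: duplicate the halfword into both halves of a word
   (the effect of PUNPCKxWD mm, mm), then shift the upper copy back down. */
static int32_t sign_extend_two_step(uint64_t reg, int index)
{
    assert(index >= 0 && index < 4);                    /* four halfwords per register */
    uint32_t half = (uint32_t)((reg >> (16 * index)) & 0xFFFFu);
    uint32_t lane = (half << 16) | half;
    return shift_right_arithmetic_16(lane);
}

/* One-step model: direct sign extension of the same element, which is what a
   signed unpack is assumed to produce here. */
static int32_t sign_extend_one_step(uint64_t reg, int index)
{
    assert(index >= 0 && index < 4);
    int32_t half = (int32_t)((reg >> (16 * index)) & 0xFFFFu);
    return (half & 0x8000) ? half - 65536 : half;
}
```

Both functions return the same value for every element, which is the sense in which the two-instruction sequence on the left can replace the four-instruction sequence on the right.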
4.5.4 Interleaved Pack with Saturation Example
This example takes signed words as source operands and produces interleaved signed halfwords.
Intel® Wireless MMX™ Technology Instructions    Intel® MMX™ Technology Instructions
Input: wR0 : Source Value 1                     Input: mm0 : Source Value 1
       wR1 : Source Value 2                            mm1 : Source Value 2
WPACKWSS wR2, wR0, wR1                          PACKSSDW  mm0, mm0
WSHUFH   wR2, wR2, #216                         PACKSSDW  mm1, mm1
                                                PUNPCKLWD mm0, mm1
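The data movement in this example can be followed with a small scalar model in C. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the pack places the saturated words of the first source in the lower two halfwords and those of the second source in the upper two, and that each 2-bit field of the shuffle immediate selects the source halfword for one destination halfword.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a signed word to the signed halfword range. */
static int16_t saturate_to_halfword(int32_t word)
{
    if (word > INT16_MAX) return INT16_MAX;
    if (word < INT16_MIN) return INT16_MIN;
    return (int16_t)word;
}

/* Pack with signed saturation: a fills the lower two halfwords, b the upper two. */
static void pack_words_saturated(const int32_t a[2], const int32_t b[2], int16_t out[4])
{
    out[0] = saturate_to_halfword(a[0]);
    out[1] = saturate_to_halfword(a[1]);
    out[2] = saturate_to_halfword(b[0]);
    out[3] = saturate_to_halfword(b[1]);
}

/* Halfword shuffle: bits [2i+1:2i] of the selector pick the source of out[i]. */
static void shuffle_halfwords(const int16_t in[4], unsigned selector, int16_t out[4])
{
    assert(selector <= 0xFFu);                  /* 8-bit immediate */
    for (int i = 0; i < 4; i++)
        out[i] = in[(selector >> (2 * i)) & 3u];
}

/* Pack, then reorder so the two sources alternate in the result. */
static void interleaved_pack(const int32_t a[2], const int32_t b[2], int16_t out[4])
{
    int16_t packed[4];
    pack_words_saturated(a, b, packed);
    shuffle_halfwords(packed, 216u, out);       /* 216 = binary 11011000: order 0, 2, 1, 3 */
}
```

Under these assumptions the selector 216 swaps the two middle halfwords, turning a0, a1, b0, b1 into a0, b0, a1, b1, which matches the interleaving the three-instruction sequence on the right produces.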
4.6 Optimizing Libraries for System Performance
Many standard C library routines benefit greatly from being optimized for the Intel
XScale® Microarchitecture. The following string and memory manipulation routines are good
candidates for such tuning:
strcat, strchr, strcmp, strcoll, strcpy, strcspn, strlen, strncat, strncmp, strpbrk, strrchr, strspn,
strstr, strtok, strxfrm, memchr, memcmp, memcpy, memmove, memset
Apart from the C libraries, many other critical functions can be optimized in the same
fashion. For example, graphics drivers and graphics applications frequently rely on a small set
of key functions, which can be optimized for the PXA27x processor. The following sections
present a set of routines as optimization case studies.
4.6.1 Case Study 1: Memory-to-Memory Copy
The performance of memory copy (memcpy) is influenced by memory-access latency and memory
throughput. When both the source and the destination are in the cache, memcpy performance is
highest, and simple load-instruction scheduling is enough to achieve it.
However, if the source or the destination is not in the cache, a load-latency-hiding technique
must be applied.
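One common way to hide load latency is to request the next block of source data while the current block is being copied. The C sketch below shows that idea only; it is not the guide's own routine. The function name copy_with_prefetch is hypothetical, the 32-byte line size is an assumption about the target cache, and the prefetch hint relies on the GCC/Clang builtin __builtin_prefetch (the code falls back to a plain copy on other compilers).

```c
#include <assert.h>
#include <stddef.h>

#define LINE_BYTES 32u   /* assumed cache line size; adjust for the target */

/* Copy n bytes from src to dst, hinting the next source line ahead of use.
   The regions must not overlap, as with memcpy. */
void *copy_with_prefetch(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));

    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;

    while (n - i >= LINE_BYTES) {
        /* Only hint when a further full line exists, so the address stays in bounds. */
        if (n - i >= 2u * LINE_BYTES) {
#if defined(__GNUC__)
            __builtin_prefetch(s + i + LINE_BYTES, 0, 0);  /* read access, low temporal reuse */
#endif
        }
        /* Copy the current line while the next one is (ideally) being fetched. */
        for (size_t k = 0; k < LINE_BYTES; k++)
            d[i + k] = s[i + k];
        i += LINE_BYTES;
    }

    /* Copy the remaining tail bytes. */
    for (; i < n; i++)
        d[i] = s[i];

    return dst;
}
```

Whether the hint helps, and how far ahead it should reach, depends on the memory latency of the system, so the prefetch distance is best chosen by measurement.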
Order Number 280004 001, Intel PXA27x Processor Family Optimization Guide, April 2004