Intel® PXA27x Processor Family Optimization Guide

Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization

WMACS wR2, wR1, wR0
SUBS r3, r3, #4
BNE Loop_Begin

The parallelism of the filter may be exposed further by unrolling the loop to provide eight taps per iteration. In the following code sequence, the loop has been unrolled once, which eliminates several load-to-use stalls. The loop overhead is also amortized further, from two cycles for every four taps to two cycles for every eight taps. A single load-to-use stall remains between the second WLDRD instruction and the second WMACS instruction within the inner loop.

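; Pointers: r0 -> val, r1 -> pResult, r2 -> pTapsQ15, r3 -> tapsLen
; Note: the loads below also post-increment r4, which the pointer list above does not name.
WLDRD wR0, [r2], #8          ; prologue: first four taps
WZERO wR15                   ; clear the accumulator
WLDRD wR1, [r4], #8          ; prologue: first four values from [r4]
Loop_Begin:
WLDRD wR2, [r2], #8          ; next four taps
SUBS r3, r3, #8              ; eight taps consumed per iteration
WLDRD wR3, [r4], #8          ; next four values from [r4]
WMACS wR15, wR1, wR0         ; first multiply-accumulate
WLDRDNE wR0, [r2], #8        ; reload for the next iteration unless done
WMACS wR15, wR2, wR3         ; second multiply-accumulate
WLDRDNE wR1, [r4], #8
BNE Loop_Begin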
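As a rough cross-check of what the unrolled loop computes, the following C sketch models it in scalar code. It is only a reference model under stated assumptions, not Intel's implementation: the helper names mac4, fir_dot_4 and fir_dot_8 are hypothetical, each WMACS is modeled as four signed 16-bit products added to a 64-bit accumulator, and the final rounding and store of the result are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one WMACS: four signed 16-bit products
 * are summed into a 64-bit accumulator. */
static int64_t mac4(int64_t acc, const int16_t *a, const int16_t *x)
{
    for (int i = 0; i < 4; i++)
        acc += (int32_t)a[i] * (int32_t)x[i];
    return acc;
}

/* Basic loop: four taps per iteration (taps_len must be a multiple of 4). */
int64_t fir_dot_4(const int16_t *taps, const int16_t *x, size_t taps_len)
{
    assert(taps_len % 4 == 0);
    int64_t acc = 0;
    for (size_t j = 0; j < taps_len; j += 4)
        acc = mac4(acc, taps + j, x + j);
    return acc;
}

/* Loop unrolled once: eight taps per iteration
 * (taps_len must be a non-zero multiple of 8). */
int64_t fir_dot_8(const int16_t *taps, const int16_t *x, size_t taps_len)
{
    assert(taps_len >= 8 && taps_len % 8 == 0);
    int64_t acc = 0;                                /* corresponds to WZERO wR15 */
    for (size_t j = 0; j < taps_len; j += 8) {
        acc = mac4(acc, taps + j, x + j);           /* first WMACS  */
        acc = mac4(acc, taps + j + 4, x + j + 4);   /* second WMACS */
    }
    return acc;
}
```

Both functions return the same sum for any tap count that is a multiple of eight, which is the restriction the unrolled assembly loop places on tapsLen.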

4.4.1.1 General Remarks on Software Pipelining

In the example for the real block FIR filter, two copies of the basic sequence of code were interleaved, eliminating all but one of the stalls. The throughput for the sequence went from 9 cycles for every four taps to 9 cycles for every eight taps. This corresponds to 1.125 cycles per tap, a 2X throughput improvement.
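The arithmetic behind these figures can be written out as a short C sketch. The function names cycles_per_tap and speedup are hypothetical and exist only to make the calculation explicit; the cycle and tap counts come from the paragraph above.

```c
#include <assert.h>

/* Average cycles spent per filter tap for a loop body that takes
 * `cycles` cycles and consumes `taps` taps per iteration. */
double cycles_per_tap(unsigned cycles, unsigned taps)
{
    assert(taps > 0);
    return (double)cycles / (double)taps;
}

/* Throughput gain when the per-tap cost drops from `before` to `after`. */
double speedup(double before, double after)
{
    assert(after > 0.0);
    return before / after;
}
```

With the numbers quoted above, cycles_per_tap(9, 4) is 2.25 and cycles_per_tap(9, 8) is 1.125, so the ratio between them is the stated factor of two.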

It is useful to define a metric for the number of copies of a basic instruction sequence that must be interleaved in order to remove all stalls. We call this the interleave factor, k. The real block FIR filter requires k=2 to eliminate all possible stalls, primarily because it is a short sequence that must account for the long load-to-use latency. In practice, k=2 is sufficient for most loops encountered in real applications. This is fortunate because each interleaved copy requires its own set of temporary registers, and for some algorithms interleaving with k=3 is not possible. A good rule of thumb is to try k=2 first, as it is usually the right choice.
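The idea of an interleave factor can be illustrated in C, with the caveat that a C compiler is free to reorder these statements, so the sketch only shows the intended ordering and the register cost. The names dot_k1 and dot_k2 are hypothetical. With k=1 every load is followed directly by its use; with k=2 the loads of one copy are placed ahead of the use of the other copy, which requires a second set of temporaries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* k = 1: each value is used immediately after it is loaded. */
int64_t dot_k1(const int16_t *a, const int16_t *x, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        int16_t ta = a[i];                /* load */
        int16_t tx = x[i];                /* load */
        acc += (int32_t)ta * tx;          /* use right after the loads */
    }
    return acc;
}

/* k = 2: two copies of the sequence are interleaved (n must be even, n >= 2).
 * Copy A uses a0/x0 and copy B uses a1/x1, so twice the temporaries are live. */
int64_t dot_k2(const int16_t *a, const int16_t *x, size_t n)
{
    assert(n >= 2 && n % 2 == 0);
    int64_t acc = 0;
    int16_t a0 = a[0], x0 = x[0];                /* prologue: loads for copy A */
    for (size_t i = 0; i < n; i += 2) {
        int16_t a1 = a[i + 1], x1 = x[i + 1];    /* copy B loads */
        acc += (int32_t)a0 * x0;                 /* copy A use   */
        if (i + 2 < n) {                         /* conditional reload, like WLDRDNE */
            a0 = a[i + 2];
            x0 = x[i + 2];
        }
        acc += (int32_t)a1 * x1;                 /* copy B use   */
    }
    return acc;
}
```

Going to k=3 would add a third pair of temporaries, which is the register pressure the text warns about.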

4.4.2 Multi-Sample Technique

The multi-sample optimization technique calculates multiple outputs with each loop iteration, similar to loop unrolling. The disadvantages of this technique include increased code size for critical loops and restrictions on the minimum number, and allowed multiples, of taps or samples. The obvious advantage is reduced cycle consumption.

• Memory bandwidth is reduced by data re-use.
• Load-to-use stalls may be easily eliminated with scheduling.
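The data re-use can be sketched in C for the simplest case of two outputs per iteration. This is an illustrative sketch only: the function names fir_single and fir_two_sample are hypothetical, the filter is assumed to be y[i] = sum over j of a[j] * x[i + j], the input is assumed to hold n_out + taps - 1 samples, and the raw 64-bit sums are stored without the rounding step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Baseline: one output per outer iteration; every tap and sample is
 * reloaded for every output. */
void fir_single(const int16_t *a, size_t taps, const int16_t *x,
                int64_t *y, size_t n_out)
{
    for (size_t i = 0; i < n_out; i++) {
        int64_t s0 = 0;
        for (size_t j = 0; j < taps; j++)
            s0 += (int32_t)a[j] * x[i + j];
        y[i] = s0;
    }
}

/* Multi-sample: two outputs per outer iteration (n_out must be even).
 * Each tap is loaded once and used for both outputs, and the sample
 * loaded for output i+1 is carried over as the next sample for output i. */
void fir_two_sample(const int16_t *a, size_t taps, const int16_t *x,
                    int64_t *y, size_t n_out)
{
    assert(n_out % 2 == 0);
    for (size_t i = 0; i < n_out; i += 2) {
        int64_t s0 = 0, s1 = 0;
        int16_t x_cur = x[i];
        for (size_t j = 0; j < taps; j++) {
            int16_t tap = a[j];               /* one tap load, two uses */
            int16_t x_next = x[i + j + 1];    /* one new sample per tap */
            s0 += (int32_t)tap * x_cur;
            s1 += (int32_t)tap * x_next;
            x_cur = x_next;                   /* re-use instead of reloading */
        }
        y[i] = s0;
        y[i + 1] = s1;
    }
}
```

The two-output version performs two loads per pair of multiply-accumulates instead of four, at the cost of a larger loop body and the requirement that the number of outputs be even.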

Source document: Intel PXA27x Processor Family Optimization Guide, Order Number 280004-001, April 2004.

