5-12
Intel® PXA27x Processor Family
Optimization Guide
High Level Language Optimization
For the first loop, most compilers generate an instruction to subtract i from 1000. An instruction is
then created to compare the result with 0. In the second loop, a subtraction instruction is not
needed, and the 1000 constant does not need to be stored in a register. Freeing that register might
save a stack operation for every iteration of the loop, significantly enhancing performance.
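The two loops being compared are not reproduced in this section. As a minimal sketch, assuming the first is a count-up loop bounded by the constant 1000 and the second counts down to zero, the two forms might look like the following. The function names and the byte-summing loop body are illustrative only and do not come from the guide.

```c
#include <stddef.h>

#define ITERATIONS 1000

/* Count-up form: each iteration tests i against the constant 1000,
   so the bound has to be available for the comparison. */
unsigned sum_count_up(const unsigned char *buf)
{
    unsigned sum = 0;
    for (int i = 0; i < ITERATIONS; i++)
        sum += buf[i];
    return sum;
}

/* Count-down form: the loop ends when i reaches zero, so a compiler can
   usually test the result of the decrement itself and does not need to
   keep the constant 1000 live inside the loop. */
unsigned sum_count_down(const unsigned char *buf)
{
    unsigned sum = 0;
    for (int i = ITERATIONS; i > 0; i--)
        sum += buf[i - 1];
    return sum;
}
```

Both functions visit the same 1000 elements and return the same sum; only the direction of the counter and the form of the exit test differ.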
5.1.8
If-else versus Switch Statements
Compilers can often generate jump tables for switch statements that can jump to specific code
faster than traversing the conditionals of a cascading if-else statement. Some compilers, however,
may simply expand the switch statement into a cascading if-else statement. In general, using switch
statements where possible and always placing the most frequently traversed paths higher up in
either the switch statement or the cascading if-else code leads to the best optimization by the
compiler.
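As an illustration of the advice above, the sketch below writes the same decision both ways. The enum, the function names, and the assumption that PKT_DATA is the most frequent case are hypothetical. Whether the switch actually becomes a jump table depends on the compiler and its settings.

```c
/* Hypothetical packet types; PKT_DATA is assumed to be the most frequent. */
enum packet_type { PKT_DATA, PKT_ACK, PKT_CONTROL, PKT_ERROR };

/* Switch form: dense, consecutive case values give the compiler the
   option of a jump table. The most frequent case is listed first in
   case the switch is expanded into a compare chain instead. */
int priority_switch(enum packet_type type)
{
    switch (type) {
    case PKT_DATA:    return 1;
    case PKT_ACK:     return 2;
    case PKT_CONTROL: return 3;
    case PKT_ERROR:   return 4;
    default:          return 0;
    }
}

/* Cascading if-else form: conditions are tested in source order, so the
   most frequent path is placed at the top. */
int priority_if_else(enum packet_type type)
{
    if (type == PKT_DATA)
        return 1;
    else if (type == PKT_ACK)
        return 2;
    else if (type == PKT_CONTROL)
        return 3;
    else if (type == PKT_ERROR)
        return 4;
    else
        return 0;
}
```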
5.1.9
Nested If-Else and Switch Statements
Using nested if-else and switch statements can greatly reduce the number of comparison
instructions that are generated by the compiler. For example, consider a switch statement
containing 256 case statements. Without knowing if the compiler will generate a jump table or a
cascading if-else statement, the processor might potentially have to do 256 comparisons only to
find that not a single conditional is met.
By breaking the switch into two or more levels, the worst case lookup is dramatically reduced.
Using a switch statement with 16 case statements to jump to 16 other switch statements, each with
16 cases, reduces the non-existing case lookup to 16 comparisons and the worst case lookup to 32
comparisons.
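The two-level lookup can be sketched as follows, assuming an 8-bit code split into a high nibble (selecting one of 16 groups) and a low nibble (selecting a case within the group). Only two groups and a few cases are filled in, and the function names and return values are hypothetical. With all 16 groups populated and the compiler emitting compare chains, an unknown group is rejected after at most 16 outer comparisons, and the worst case is 16 outer plus 16 inner comparisons, which gives 32.

```c
/* Inner level: cases for codes 0x00..0x0F (only two shown). */
static int dispatch_group0(unsigned low)
{
    switch (low) {
    case 0x0: return 100;
    case 0x1: return 101;
    default:  return -1;   /* no handler for this code */
    }
}

/* Inner level: cases for codes 0x10..0x1F (only two shown). */
static int dispatch_group1(unsigned low)
{
    switch (low) {
    case 0x0: return 110;
    case 0xF: return 125;
    default:  return -1;
    }
}

/* Outer level: select the group from the high nibble, then let the
   inner switch resolve the low nibble. */
int dispatch(unsigned char code)
{
    unsigned high = code >> 4;
    unsigned low  = code & 0x0Fu;

    switch (high) {
    case 0x0: return dispatch_group0(low);
    case 0x1: return dispatch_group1(low);
    default:  return -1;   /* group does not exist */
    }
}
```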
5.1.10
Locality in Source Code
On many different levels, code that is cohesive, modular, and decoupled allows compilers to
optimize the code to the greatest extent. In C++, these attributes are encouraged by the language. In
C, it is very important to keep closely related code and data definitions in the same file as much as
possible. The compiler can more efficiently optimize the code this way, and similar data has a
higher degree of spatial locality to make better use of the data cache.
5.1.11
Choosing Data Types
Many applications inherently use sub-word data sizes. Packing a set of them into a single word
is beneficial for memory accesses and memory bandwidth. Packed data formats can also be
processed using the Intel® Wireless MMX™ Technology. The Intel XScale® Microarchitecture
performs best on word-size data aligned on a 4-byte boundary. Intel® Wireless MMX™
Technology requires data to be aligned on an 8-byte boundary.
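A minimal sketch of packing sub-word data, assuming four 8-bit samples stored in one 32-bit word with sample 0 in the low byte. The helper names and the buffer are hypothetical. The C11 `_Alignas` specifier is used here as one portable way to request 8-byte alignment; a particular toolchain may offer its own attribute instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack four 8-bit samples into one 32-bit word (s0 in the low byte). */
uint32_t pack4_u8(uint8_t s0, uint8_t s1, uint8_t s2, uint8_t s3)
{
    return (uint32_t)s0
         | ((uint32_t)s1 << 8)
         | ((uint32_t)s2 << 16)
         | ((uint32_t)s3 << 24);
}

/* Extract sample 0..3 from a packed word. */
uint8_t unpack_u8(uint32_t word, unsigned index)
{
    return (uint8_t)(word >> (8u * (index & 3u)));
}

/* Buffer of packed samples placed on an 8-byte boundary, the alignment
   the text gives for Wireless MMX data. */
_Alignas(8) uint8_t packed_samples[64];

/* Returns nonzero if p is a multiple of alignment. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```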
5.1.12
Data Alignment For Maximizing Cache Usage
Cache lines begin on 32-byte address boundaries. To maximize cache line use and minimize cache
pollution, data structures should be aligned on 32-byte boundaries and sized to a multiple of the
cache line size. Aligning data structures on cache line boundaries simplifies later addition of
preload instructions to optimize performance.
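One way to express this in C11 is sketched below, assuming the 32-byte line size stated above. The structure, its fields, and the helper are hypothetical. The `static_assert` fails the build if a later change makes the structure size stop being a multiple of the line size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32   /* cache lines begin on 32-byte boundaries */

/* Hypothetical structure: 16 bytes of counters plus 16 bytes of samples
   fill exactly one cache line. Aligning the first member raises the
   alignment of the whole structure to the line size. */
struct channel_state {
    _Alignas(CACHE_LINE_SIZE) uint32_t head;
    uint32_t tail;
    uint32_t count;
    uint32_t flags;
    uint16_t samples[8];
};

static_assert(sizeof(struct channel_state) % CACHE_LINE_SIZE == 0,
              "channel_state must be a multiple of the cache line size");

/* Each array element starts on its own cache line. */
struct channel_state channels[4];

/* Returns nonzero if p lies on a cache line boundary. */
int is_cache_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE_SIZE) == 0;
}
```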
Intel PXA27x Processor Family Optimization Guide, Order Number 280004 001, April 2004