January, 2004
Intel XScale® Core Developer’s Manual
Contents
A.2.1 General Pipeline Characteristics ........ 176
A.2.1.1. Number of Pipeline Stages ........ 176
A.2.1.2. The Intel XScale® Core Pipeline Organization ........ 177
A.2.1.3. Out Of Order Completion ........ 178
A.2.1.4. Register Scoreboarding ........ 178
A.2.1.5. Use of Bypassing ........ 178
A.2.2 Instruction Flow Through the Pipeline ........ 179
A.2.2.1. ARM* V5TE Instruction Execution ........ 179
A.2.2.2. Pipeline Stalls ........ 179
A.2.3 Main Execution Pipeline ........ 180
A.2.3.1. F1 / F2 (Instruction Fetch) Pipestages ........ 180
A.2.3.2. ID (Instruction Decode) Pipestage ........ 180
A.2.3.3. RF (Register File / Shifter) Pipestage ........ 181
A.2.3.4. X1 (Execute) Pipestages ........ 181
A.2.3.5. X2 (Execute 2) Pipestage ........ 181
A.2.3.6. WB (write-back) ........ 181
A.2.4 Memory Pipeline ........ 182
A.2.4.1. D1 and D2 Pipestage ........ 182
A.2.5 Multiply/Multiply Accumulate (MAC) Pipeline ........ 182
A.2.5.1. Behavioral Description ........ 182
A.3 Basic Optimizations ........ 183
A.3.1 Conditional Instructions ........ 183
A.3.1.1. Optimizing Condition Checks ........ 183
A.3.1.2. Optimizing Branches ........ 184
A.3.1.3. Optimizing Complex Expressions ........ 186
A.3.2 Bit Field Manipulation ........ 187
A.3.3 Optimizing the Use of Immediate Values ........ 188
A.3.4 Optimizing Integer Multiply and Divide ........ 189
A.3.5 Effective Use of Addressing Modes ........ 190
A.4 Cache and Prefetch Optimizations ........ 191
A.4.1 Instruction Cache ........ 191
A.4.1.1. Cache Miss Cost ........ 191
A.4.1.2. Round-Robin Replacement Cache Policy ........ 191
A.4.1.3. Code Placement to Reduce Cache Misses ........ 191
A.4.1.4. Locking Code into the Instruction Cache ........ 192
A.4.2 Data and Mini Cache ........ 193
A.4.2.1. Non Cacheable Regions ........ 193
A.4.2.2. Write-through and Write-back Cached Memory Regions ........ 193
A.4.2.3. Read Allocate and Read-write Allocate Memory Regions ........ 194
A.4.2.4. Creating On-chip RAM ........ 194
A.4.2.5. Mini-data Cache ........ 195
A.4.2.6. Data Alignment ........ 196
A.4.2.7. Literal Pools ........ 197
A.4.3 Cache Considerations ........ 198
A.4.3.1. Cache Conflicts, Pollution and Pressure ........ 198
A.4.3.2. Memory Page Thrashing ........ 198
A.4.4 Prefetch Considerations ........ 199
A.4.4.1. Prefetch Distances ........ 199
A.4.4.2. Prefetch Loop Scheduling ........ 199
A.4.4.3. Prefetch Loop Limitations ........ 199
A.4.4.4. Compute vs. Data Bus Bound ........ 199
A.4.4.5. Low Number of Iterations ........ 200