Intel® PXA27x Processor Family
Optimization Guide
Microarchitecture Overview
2.2.3.5
Execute 2 (X2) Pipestage
The X2 pipestage contains the current program status register (CPSR). This pipestage selects the
data to be written to the RFU in the WB cycle, including the program status registers.
2.2.3.6
Write-Back (WB)
When an instruction reaches the write-back stage, it is considered complete. Instruction results are
written to the RFU.
2.2.4
Memory Pipeline
The memory pipeline consists of two stages, D1 and D2. The data cache unit (DCU) consists of the
data cache array, mini-data cache, fill buffers, and write buffers. The memory pipeline handles load
and store instructions.
2.2.4.1
D1 and D2 Pipestage
Operation begins in D1 after the X1 pipestage calculates the effective address for loads and stores.
The data cache and mini-data cache return the destination data in the D2 pipestage. For byte and
half-word loads, sign extension and byte alignment occur before the data is returned in the D2
pipestage.
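The sign extension and byte alignment step can be pictured with a small software model. This is only an illustrative sketch, not a description of the hardware datapath: the function name extract_subword, its parameters, and the little-endian byte numbering are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model only (hypothetical helper, not from the guide):
 * pick a byte or half-word out of a 32-bit data word and extend it to
 * 32 bits. Little-endian byte numbering is assumed. */
uint32_t extract_subword(uint32_t word, unsigned byte_offset,
                         unsigned size_bytes, int is_signed)
{
    assert(size_bytes == 1 || size_bytes == 2);
    assert(byte_offset + size_bytes <= 4);
    assert(byte_offset % size_bytes == 0);      /* naturally aligned */

    /* Byte alignment: move the addressed bytes down to bit 0. */
    unsigned shift = byte_offset * 8;
    uint32_t mask = (size_bytes == 1) ? 0xFFu : 0xFFFFu;
    uint32_t value = (word >> shift) & mask;

    /* Sign extension: replicate the top bit of the sub-word. */
    if (is_signed) {
        uint32_t sign_bit = 1u << (size_bytes * 8 - 1);
        value = (value ^ sign_bit) - sign_bit;
    }
    return value;
}
```

For example, a signed byte load of the value 0x83 yields 0xFFFFFF83, while the unsigned form yields 0x00000083.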
2.2.4.1.1
Write Buffer Behavior
The Intel XScale® Microarchitecture improves write performance through write coalescing.
Coalescing combines a new store operation with an existing store operation already resident in
the write buffer. The new store is placed in the same write buffer entry as an existing store when
the address of the new store falls within the same 4-word-aligned (16-byte) region as the existing
entry. The core can coalesce into any of the four entries in the write buffer. The Intel XScale®
Microarchitecture has a global coalesce disable bit located in the Auxiliary Control register
(CP15, register 1, opcode_2=1).
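The coalescing rule can be sketched as a small software model. This is a simplified illustration under stated assumptions, not the actual hardware logic: the types write_buffer and wb_entry, the function wb_store, and the first-free allocation policy are all hypothetical. Only the four-entry size, the 4-word (16-byte) alignment test, and the existence of a global disable come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WB_ENTRIES    4u                   /* four write buffer entries */
#define WB_BLOCK_MASK (~(uint32_t)0xF)     /* 4 words = 16 bytes */

/* Hypothetical model types, for illustration only. */
typedef struct {
    bool     valid;
    uint32_t base;      /* 16-byte-aligned address held by the entry */
} wb_entry;

typedef struct {
    wb_entry entry[WB_ENTRIES];
    bool     coalesce_enabled;   /* models the global disable bit */
} write_buffer;

/* Returns the index of the entry that receives the store, or -1 when
 * no entry is free (this model does not drain entries). */
int wb_store(write_buffer *wb, uint32_t addr)
{
    assert(wb != NULL);
    uint32_t base = addr & WB_BLOCK_MASK;

    /* Coalesce: reuse any valid entry covering the same 16-byte block. */
    if (wb->coalesce_enabled) {
        for (unsigned i = 0; i < WB_ENTRIES; i++) {
            if (wb->entry[i].valid && wb->entry[i].base == base)
                return (int)i;
        }
    }

    /* Otherwise allocate the first free entry (assumed policy). */
    for (unsigned i = 0; i < WB_ENTRIES; i++) {
        if (!wb->entry[i].valid) {
            wb->entry[i].valid = true;
            wb->entry[i].base = base;
            return (int)i;
        }
    }
    return -1;
}
```

In this model, stores to 0x1000 and 0x100C share one entry, while a store to 0x1010 takes a new one. With coalescing disabled, every store takes its own entry.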
2.2.4.1.2
Read Buffer Behavior
The Intel XScale® Microarchitecture has four fill buffers that allow four outstanding loads to the
cache and external memory. Allowing four outstanding loads increases memory throughput and bus
efficiency, and can also be used to hide latency. Page table attributes affect load behavior: for a
section with C=0, B=0, only one load from memory can be outstanding at a time. Thus, load
performance for a memory page with C=0, B=1 is significantly better than for a memory page with
C=0, B=0.
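One way source code might take advantage of multiple outstanding loads is to issue several independent loads before consuming any of their results. The sketch below is an assumption-laden illustration: the function sum4 and its four-stream layout are hypothetical, the compiler remains free to reschedule the loads, and any benefit depends on the memory attributes discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: sum four separate buffers of n words each.
 * Each iteration issues four independent loads (one per buffer)
 * before any loaded value is used, so up to four loads may be in
 * flight at once on a core that supports it. */
uint32_t sum4(const uint32_t *p0, const uint32_t *p1,
              const uint32_t *p2, const uint32_t *p3, size_t n)
{
    assert(n == 0 || (p0 != NULL && p1 != NULL && p2 != NULL && p3 != NULL));

    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Four loads with no dependency on one another. */
        uint32_t a = p0[i];
        uint32_t b = p1[i];
        uint32_t c = p2[i];
        uint32_t d = p3[i];

        /* The results are consumed only after all loads are issued. */
        sum += a + b + c + d;
    }
    return sum;
}
```

A dependent chain, such as following a linked list, cannot overlap its loads this way because each address depends on the previous result.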
2.2.5
Multiply/Multiply Accumulate (MAC) Pipeline
The multiply-accumulate (MAC) unit executes the multiply and multiply-accumulate instructions
supported by the Intel XScale® Microarchitecture. The MAC implements the 40-bit Intel XScale®
Microarchitecture accumulator register acc0 and handles the instructions that transfer its value
to and from general-purpose ARM* registers.
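The effect of a 40-bit accumulator width can be illustrated with a small software model. This is a sketch under assumptions, not a specification of any instruction: the type acc40 and the functions acc40_mac and acc40_value are hypothetical, and the model assumes a signed 32-bit by 32-bit multiply whose result is accumulated with wrap-around at 40 bits. Consult the instruction reference for the exact behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ACC_BITS 40
#define ACC_MASK ((UINT64_C(1) << ACC_BITS) - 1)

/* Hypothetical model type: the 40-bit value lives in the low bits. */
typedef struct {
    uint64_t bits;
} acc40;

/* Multiply two signed 32-bit values and accumulate, keeping only the
 * low 40 bits (assumed wrap-around behavior). */
void acc40_mac(acc40 *acc, int32_t a, int32_t b)
{
    assert(acc != NULL);
    int64_t product = (int64_t)a * (int64_t)b;
    acc->bits = (acc->bits + (uint64_t)product) & ACC_MASK;
}

/* Interpret the 40-bit contents as a signed two's-complement value. */
int64_t acc40_value(const acc40 *acc)
{
    assert(acc != NULL);
    uint64_t sign = UINT64_C(1) << (ACC_BITS - 1);
    if (acc->bits & sign)
        return (int64_t)acc->bits - (INT64_C(1) << ACC_BITS);
    return (int64_t)acc->bits;
}
```

The extra eight bits above a 32-bit result give headroom for summing many products before overflow, which is the usual motivation for a wide accumulator in filter and dot-product loops.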