PowerPC e500 Core Family Reference Manual, Rev. 1
4-30
Freescale Semiconductor
Execution Timing
The execution of most load instructions is pipelined in the three LSU stages, during which the
effective address is calculated, MMU translations are performed, the data cache array and tags are
read, and cache way selection and data alignment are performed. Cacheable loads, when free of
data dependencies, execute in a speculative manner with a maximum throughput of one instruction
per cycle and 3-cycle latency. Data returned from the cache is held in a rename register until the
completion logic commits the value to the processor state.
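Pipelining improves throughput but not latency. The difference matters when one load's result feeds the next load's address, as in pointer chasing. The following Python sketch is an idealized model under stated assumptions, not a cycle-accurate description of the e500. It assumes that every load hits in the data cache, that independent loads issue in consecutive cycles, and that no other hazards occur. The function names are hypothetical.

```python
LOAD_LATENCY = 3  # cycles for a cacheable load that hits, per the text above

def independent_loads_cycles(n: int) -> int:
    """Cycle by which the last of n independent loads has returned its data.

    Assumes one load issues per cycle, so load i (0-based) issues in cycle
    i + 1 and its data is ready at the end of cycle i + LOAD_LATENCY.
    """
    if n <= 0:
        return 0
    return (n - 1) + LOAD_LATENCY

def dependent_loads_cycles(n: int) -> int:
    """Cycles for a chain of n loads in which each address depends on the
    previous load's data, so no two loads overlap in the pipeline."""
    return max(n, 0) * LOAD_LATENCY

if __name__ == "__main__":
    for n in (1, 4, 16):
        print(n, independent_loads_cycles(n), dependent_loads_cycles(n))
```

Under this model, 16 independent loads finish in 18 cycles, while a dependent chain of 16 loads takes 48.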
Stores cannot be executed speculatively and must be held in the store queue until completion logic
signals that the store instruction is to be committed, at which point the data cache array is updated.
If operands are misaligned, additional latency may be incurred either for an alignment exception
or for additional cache or bus accesses.
Table 4-7 gives load and store instruction execution latencies.
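A toy in-order store queue can illustrate the rule that a store updates the data cache only when it is committed. This Python sketch is a simplified model, not the e500's store-queue design. The class and method names are hypothetical, and the "cache" is just a dictionary. The flush path is an assumption: it discards uncommitted stores when speculative work is cancelled, for example after a mispredicted branch.

```python
from collections import deque

class StoreQueue:
    """Toy store queue: stores are buffered at execute and reach the
    modeled data cache only when completion commits them, oldest first."""

    def __init__(self):
        self.pending = deque()  # (tag, address, value) in program order
        self.cache = {}         # address -> value; stands in for the data cache array

    def execute_store(self, tag, address, value):
        # Executing a store only buffers it; the cache is not touched yet.
        self.pending.append((tag, address, value))

    def commit(self, tag):
        # Completion commits stores in program order, so `tag` must be the oldest entry.
        if not self.pending or self.pending[0][0] != tag:
            raise RuntimeError("stores must commit in program order")
        _, address, value = self.pending.popleft()
        self.cache[address] = value

    def flush(self):
        # Assumed behavior: discard buffered stores that will never be committed.
        self.pending.clear()

if __name__ == "__main__":
    sq = StoreQueue()
    sq.execute_store(1, 0x100, 7)
    print(0x100 in sq.cache)  # False: the store is only buffered
    sq.commit(1)
    print(sq.cache[0x100])    # 7
```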
4.4.4.1 Effect of Operand Placement on Performance
The location and alignment of operands in memory may affect the performance of memory accesses, in some cases significantly, as shown in Table 4-4.
Alignment of memory operands on natural boundaries guarantees the best performance. For the best performance across the widest range of implementations, the programmer should assume the performance model described in Section 3.1, "Operand Conventions."
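An operand is naturally aligned when its effective address is a multiple of its size. For example, a word sits at an address divisible by 4 and a double word at one divisible by 8. The following Python check is a minimal sketch, and the function name is hypothetical.

```python
def is_naturally_aligned(ea: int, size: int) -> bool:
    """True if effective address `ea` is a multiple of the operand size in bytes.

    Operand sizes are powers of two (1, 2, 4, 8), so the test reduces to
    checking that the low log2(size) address bits are zero.
    """
    if size <= 0 or size & (size - 1):
        raise ValueError("operand size must be a positive power of two")
    return (ea & (size - 1)) == 0
```

The check depends only on the address bits, which is consistent with alignment having the same effect in big- and little-endian modes.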
The effect of alignment on memory operation performance is the same for big- and little-endian
addressing modes, including load-multiple and store-multiple operations.
In Table 4-4, optimal means that one effective address (EA) calculation occurs during the memory operation. Fair means that multiple EA calculations occur during the operation, which may cause additional cache or bus activities with multiple transfers. Poor means that an alignment interrupt is generated by the memory operation.
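A small classifier can model the three categories. This Python sketch works under stated assumptions and does not transcribe Table 4-4. It assumes that accesses are split at aligned 8-byte (double-word) boundaries. It also assumes the caller knows which instruction classes take an alignment interrupt when misaligned. The function name, the `block` size, and the `traps_if_misaligned` flag are hypothetical. Table 4-4 remains the authority on actual per-instruction behavior.

```python
def classify_access(ea: int, size: int, block: int = 8,
                    traps_if_misaligned: bool = False) -> str:
    """Classify a single-operand access as 'optimal', 'fair', or 'poor'.

    Hypothetical rules for illustration (check Table 4-4 for real behavior):
    - 'poor' if the operand is not naturally aligned and the caller says this
      instruction class takes an alignment interrupt when misaligned;
    - 'optimal' if the operand lies within one aligned `block`-byte region,
      so a single EA calculation suffices;
    - 'fair' if it straddles a region boundary and needs several transfers.
    """
    if size <= 0 or block <= 0:
        raise ValueError("size and block must be positive")
    misaligned = ea % size != 0
    if misaligned and traps_if_misaligned:
        return "poor"
    # Number of aligned block-sized regions covered by bytes ea .. ea+size-1.
    regions_touched = (ea + size - 1) // block - ea // block + 1
    return "optimal" if regions_touched == 1 else "fair"
```

For example, a word at 0x1006 spans bytes 0x1006 through 0x1009 and crosses the 8-byte boundary at 0x1008. This model therefore rates it "fair".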
4.5 Memory Performance Considerations
Because the e500 has a maximum instruction throughput of two instructions per clock cycle, lack
of memory bandwidth can affect performance. To maximize performance, the e500 must be able
to read and write data efficiently. If a system has multiple bus devices, one device may experience
long memory latencies while another device (for example, a direct-memory access controller) is
using the external bus.
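A back-of-the-envelope estimate shows how a high instruction rate turns into external bus traffic. This Python sketch assumes that each cache miss fetches one full line. It ignores write-backs, prefetching, and other bus masters such as a DMA controller. The function name and all parameter values below are hypothetical, not e500 measurements.

```python
def bus_bandwidth_demand(clock_hz: float, ipc: float, mem_fraction: float,
                         miss_rate: float, line_bytes: int) -> float:
    """Estimated external-bus line-fill traffic in bytes per second.

    clock_hz     -- core clock frequency
    ipc          -- sustained instructions per cycle (at most 2 on the e500)
    mem_fraction -- fraction of instructions that are loads or stores
    miss_rate    -- fraction of those accesses that miss in the cache
    line_bytes   -- bytes fetched per miss (one cache line, by assumption)
    """
    accesses_per_second = clock_hz * ipc * mem_fraction
    return accesses_per_second * miss_rate * line_bytes
```

Consider an illustrative case with a 1 GHz clock, two instructions per cycle, 30% loads and stores, a 5% miss rate, and 32-byte lines. It gives `bus_bandwidth_demand(1e9, 2, 0.3, 0.05, 32)`, which is about 9.6e8 bytes/s (960 MB/s). That demand must be met while other devices may also be using the bus.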
PowerPC e500 Core Family Reference Manual (supports e500v1 and e500v2), E500CORERM, Rev. 1, 4/2005