PowerPC e500 Core Family Reference Manual, Rev. 1
Freescale Semiconductor
Execution Timing
4.7 Instruction Scheduling Guidelines
This section provides an overview of instruction scheduling guidelines, followed by detailed
examples showing how to optimize scheduling with respect to various pipeline stages.
Performance can be improved by avoiding resource conflicts and scheduling instructions to take
fullest advantage of the parallel execution units. Instruction scheduling can be improved by
observing the following guidelines:
• To reduce branch mispredictions, separate the instruction that sets CR bits from the branch
  instruction that evaluates them. Because there can be no more than 26 instructions in the
  processor (with the instruction that sets CR in CQ0 and the dependent branch instruction in
  IQ11), there is no advantage to having more than 24 instructions between them.
• When branching to a location specified by the CTR or LR, separate the mtspr instruction
  that initializes the CTR or LR from the dependent branch instruction. This ensures the
  register values are immediately available to the branch instruction.
• Schedule instructions so two can be dispatched at a time.
• Schedule instructions to minimize stalls due to busy execution units.
• Avoid scheduling high-latency instructions close together. Interspersing single-cycle
  latency instructions between longer-latency instructions minimizes the effect that
  instructions such as integer divide can have on throughput.
• Avoid using serializing instructions.
• Schedule instructions to avoid dispatch stalls. As many as 14 instructions can be assigned
  CR and GPR renames and can be assigned CQ entries; therefore, 14 instructions can be in
  the execute stages at any one time. (However, note the exception of load or store with
  update instructions, which are broken into two instructions at dispatch.)
• Avoid branches where possible; favor not-taken branches over taken branches.
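The queue-depth arithmetic behind the guidelines above can be sketched in a few lines of Python. This is an illustrative model only, not part of the manual. The constant and function names are hypothetical. The 12-entry IQ depth is inferred from the IQ0 to IQ11 numbering, and the 14-entry CQ depth from the 14 in-flight instructions cited above. The list of update-form mnemonics is a small, non-exhaustive sample, and charging each update form two slots is an assumption based on the statement that such instructions are broken into two at dispatch.

```python
# Illustrative model of the queue arithmetic; all names are hypothetical.
IQ_ENTRIES = 12  # assumed from the IQ0..IQ11 numbering
CQ_ENTRIES = 14  # assumed from the 14 instructions that can hold CQ entries

# Non-exhaustive sample of load/store-with-update mnemonics (assumption).
UPDATE_FORMS = frozenset({"lwzu", "stwu", "lbzu", "lhzu"})


def max_in_flight() -> int:
    """Upper bound on instructions held in the IQ and CQ together."""
    return IQ_ENTRIES + CQ_ENTRIES


def max_useful_separation() -> int:
    """Largest useful gap between a CR-setting instruction and its branch.

    The CR setter occupies CQ0 and the branch occupies IQ11, so both are
    subtracted from the total number of instructions in the processor.
    """
    return max_in_flight() - 2


def dispatch_slots(mnemonics) -> int:
    """Count slots used, charging two for each update-form load or store."""
    return sum(2 if m in UPDATE_FORMS else 1 for m in mnemonics)


if __name__ == "__main__":
    print(max_in_flight())           # total instructions in the processor
    print(max_useful_separation())   # instructions worth placing in between
    print(dispatch_slots(["lwzu", "add", "stwu"]))
```

Under these assumptions, a sequence containing update-form loads or stores reaches the 14-entry limit sooner than its instruction count alone would suggest, which is the exception the dispatch-stall guideline points out.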
The following sections give detailed information on optimizing code for e500 pipeline stages.
Table 4-8. SPE and Embedded Floating-Point APU Instruction Latencies (continued)

Mnemonic        Unit    Cycles (Latency:Throughput)
evsubfusiaaw    MU      4:1
evsubfw         SU1     1
evsubifw        SU1     1
evxor           SU1     1

1. The MU bypass path allows divide instructions to perform the iterative operations necessary for division without blocking the
MU pipeline (except to other divide instructions). Therefore, multiply instructions that follow a divide instruction can finish
execution ahead of the divide. See Section 4.4.3, “Simple and Multiple Unit Execution.”