3.1 Pipeline Overview
The C28x FPU pipeline is identical to the C28x pipeline for all standard C28x instructions. The decode2
stage (D2) determines whether an instruction is a C28x instruction or a floating-point unit instruction. The
pipeline flow is shown in Figure 3-1. Note that normal C28x pipeline stalls (D2) and memory
waitstates (R2 and W) also stall any C28x FPU instruction. Most C28x FPU instructions are single
cycle and complete in the FPU E1 or W stage, which aligns with the C28x pipeline. Some instructions
take an additional execute cycle (E2). For these instructions, you must wait one cycle for the result
to become available. The rest of this section describes when delay cycles are required. Keep in
mind that the assembly tools for the C28x+FPU issue an error if a delay slot has not been handled
correctly.
Figure 3-1. FPU Pipeline
[Figure: the C28x pipeline stages Fetch (F1, F2), Decode (D1, D2), Read (R1, R2), Exe (E) and Write (W),
shown against the FPU instruction stages D, R, E1, E2 and W for Load, Store, CMP/MIN/MAX/NEG/ABS and
MPY/ADD/SUB/MACF32 instructions.]
3.2 General Guidelines for Floating-Point Pipeline Alignment
While the C28x+FPU assembler will issue errors for pipeline conflicts, you may still find it useful to
understand when software delays are required. This section describes three guidelines you can follow
when writing C28x+FPU assembly code.
Floating-point instructions that require delay slots have a 'p' after their cycle count. For example, '2p'
stands for 2 pipelined cycles. This means that a new instruction can be started every cycle, but the result
of a 2p instruction is valid only one instruction later.
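The following sketch illustrates what "pipelined" means here: two independent 2p instructions issue on consecutive cycles, and each fills the other's delay slot. The register choices are arbitrary, and XAR4 and XAR5 are assumed to already point to valid data memory.

```asm
        MPYF32  R2H, R1H, R0H    ; 2p: R2H = R1H * R0H, result valid one instruction later
        ADDF32  R5H, R4H, R3H    ; 2p: issues on the next cycle; does not use R2H,
                                 ;     so it fills the MPYF32 delay slot
        MOV32   *XAR4, R2H       ; R2H is now valid; does not use R5H,
                                 ;     so it fills the ADDF32 delay slot
        MOV32   *XAR5, R5H       ; R5H is now valid
```

No NOP is needed in this sequence because every delay slot holds an instruction that does not touch the pending destination register.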
There are three general guidelines to determine if an instruction needs a delay slot:
1. Floating-point math operations (multiply, addition, subtraction, 1/x and MAC) require 1 delay slot.
2. Conversion instructions between integer and floating-point formats require 1 delay slot.
3. Everything else does not require a delay slot. This includes minimum, maximum, compare, load, store,
negative and absolute value instructions.
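As an illustration of the three guidelines above, the sketch below shows one representative instruction per category. The registers are chosen arbitrarily, and XAR4 is assumed to point to valid data memory.

```asm
; Guideline 1: floating-point math needs 1 delay slot
        MPYF32   R0H, R1H, R2H   ; R0H = R1H * R2H
        NOP                      ; delay slot: NOP or any instruction not using R0H

; Guideline 2: integer/floating-point conversion needs 1 delay slot
        I32TOF32 R3H, R4H        ; R3H = float of the 32-bit integer in R4H
        NOP                      ; delay slot: NOP or any instruction not using R3H

; Guideline 3: everything else needs no delay slot
        MAXF32   R0H, R3H        ; if (R0H < R3H) R0H = R3H
        MOV32    *XAR4, R0H      ; R0H can be stored immediately
```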
There are two exceptions to these rules. First, moves between the CPU and FPU registers require special
pipeline alignment that is described later in this section. These operations are typically infrequent. Second,
the MACF32 R7H, R3H, mem32, *XAR7 instruction has special requirements that make it easier to use.
Refer to the MACF32 instruction description for details.
Consider the 32-bit ADDF32 instruction as an example. ADDF32 is a 2p instruction and
therefore requires one delay slot. If the destination register for the operation is R0H, it is updated one
cycle after the instruction, for a total of 2 cycles. Therefore, a NOP or an instruction that does not use R0H
must follow this instruction.
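A minimal sketch of this alignment follows. The immediate value, the source register R1H, and the store through XAR4 (assumed to point to valid data memory) are illustrative choices only.

```asm
        ADDF32  R0H, #1.5, R1H   ; 2p: R0H = 1.5 + R1H
        NOP                      ; delay slot: NOP or any instruction that does not use R0H
                                 ; <-- ADDF32 completes here, R0H is updated
        MOV32   *XAR4, R0H       ; R0H can be used from this instruction onward
```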
Any memory stall or pipeline stall will also stall the floating-point unit. This keeps the floating-point unit
aligned with the C28x pipeline and there is no need to change the code based on the waitstates of a
memory block.
SPRUEO2A – June 2007 – Revised August 2008