3.7 Optimizing the Pipeline
The following example shows how delay slots can be used to improve the performance of an algorithm.
The example performs two Y = MX+B operations. In Example 3-14, no optimization has been done. The
Y = MX+B calculations are sequential and each takes 7 cycles to complete. Notice that there are NOPs in
the delay slots that could be filled with non-conflicting instructions. The only requirement is that these
instructions must not cause a register conflict or access the STF register flags.
Example 3-14. Floating-Point Code Without Pipeline Optimization
; Using NOPs for alignment cycles, calculate the following:
;
; Y1 = M1*X1 + B1
; Y2 = M2*X2 + B2
;
; Calculate Y1
;
   MOV32   R0H,@M1         ; Load R0H with M1 - single cycle
   MOV32   R1H,@X1         ; Load R1H with X1 - single cycle
   MPYF32  R1H,R1H,R0H     ; R1H = M1 * X1 - 2p operation
|| MOV32   R0H,@B1         ; Load R0H with B1 - single cycle
   NOP                     ; Wait for MPYF32 to complete
                           ; <-- MPYF32 completes, R1H is valid
   ADDF32  R1H,R1H,R0H     ; R1H = R1H + R0H - 2p operation
   NOP                     ; Wait for ADDF32 to complete
                           ; <-- ADDF32 completes, R1H is valid
   MOV32   @Y1,R1H         ; Save R1H in Y1 - single cycle
; Calculate Y2
   MOV32   R0H,@M2         ; Load R0H with M2 - single cycle
   MOV32   R1H,@X2         ; Load R1H with X2 - single cycle
   MPYF32  R1H,R1H,R0H     ; R1H = M2 * X2 - 2p operation
|| MOV32   R0H,@B2         ; Load R0H with B2 - single cycle
   NOP                     ; Wait for MPYF32 to complete
                           ; <-- MPYF32 completes, R1H is valid
   ADDF32  R1H,R1H,R0H     ; R1H = R1H + R0H
   NOP                     ; Wait for ADDF32 to complete
                           ; <-- ADDF32 completes, R1H is valid
   MOV32   @Y2,R1H         ; Save R1H in Y2
; 14 cycles
; 48 bytes
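The 14-cycle total can be checked by counting issue slots in the listing. The following C sketch is illustrative only and is not part of the original guide. It assumes each instruction, parallel pair (MPYF32 || MOV32), and NOP occupies exactly one cycle with no additional stalls; the names slot_kind, one_calc, and the helper functions are hypothetical.

```c
/* Illustrative tally, assuming one cycle per issue slot and no stalls.
   A parallel pair (MPYF32 || MOV32) is counted as a single slot. */
enum slot_kind { SLOT_WORK, SLOT_NOP };

/* Issue slots for one unoptimized Y = M*X + B sequence, in listing order. */
static const enum slot_kind one_calc[] = {
    SLOT_WORK, /* MOV32  R0H,@M                      */
    SLOT_WORK, /* MOV32  R1H,@X                      */
    SLOT_WORK, /* MPYF32 R1H,R1H,R0H || MOV32 R0H,@B */
    SLOT_NOP,  /* NOP in the MPYF32 delay slot       */
    SLOT_WORK, /* ADDF32 R1H,R1H,R0H                 */
    SLOT_NOP,  /* NOP in the ADDF32 delay slot       */
    SLOT_WORK  /* MOV32  @Y,R1H                      */
};

#define NUM_CALCS 2 /* Y1 and Y2 */

/* Cycles for one calculation: one per issue slot. */
int cycles_per_calc(void)
{
    return (int)(sizeof one_calc / sizeof one_calc[0]);
}

/* Delay slots in one calculation that currently hold a NOP. */
int nops_per_calc(void)
{
    int count = 0;
    for (int i = 0; i < cycles_per_calc(); i++) {
        if (one_calc[i] == SLOT_NOP) {
            count++;
        }
    }
    return count;
}

/* Totals for the two back-to-back calculations. */
int unoptimized_total_cycles(void) { return NUM_CALCS * cycles_per_calc(); }
int unoptimized_total_nops(void)   { return NUM_CALCS * nops_per_calc(); }
```

Under these assumptions, each calculation uses 7 slots, 2 of which are NOPs, so the two sequential calculations use 14 slots, 4 of which are delay slots available for other instructions.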
The code shown in the next example was generated by the C28x+FPU compiler with optimization enabled.
Notice that the NOPs in the first example have now been filled with other instructions. The code for the
two Y = MX+B calculations is now interleaved, and both calculations complete in only 9 cycles.
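The C source given to the compiler is not shown in this guide. As a minimal sketch, assuming the operands are global variables (consistent with the direct @M1-style addressing in the assembly), source of the following form would give the compiler two independent calculations to interleave. The function name calc_two_lines is hypothetical.

```c
/* Hypothetical C source; the original is not shown in this guide.
   Globals mirror the symbols used in the assembly listing. */
float M1, X1, B1, Y1;
float M2, X2, B2, Y2;

/* Neither statement reads a result written by the other, so the loads,
   multiply, and add of one calculation can be scheduled into the delay
   slots of the other without a register conflict. */
void calc_two_lines(void)
{
    Y1 = M1 * X1 + B1;
    Y2 = M2 * X2 + B2;
}
```

Because the two statements share no data, an optimizing compiler is free to reorder their instructions. If Y2 depended on Y1, the second calculation could not begin until the first had completed, and fewer delay slots could be filled.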
SPRUEO2A – June 2007 – Revised August 2008