Software Pipelining
Optimizing Assembly Code via Linear Assembly
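Figure 6–10. Dependency Graph of Floating-Point Dot Product With LDDW (Showing Functional Units)

[Figure: dependency graph split into an A side and a B side.
 A side: LDDW (.D1) loads ai & ai+1; MPYSP (.M1X) produces pi; ADDSP (.L1) accumulates sum0; SUB (.S1) updates cntr.
 B side: LDDW (.D2) loads bi & bi+1; MPYSP (.M2X) produces pi+1; ADDSP (.L2) accumulates sum1; B (.S2) branches to LOOP.
 Edge labels: 5 from each LDDW to each MPYSP; 4 from each MPYSP to its ADDSP; 4 on each ADDSP loop-carried edge; 1 on the SUB loop-carried edge; 1 from SUB to B.]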
Example 6–22. Linear Assembly for Floating-Point Dot Product Inner Loop (With Conditional SUB Instruction)

        LDDW    .D1     *A4++,A2        ; load ai and ai+1 from memory
        LDDW    .D2     *B4++,B2        ; load bi and bi+1 from memory
        MPYSP   .M1X    A2,B2,A6        ; ai * bi
        MPYSP   .M2X    A3,B3,B6        ; ai+1 * bi+1
        ADDSP   .L1     A6,A7,A7        ; sum0 += (ai * bi)
        ADDSP   .L2     B6,B7,B7        ; sum1 += (ai+1 * bi+1)
 [A1]   SUB     .S1     A1,1,A1         ; decrement loop counter
 [A1]   B       .S2     LOOP            ; branch to top of loop
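
For readers who want to check the data flow, the inner loop above can be modeled in C. This is only a sketch under stated assumptions: the function name dotp_pairs and its parameters are hypothetical, the loop control is modeled as a plain counted loop rather than the exact conditional SUB and B sequence, and the final addition of sum0 and sum1 is assumed to happen after the loop, since the listing covers only the loop body. It says nothing about timing or functional units.

```c
#include <stddef.h>

/* Hypothetical C model of the inner loop: each iteration consumes one
 * pair of elements from each array, as one LDDW per array does, and
 * keeps two independent running sums (sum0 and sum1). */
float dotp_pairs(const float *a, const float *b, size_t pairs)
{
    float sum0 = 0.0f;          /* even-indexed products (A7 in the listing) */
    float sum1 = 0.0f;          /* odd-indexed products  (B7 in the listing) */
    size_t cntr = pairs;        /* loop counter          (A1 in the listing) */

    while (cntr != 0) {
        float p0 = a[0] * b[0]; /* ai   * bi   */
        float p1 = a[1] * b[1]; /* ai+1 * bi+1 */

        sum0 += p0;
        sum1 += p1;

        a += 2;                 /* post-increment by one doubleword */
        b += 2;
        cntr--;                 /* decrement loop counter */
    }

    /* Assumption: the two partial sums are combined outside the loop. */
    return sum0 + sum1;
}
```

Calling dotp_pairs on two four-element arrays with pairs set to 2 runs two iterations. Because the two accumulators never feed each other, each ADDSP depends only on its own previous result, which matches the separate sum0 and sum1 loop-carried edges in Figure 6–10.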