Using Word Access for Short Data and Doubleword Access for Floating-Point Data
Optimizing Assembly Code via Linear Assembly
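Figure 6–8. Dependency Graph of Floating-Point Dot Product With LDDW (Showing Functional Units)

[Figure: dependency graph.
A side: LDDW (.D1) loads ai & ai+1; MPYSP (.M1X) produces pi; ADDSP (.L1) accumulates sum0; SUB (.S1) updates cntr.
B side: LDDW (.D2) loads bi & bi+1; MPYSP (.M2X) produces pi+1; ADDSP (.L2) accumulates sum1; B (.S2) branches to LOOP.
Edge latencies: LDDW to MPYSP, 5; MPYSP to ADDSP, 4; ADDSP to itself, 4; SUB to itself and SUB to B, 1.]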
Example 6–18. Linear Assembly for Floating-Point Dot Product Inner Loop With LDDW (With Allocated Resources)

        LDDW    .D1     *A4++,A3:A2     ; load ai and ai+1 from memory
        LDDW    .D2     *B4++,B3:B2     ; load bi and bi+1 from memory
        MPYSP   .M1X    A2,B2,A6        ; ai * bi
        MPYSP   .M2X    A3,B3,B6        ; ai+1 * bi+1
        ADDSP   .L1     A6,A7,A7        ; sum0 += (ai * bi)
        ADDSP   .L2     B6,B7,B7        ; sum1 += (ai+1 * bi+1)
        SUB     .S1     A1,1,A1         ; decrement loop counter
[A1]    B       .S2     LOOP            ; branch to loop
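
For readers more comfortable with C, the inner loop above can be sketched roughly as follows. This is a minimal illustration and not code from the source: the function name dotp_pairs, its signature, and the even-length precondition are assumptions, and the final addition of sum0 and sum1 is assumed to happen after the loop, since the linear assembly shows only the loop body. Each iteration consumes one pair of elements from each array (the job of one LDDW per side) and feeds two independent accumulators, matching the two MPYSP/ADDSP chains.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the name and signature are illustrative only. */
float dotp_pairs(const float *a, const float *b, size_t n)
{
    float sum0 = 0.0f;  /* accumulates products of elements i   (A side) */
    float sum1 = 0.0f;  /* accumulates products of elements i+1 (B side) */

    /* Assumption: n is even, so every iteration has a full pair to load. */
    assert(n % 2 == 0);

    for (size_t i = 0; i < n; i += 2) {
        sum0 += a[i] * b[i];          /* ai   * bi   */
        sum1 += a[i + 1] * b[i + 1];  /* ai+1 * bi+1 */
    }

    /* Assumed epilogue: combine the two partial sums after the loop. */
    return sum0 + sum1;
}
```

Because the two partial sums are combined in a different order than a single running sum would use, the floating-point result of this form can differ slightly from a one-accumulator loop.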