Using Word Access for Short Data and Doubleword Access for Floating-Point Data
Optimizing Assembly Code via Linear Assembly
6.4.5.2 Floating-Point Dot Product
Example 6–20 uses LDDW instructions instead of LDW instructions.
Example 6–20. Assembly Code for Floating-Point Dot Product With LDDW (Before Software Pipelining)
        MVK     .S1     50,A1       ; set up loop counter
 ||     ZERO    .L1     A7          ; zero out sum0 accumulator
 ||     ZERO    .L2     B7          ; zero out sum1 accumulator

LOOP:
        LDDW    .D1     *A4++,A2    ; load ai & ai+1 from memory
 ||     LDDW    .D2     *B4++,B2    ; load bi & bi+1 from memory

        SUB     .S1     A1,1,A1     ; decrement loop counter

        NOP             2

 [A1]   B       .S1     LOOP        ; branch to loop

        MPYSP   .M1X    A2,B2,A6    ; ai * bi
 ||     MPYSP   .M2X    A3,B3,B6    ; ai+1 * bi+1

        NOP             3

        ADDSP   .L1     A6,A7,A7    ; sum0 += (ai * bi)
 ||     ADDSP   .L2     B6,B7,B7    ; sum1 += (ai+1 * bi+1)
                                    ; Branch occurs here

        NOP             3

        ADDSP   .L1X    A7,B7,A4    ; sum = sum0 + sum1

        NOP             3
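As a rough check on the listing above, the cycle count can be tallied directly from the instructions. The sketch below is in C and rests on stated assumptions: each execute packet (an instruction plus anything joined to it with ||) takes one cycle, NOP n takes n cycles, and there are no memory stalls. The constant and function names are hypothetical and do not appear in the original example.

```c
/* Cycle estimate for Example 6-20.
 * Assumptions: one cycle per execute packet, n cycles per "NOP n",
 * and no memory-bank or cache stalls. */
enum {
    SETUP_CYCLES = 1,          /* MVK || ZERO || ZERO: one execute packet */
    LOOP_CYCLES  = 1           /* LDDW || LDDW    */
                 + 1           /* SUB             */
                 + 2           /* NOP 2           */
                 + 1           /* [A1] B          */
                 + 1           /* MPYSP || MPYSP  */
                 + 3           /* NOP 3           */
                 + 1,          /* ADDSP || ADDSP  */
    TAIL_CYCLES  = 3 + 1 + 3,  /* NOP 3, final ADDSP, NOP 3 */
    ITERATIONS   = 50          /* loop counter loaded by MVK */
};

/* Hypothetical helper: total cycles for setup, loop, and final add. */
static int dotp_cycle_estimate(void)
{
    return SETUP_CYCLES + ITERATIONS * LOOP_CYCLES + TAIL_CYCLES;
}
```

Under these assumptions each iteration takes 10 cycles and produces two products, so the estimate is 1 + (50 x 10) + 7 = 508 cycles for 100 elements.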
The code in Example 6–20 includes the following optimizations:
- The setup code for the loop is included to initialize the array pointers and the loop counter and to clear the accumulators. The setup code assumes that A4 and B4 have been initialized to point to arrays a and b, respectively.
- The MVK instruction initializes the loop counter.
- The two ZERO instructions, which execute in parallel, initialize the even and odd accumulators (sum0 and sum1) to 0.
- The third ADDSP instruction adds the even and odd accumulators.
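The accumulator structure described in the list can be sketched in C. This is an illustrative counterpart of the loop, not code taken from the example: the function name dotp_two_acc and its parameters are hypothetical, and the sketch assumes an even element count (the assembly processes 50 pairs, that is, 100 elements).

```c
#include <stddef.h>

/* Hypothetical C counterpart of the loop in Example 6-20.
 * Assumes n is even; if n is odd, the last element is ignored. */
float dotp_two_acc(const float *a, const float *b, size_t n)
{
    float sum0 = 0.0f;  /* even-index accumulator (A7 in the listing) */
    float sum1 = 0.0f;  /* odd-index accumulator (B7 in the listing)  */

    /* Each pass consumes one pair from each array, as one LDDW
     * per array does in the assembly loop. */
    for (size_t i = 0; i + 1 < n; i += 2) {
        sum0 += a[i] * b[i];          /* ai   * bi   */
        sum1 += a[i + 1] * b[i + 1];  /* ai+1 * bi+1 */
    }

    /* Combine the partial sums, as the third ADDSP does. */
    return sum0 + sum1;
}
```

Calling dotp_two_acc(a, b, 100) corresponds to the loop count of 50 set by MVK. Because the two partial sums are added only at the end, the floating-point result can differ slightly in rounding from a single-accumulator loop.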