Using Word Access for Short Data and Doubleword Access for Floating-Point Data
6.4.5 Final Assembly
Example 6–19 shows the final assembly code for the unrolled loop of the fixed-point dot product, and Example 6–20 shows the final assembly code for the unrolled loop of the floating-point dot product.
6.4.5.1 Fixed-Point Dot Product
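Example 6–19 uses LDW (load word) instructions instead of LDH (load halfword) instructions, so each load brings two adjacent 16-bit elements into one 32-bit register.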
Example 6–19. Assembly Code for Fixed-Point Dot Product With LDW (Before Software Pipelining)
        MVK   .S1   50,A1       ; set up loop counter
||      ZERO  .L1   A7          ; zero out sum0 accumulator
||      ZERO  .L2   B7          ; zero out sum1 accumulator
LOOP:
        LDW   .D1   *A4++,A2    ; load ai & ai+1 from memory
||      LDW   .D2   *B4++,B2    ; load bi & bi+1 from memory
        SUB   .S1   A1,1,A1     ; decrement loop counter
  [A1]  B     .S1   LOOP        ; branch to loop
        NOP         2
        MPY   .M1X  A2,B2,A6    ; ai * bi
||      MPYH  .M2X  A2,B2,B6    ; ai+1 * bi+1
        NOP
        ADD   .L1   A6,A7,A7    ; sum0 += (ai * bi)
||      ADD   .L2   B6,B7,B7    ; sum1 += (ai+1 * bi+1)
        ; Branch occurs here
        ADD   .L1X  A7,B7,A4    ; sum = sum0 + sum1
In addition to the loop body, the code in Example 6–19 includes the following:
- The setup code initializes the loop counter and clears the accumulators. It does not initialize the array pointers; it assumes that A4 and B4 already point to arrays a and b, respectively.
  - The MVK instruction initializes the loop counter (A1) to 50.
  - The two ZERO instructions, which execute in parallel with MVK and with each other, initialize the even and odd accumulators (sum0 in A7 and sum1 in B7) to 0.
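- The third ADD instruction, which executes after the final iteration, adds the even and odd accumulators and writes the result to A4. Its .L1X unit specifier indicates that B7 is read through the cross path.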