Modulo Scheduling of Multicycle Loops
Optimizing Assembly Code via Linear Assembly
6.6.7 Using the Assembly Optimizer for the Weighted Vector Sum
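Example 6–39 shows the linear assembly code that performs the weighted vector sum. Each iteration loads two 16-bit elements of a and two of b with a single LDW each, so 50 iterations cover the 100-element vectors. You can use this code as input to the assembly optimizer to create a software-pipelined loop instead of scheduling it by hand.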
Example 6–39. Linear Assembly for Weighted Vector Sum
        .global _w_vec
_w_vec: .cproc  a, b, c, m

        .reg    ai_i1, bi_i1, pi, pi1, pi_i1, pi_s, pi1_s
        .reg    mask, bi, bi1, ci, ci1, c1, cntr

        MVK     -1,mask            ; set to all 1s to create 0xFFFFFFFF
        MVKH    0,mask             ; clear upper 16 bits to create 0x0000FFFF
        MVK     50,cntr            ; cntr = 100/2
        ADD     2,c,c1             ; point to c[1]

LOOP:   .trip 50
        LDW     .D2   *a++,ai_i1       ; ai & ai+1
        LDW     .D1   *b++,bi_i1       ; bi & bi+1
        MPY     .M1   ai_i1,m,pi       ; m * ai
        MPYHL   .M2   ai_i1,m,pi1      ; m * ai+1
        SHR     .S1   pi,15,pi_s       ; (m * ai) >> 15
        SHR     .S2   pi1,15,pi1_s     ; (m * ai+1) >> 15
        AND     .L2X  bi_i1,mask,bi    ; bi
        SHR     .S2   bi_i1,16,bi1     ; bi+1
        ADD     .L1X  pi_s,bi,ci       ; ci = ((m * ai) >> 15) + bi
        ADD     .L2X  pi1_s,bi1,ci1    ; ci+1 = ((m * ai+1) >> 15) + bi+1
        STH     .D2   ci,*c++[2]       ; store ci
        STH     .D1   ci1,*c1++[2]     ; store ci+1
 [cntr] SUB           cntr,1,cntr      ; decrement loop counter
 [cntr] B             LOOP             ; branch to loop

        .endproc