Software Pipelining
Optimizing Assembly Code via Linear Assembly
6.5.2 Using the Assembly Optimizer to Create Optimized Loops
Example 6–24 shows the linear assembly code for the full fixed-point dot product loop. Example 6–25 shows the linear assembly code for the full floating-point dot product loop. You can use this code as input to the assembly optimizer tool to create software-pipelined loops automatically. See the TMS320C6000 Optimizing C/C++ Compiler User’s Guide for more information on the assembly optimizer.
Example 6–24. Linear Assembly for Full Fixed-Point Dot Product
        .global _dotp
_dotp:  .cproc  a, b

        .reg    sum, sum0, sum1, cntr
        .reg    ai_i1, bi_i1, pi, pi1

        MVK     50,cntr             ; cntr = 100/2
        ZERO    sum0                ; sum0 = 0
        ZERO    sum1                ; sum1 = 0

LOOP:   .trip 50

        LDW     *a++,ai_i1          ; load ai & ai+1 from memory
        LDW     *b++,bi_i1          ; load bi & bi+1 from memory
        MPY     ai_i1,bi_i1,pi      ; ai * bi
        MPYH    ai_i1,bi_i1,pi1     ; ai+1 * bi+1
        ADD     pi,sum0,sum0        ; sum0 += (ai * bi)
        ADD     pi1,sum1,sum1       ; sum1 += (ai+1 * bi+1)
[cntr]  SUB     cntr,1,cntr         ; decrement loop counter
[cntr]  B       LOOP                ; branch to loop

        ADD     sum0,sum1,sum       ; compute final result
        .return sum
        .endproc
Resources such as functional units and the 1X and 2X cross paths do not have to be specified in the linear assembly, because the assembly optimizer can allocate them automatically.
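To cross-check what Example 6–24 computes, the following C sketch models the loop on a host machine. It is not TI code. The helper names (sext16, mpy, mpyh, pack2, dotp_model, dotp_ref) are hypothetical, and the sketch rests on three assumptions: a little-endian memory layout, so that LDW places ai in the low half of the word and ai+1 in the high half; 16-bit signed input elements; and sums that fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sign-extend the low 16 bits of x. */
static int32_t sext16(uint32_t x)
{
    int32_t v = (int32_t)(x & 0xFFFFu);
    return (v >= 0x8000) ? v - 0x10000 : v;
}

/* Models MPY: signed 16 x 16 multiply of the low halves. */
static int32_t mpy(uint32_t x, uint32_t y)
{
    return sext16(x) * sext16(y);
}

/* Models MPYH: signed 16 x 16 multiply of the high halves. */
static int32_t mpyh(uint32_t x, uint32_t y)
{
    return sext16(x >> 16) * sext16(y >> 16);
}

/* Models LDW on an assumed little-endian device: element i goes in the
   low half of the word and element i+1 in the high half. */
static uint32_t pack2(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Two-accumulator dot product mirroring Example 6-24 (n = 100 there).
   Assumes no 32-bit overflow occurs in sum0, sum1, or their total. */
static int32_t dotp_model(const int16_t *a, const int16_t *b, int n)
{
    assert(n % 2 == 0);                       /* loop handles 2 elements per pass */
    int32_t sum0 = 0, sum1 = 0;               /* ZERO sum0 / ZERO sum1 */
    for (int cntr = n / 2; cntr > 0; cntr--) {/* MVK n/2,cntr ... [cntr] B LOOP */
        uint32_t ai_i1 = pack2(a[0], a[1]);   /* LDW *a++,ai_i1 */
        uint32_t bi_i1 = pack2(b[0], b[1]);   /* LDW *b++,bi_i1 */
        a += 2;
        b += 2;
        sum0 += mpy(ai_i1, bi_i1);            /* MPY + ADD: ai * bi */
        sum1 += mpyh(ai_i1, bi_i1);           /* MPYH + ADD: ai+1 * bi+1 */
    }
    return sum0 + sum1;                       /* ADD sum0,sum1,sum */
}

/* Straightforward reference: sum of a[i] * b[i] over n elements. */
static int32_t dotp_ref(const int16_t *a, const int16_t *b, int n)
{
    int32_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += (int32_t)a[i] * b[i];
    return sum;
}
```

For the 100-element loop in the example, dotp_model(a, b, 100) should match dotp_ref(a, b, 100) whenever the partial sums fit in 32 bits. The two separate accumulators mirror the sum0/sum1 split in the linear assembly. Because the two ADDs in each iteration write different registers, they do not depend on each other. The final ADD combines them after the loop.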