Outer Loop Conditionally Executed With Inner Loop
Optimizing Assembly Code via Linear Assembly
Example 6–78. Final Assembly Code for FIR Filter (Continued)
        ADD     .L2X    A9,B8,B11       ; sum1 += p17
||      ADD     .L1X    B11,A12,A12     ; sum0 += p06
||      MPY     .M1     A8,A10,A7       ;* p00 = h[i+0]*x[j+i+0]
||      MPYLH   .M2     B7,B9,B13       ;* p12 = h[i+2]*x[j+i+3]
||[A2]  SUB     .S1     A2,1,A2         ;* dec store lp cntr

        ADD     .L1X    B13,A12,A10     ; sum0 += p07
||[!A2] SHR     .S2     B11,15,B11      ;* (Bsum1 >> 15)
||      MPY     .M2     B7,B9,B9        ;* p02 = h[i+2]*x[j+i+2]
||      MPYH    .M1     A8,A10,A10      ;* p01 = h[i+1]*x[j+i+1]
||[A2]  ADD     .L2     B4,B11,B4       ;* sum1(p10) = p10 + sum1
||      LDW     .D1     *A4++[2],B9     ;** x[j+i+2] & x[j+i+3]
||      LDW     .D2     *B1++[2],A10    ;** x[j+i+0] & x[j+i+1]
        ; Branch occurs here

  [!A2] SHR     .S1     A10,15,A12      ; (Asum0 >> 15)

  [!A2] STH     .D2     B11,*B6++[2]    ; y[j+1] = (Bsum1 >> 15)
||[!A2] STH     .D1     A12,*A6++[2]    ; y[j] = (Asum0 >> 15)
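In this code the outer loop is folded into the inner loop: the store loop counter A2 is decremented under [A2], and the shift and store instructions run under [!A2] only when the counter reaches zero. The C sketch below illustrates that control structure at source level. It is not a translation of the listing; the function names, the counter store_cntr (a hypothetical stand-in for A2), the point at which the counter is reloaded, and the Q15 scaling taken from the ">> 15" comments are assumptions for illustration. fir_nested is a plain reference that computes two outputs per outer iteration; fir_flattened is meant to produce the same outputs from a single loop.

```c
#include <assert.h>

/*
 * Reference model: two outputs per outer iteration, Q15 result.
 *   y[j]   = (sum over i of h[i] * x[j + i])     >> 15
 *   y[j+1] = (sum over i of h[i] * x[j + i + 1]) >> 15
 * x must hold at least n_out + n_taps - 1 samples.
 */
void fir_nested(const short *x, const short *h, short *y, int n_out, int n_taps)
{
    assert(n_out % 2 == 0 && n_taps > 0);
    for (int j = 0; j < n_out; j += 2) {
        int sum0 = 0, sum1 = 0;
        for (int i = 0; i < n_taps; i++) {
            sum0 += h[i] * x[j + i];      /* feeds y[j]   */
            sum1 += h[i] * x[j + i + 1];  /* feeds y[j+1] */
        }
        y[j]     = (short)(sum0 >> 15);
        y[j + 1] = (short)(sum1 >> 15);
    }
}

/*
 * Same outputs from one loop: the former outer-loop work (shift, store,
 * reset) runs only when a countdown reaches zero, in the spirit of the
 * [A2] / [!A2] predicates. store_cntr is a hypothetical C stand-in for A2.
 */
void fir_flattened(const short *x, const short *h, short *y, int n_out, int n_taps)
{
    assert(n_out % 2 == 0 && n_taps > 0);
    int j = 0, i = 0, sum0 = 0, sum1 = 0;
    int store_cntr = n_taps - 1;          /* taps left before the next store */
    int total = (n_out / 2) * n_taps;     /* one pass per (output pair, tap) */

    for (int k = 0; k < total; k++) {
        sum0 += h[i] * x[j + i];
        sum1 += h[i] * x[j + i + 1];
        i++;
        if (store_cntr != 0) {
            store_cntr--;                       /* like [A2] SUB A2,1,A2  */
        } else {
            y[j]     = (short)(sum0 >> 15);     /* like [!A2] SHR / STH   */
            y[j + 1] = (short)(sum1 >> 15);
            j += 2;                             /* next output pair       */
            i = 0;
            sum0 = sum1 = 0;
            store_cntr = n_taps - 1;            /* reload (assumed; the reload of A2 is not shown here) */
        }
    }
}
```

Calling both functions on the same input and comparing the y arrays is a quick way to check the flattened control flow. In the assembly version the conditional work is carried by predicated instructions inside the loop body; in C the if statement only models those predicates.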
6.14.9 Comparing Performance
The cycle count of this code is 1612: 50 × (8 × 4 + 0) + 12 = 50 × 32 + 12. The overhead due to the outer loop has been completely eliminated, which is why the term added to 8 × 4 is 0.
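Table 6–28 also gives the formula for Example 6–71, where the outer loop is software-pipelined. Comparing the two formulas using only the numbers as printed shows where the 394-cycle difference comes from: the term multiplied by 50 drops from 40 to 32 cycles, while the constant term rises from 6 to 12.

```latex
\begin{aligned}
\text{Example 6--71:}\quad & 50 \times (7 \times 4 + 6 + 6) + 6 = 50 \times 40 + 6 = 2006\\
\text{Example 6--74:}\quad & 50 \times (8 \times 4 + 0) + 12 = 50 \times 32 + 12 = 1612\\
\text{Difference:}\quad & 50 \times (40 - 32) - (12 - 6) = 400 - 6 = 394
\end{aligned}
```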
Table 6–28. Comparison of FIR Filter Code
Code Example                                                  Cycles                      Cycle Count
Example 6–61  FIR with redundant load elimination             50 × (16 × 2 + 9 + 6) + 2   2352
Example 6–69  FIR with redundant load elimination and no      50 × (8 × 4 + 10 + 6) + 2   2402
              memory hits
Example 6–71  FIR with redundant load elimination and no      50 × (7 × 4 + 6 + 6) + 6    2006
              memory hits with outer loop software-pipelined
Example 6–74  FIR with redundant load elimination and no      50 × (8 × 4 + 0) + 12       1612
              memory hits with outer loop conditionally
              executed with inner loop
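As a quick check on the table arithmetic, the C sketch below stores the terms of each Cycles formula exactly as printed and confirms that they reproduce the listed cycle counts. The struct, its field names, and the helper functions are hypothetical labels for the printed numbers; the sketch makes no claim about what each term counts.

```c
#include <assert.h>

/* Terms of each Cycles formula in Table 6-28, copied as printed:
 *   cycles = outer * (m1 * m2 + add1 + add2) + tail
 * Field names are hypothetical labels, not TI terminology. */
struct fir_row {
    const char *example;
    int outer, m1, m2, add1, add2, tail;
    int listed;                          /* value in the Cycle Count column */
};

static const struct fir_row fir_rows[] = {
    { "Example 6-61", 50, 16, 2,  9, 6,  2, 2352 },
    { "Example 6-69", 50,  8, 4, 10, 6,  2, 2402 },
    { "Example 6-71", 50,  7, 4,  6, 6,  6, 2006 },
    { "Example 6-74", 50,  8, 4,  0, 0, 12, 1612 },  /* printed as "+ 0" */
};

#define FIR_ROW_COUNT ((int)(sizeof fir_rows / sizeof fir_rows[0]))

/* Evaluate one row's formula. */
int formula_cycles(const struct fir_row *r)
{
    return r->outer * (r->m1 * r->m2 + r->add1 + r->add2) + r->tail;
}

/* Assert that every formula reproduces its listed cycle count. */
void check_table(void)
{
    for (int k = 0; k < FIR_ROW_COUNT; k++)
        assert(formula_cycles(&fir_rows[k]) == fir_rows[k].listed);
}

/* Cycles saved by row k relative to the first row (Example 6-61). */
int cycles_saved(int k)
{
    assert(k >= 0 && k < FIR_ROW_COUNT);
    return formula_cycles(&fir_rows[0]) - formula_cycles(&fir_rows[k]);
}
```

check_table() fails an assert if any formula disagrees with its listed count. cycles_saved(3) returns 740, the difference between Example 6–61 (2352 cycles) and Example 6–74 (1612 cycles); cycles_saved(1) is negative because Example 6–69 is listed at 50 cycles more than Example 6–61.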