Live-Too-Long Issues
6.10.6 Final Assembly With Move Instructions
Example 6–59 shows the final assembly code after software pipelining. The
performance of this loop is 212 cycles (2 × 100 + 11 + 1).
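The cycle count above can be checked with a short C sketch. The function and parameter names are hypothetical and do not appear in the original. The sketch assumes that 2 is the number of cycles per loop iteration and 100 is the trip count (the value loaded into the loop counter), and it treats the remaining 11 + 1 cycles as overhead outside the steady-state loop without breaking them down further.

```c
/* Estimate the cycle count of a software-pipelined loop.
 * All names here are illustrative assumptions, not taken from the source.
 *
 *   cycles_per_iteration : cycles the loop kernel spends per iteration
 *   trip_count           : number of loop iterations
 *   overhead_cycles      : cycles spent outside the steady-state loop
 */
static int pipelined_loop_cycles(int cycles_per_iteration,
                                 int trip_count,
                                 int overhead_cycles)
{
    return cycles_per_iteration * trip_count + overhead_cycles;
}
```

Calling pipelined_loop_cycles(2, 100, 11 + 1) reproduces the 212 cycles quoted for this loop.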
Example 6–59. Assembly Code for Live-Too-Long With Move Instructions
        LDH     .D1     *A4++,A0        ; load ai from memory
||      LDH     .D2     *B4++,B0        ; load bi from memory

        MVK     .S2     100,B2          ; set up loop counter

        LDH     .D1     *A4++,A0        ;* load ai from memory
||      LDH     .D2     *B4++,B0        ;* load bi from memory

        ZERO    .S1     A1              ; zero out accumulator
||      ZERO    .S2     B1              ; zero out accumulator

        LDH     .D1     *A4++,A0        ;** load ai from memory
||      LDH     .D2     *B4++,B0        ;** load bi from memory

 [B2]   SUB     .S2     B2,1,B2         ; decrement loop counter

        MPY     .M1     A0,A6,A3        ; a0 = ai * c
||      MPY     .M2X    B0,A6,B10       ; b0 = bi * c
||      LDH     .D1     *A4++,A0        ;*** load ai from memory
||      LDH     .D2     *B4++,B0        ;*** load bi from memory

 [B2]   SUB     .S2     B2,1,B2         ; decrement loop counter
||[B2]  B       .S1     LOOP            ; branch to loop

        SHR     .S1     A3,15,A5        ; a1 = a0 >> 15
||      SHR     .S2     B10,15,B5       ; b1 = b0 >> 15
||      MPY     .M1     A0,A6,A3        ;* a0 = ai * c
||      MPY     .M2X    B0,A6,B10       ;* b0 = bi * c
||      LDH     .D1     *A4++,A0        ;**** load ai from memory
||      LDH     .D2     *B4++,B0        ;**** load bi from memory

        MPY     .M1X    A5,B6,A7        ; a2 = a1 * d
||      MV      .D1     A3,A2           ; save a0 across iterations
||      MPY     .M2X    B5,A8,B7        ; b2 = b1 * e
||      MV      .D2     B10,B8          ; save b0 across iterations
||[B2]  SUB     .S2     B2,1,B2         ;* decrement loop counter
||[B2]  B       .S1     LOOP            ;* branch to loop

        SHR     .S1     A3,15,A5        ;* a1 = a0 >> 15
||      SHR     .S2     B10,15,B5       ;* b1 = b0 >> 15
||      MPY     .M1     A0,A6,A3        ;** a0 = ai * c
||      MPY     .M2X    B0,A6,B10       ;** b0 = bi * c
||      LDH     .D1     *A4++,A0        ;***** load ai from memory
||      LDH     .D2     *B4++,B0        ;***** load bi from memory