Optimizing DSP56300/DSP56600 Applications
MOTOROLA
Pipeline Interlocks
Data ALU Pipeline Interlocks
                                            ;previous data
        move    x:(r0)+,b   b,y:(r4)+       ;write destination memory,
                                            ;read next data
_end
        move    a,y:(r4)+                   ;write last-1 word to
                                            ;destination memory
        move    b,y:(r4)+                   ;write last word to destination
                                            ;memory
6.1.2.3 Saving Interlocks by Using the TFR Instruction
The following C code adds a constant to two memory arrays, one in
X memory space and the other in Y memory space:
static int a[N],b[N];
int i;
for (i=0;i<N;i++)
{
    b[i] = b[i]+c;
}
for (i=0;i<N;i++)
{
    a[i] = a[i]+c;
}
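A straightforward implementation executes in 8N cycles. Each loop
iteration takes four cycles: three instructions plus one interlock
cycle, because the final move stores accumulator a immediately after
the add updates it: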
        move    var_a,r4            ;a array in Y memory space
        move    var_b,r0            ;b array in X memory space
        move    var_c,x0            ;constant to add
        do      #N,_1Loop           ;handle Y array
        move    y:(r4),a            ;read data word
        add     x0,a                ;add constant
        move    a,y:(r4)+           ;store result and increment pointer
_1Loop
        do      #N,_2Loop           ;handle X array
        move    x:(r0),a            ;read data word
        add     x0,a                ;add constant
        move    a,x:(r0)+           ;store result and increment pointer
_2Loop
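The 8N figure can be reproduced with a back-of-the-envelope count. The following C sketch is only an illustration: the function name is hypothetical, and it assumes that every instruction issues in one cycle, that each iteration incurs exactly one interlock cycle, and that loop setup cost is ignored.

```c
#include <assert.h>
#include <limits.h>

/* Cycle estimate for the straightforward version: two loops of n
 * iterations each.
 * ASSUMPTION: every instruction issues in one cycle and each iteration
 * incurs one interlock cycle; loop setup cost is ignored. */
unsigned long straightforward_cycles(unsigned long n)
{
    const unsigned long loops = 2;                          /* Y loop, X loop  */
    const unsigned long instructions_per_iteration = 3;     /* move, add, move */
    const unsigned long interlock_cycles_per_iteration = 1; /* assumed         */
    const unsigned long per_iteration =
        instructions_per_iteration + interlock_cycles_per_iteration;

    /* Guard against overflow of the product below. */
    assert(n <= ULONG_MAX / (loops * per_iteration));
    return loops * n * per_iteration;
}
```

Under these assumptions, straightforward_cycles(100) evaluates to 800, which matches 8N for N = 100.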
By combining the two loops into one and using the TFR instruction,
an optimized implementation needs only three cycles per pass through
the main loop. Each pass updates one element of each array, which is
1.5 cycles per element and 3N cycles for the whole task:
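        move    var_a,r4                        ;a array in Y memory
        move    var_b,r0                        ;b array in X memory
        lua     (r4)+,r5                        ;r5 = r4 + 1
        lua     (r0)+,r1                        ;r1 = r0 + 1
        move    var_c,x1                        ;constant to add
        move    x:(r0),b                        ;read first X word
        add     x1,b    x:(r1)+,x0  y:(r4),a    ;add constant to X word,
                                                ;read next X word,
                                                ;read first Y word
        do      #N,_3Loop
        add     x1,a    b,x:(r0)+   x0,b        ;add constant to Y word,
                                                ;store X result,
                                                ;move next X word to b
        add     x1,b    y:(r5)+,y1              ;add constant to X word,
                                                ;read next Y word
        tfr     y1,a    x:(r1)+,x0  a,y:(r4)+   ;store Y result,
                                                ;read next X word,
                                                ;move next Y word to a
_3Loop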
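The data flow of the combined loop can be checked with a small C model that mirrors each register transfer and compares the result with the two original loops. This is a sketch under stated assumptions, not part of the original code: the function names are hypothetical, plain int stands in for the DSP data words and accumulators, and the bounds guard in fetch() exists only to keep the C model inside its arrays (the assembly prefetches unconditionally).

```c
#include <assert.h>
#include <stddef.h>

/* Reference: the two separate loops from the C fragment. */
void add_const_reference(int *a, int *b, size_t n, int c)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) b[i] = b[i] + c;
    for (size_t i = 0; i < n; i++) a[i] = a[i] + c;
}

/* ASSUMPTION: guarded read, used only so the model never reads past the
 * end of an array; the value returned out of range is never stored. */
static int fetch(const int *p, size_t idx, size_t n)
{
    return idx < n ? p[idx] : 0;
}

/* Model of the combined loop. Variable names follow the registers. */
void add_const_pipelined(int *a, int *b, size_t n, int c)
{
    assert(a != NULL && b != NULL);
    size_t r0 = 0, r1 = 1;      /* X array: store index, prefetch index */
    size_t r4 = 0, r5 = 1;      /* Y array: store index, prefetch index */
    int x1 = c, x0, y1;
    int acc_a, acc_b;

    acc_b = fetch(b, r0, n);    /* move x:(r0),b                        */
    x0 = fetch(b, r1++, n);     /* add x1,b   x:(r1)+,x0   y:(r4),a     */
    acc_a = fetch(a, r4, n);
    acc_b += x1;

    for (size_t k = 0; k < n; k++) {
        /* add x1,a   b,x:(r0)+   x0,b */
        b[r0++] = acc_b;        /* store finished X result              */
        acc_b = x0;             /* next X word enters accumulator b     */
        acc_a += x1;

        /* add x1,b   y:(r5)+,y1 */
        acc_b += x1;
        y1 = fetch(a, r5++, n);

        /* tfr y1,a   x:(r1)+,x0   a,y:(r4)+ */
        a[r4++] = acc_a;        /* move uses a before tfr overwrites it */
        x0 = fetch(b, r1++, n);
        acc_a = y1;
    }
}
```

In the model, every accumulator is stored at least one instruction after the add that produced it, which is the spacing the TFR-based schedule relies on. Calling both functions on copies of the same input arrays should leave identical contents in each pair.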