Lesson 1: Loop Carry Path From Memory Pointers
Compiler Optimization Tutorial
A schedule with ii = 10 implies that each iteration of the loop takes ten cycles.
With eight resources available every cycle, we would expect such a small loop
to do better than this.
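As a rough check on that expectation, the sketch below compares the issue slots available in ten cycles with the instructions the loop actually needs. The count of nine assumes the non-NOP instructions in the kernel listing of Example 2–4; the function names are illustrative and are not part of any compiler tool.

```c
/* Issue slots available over one iteration interval. */
static int issue_slots(int ii, int units_per_cycle)
{
    return ii * units_per_cycle;
}

/* Smallest ii that the unit count alone would allow (ceiling division).
 * This ignores which unit type each instruction needs, so it is only a floor. */
static int min_ii_by_unit_count(int instructions, int units_per_cycle)
{
    return (instructions + units_per_cycle - 1) / units_per_cycle;
}
```

Calling issue_slots(10, 8) gives 80 slots per iteration against nine instructions, and min_ii_by_unit_count(9, 8) gives 2. The compiler's own resource bound also depends on how instructions map to unit types, so treat the second number as a lower limit rather than a prediction.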
Q Where are the problems with this loop?
A A closer look at the feedback in lesson_c.asm gives us the answer.
Q Why did the loop start searching for a software pipeline at ii=10 (for a
10–cycle loop)?
A The first iteration interval attempted by the compiler is always the maximum
of the Loop Carried Dependency Bound and the Partitioned Resource Bound.
In this case, the compiler thinks there is a loop carry path equal to ten
cycles:
;* Loop Carried Dependency Bound(^) : 10
The ^ symbol is interspersed in the assembly output in the comments of each
instruction in the loop carry path, and is visible in lesson_c.asm.
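The rule for the starting point of the search can be written as a one-line function. This is a sketch of the rule as stated above, not the compiler's implementation; the function name is hypothetical.

```c
/* First iteration interval the search tries: the larger of the two bounds. */
static int first_ii_attempted(int loop_carried_dependency_bound,
                              int partitioned_resource_bound)
{
    return (loop_carried_dependency_bound > partitioned_resource_bound)
               ? loop_carried_dependency_bound
               : partitioned_resource_bound;
}
```

With a Loop Carried Dependency Bound of 10, any Partitioned Resource Bound of 10 or less (for example an assumed value of 2) leaves the search starting at ii = 10, so the loop carry path is what limits this loop.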
Example 2–4. lesson_c.asm
L2:        ; PIPED LOOP KERNEL
           LDH     .D1T1   *A4++,A0      ; ^ |32|
||         LDH     .D2T2   *B4++,B6      ; ^ |32|
           NOP             2
   [ B0]   SUB     .L2     B0,1,B0       ; |33|
   [ B0]   B       .S2     L2            ; |33|
           MPY     .M1     A0,A5,A0      ; ^ |32|
||         MPY     .M2     B6,B5,B6      ; ^ |32|
           NOP             1
           ADD     .L1X    B6,A0,A0      ; ^ |32|
           SHR     .S1     A0,15,A0      ; ^ |32|
           STH     .D1T1   A0,*A3++      ; ^ |32|
You can also use a dependency graph to analyze feedback, for example:
Q Why is there a dependency between STH and LDH? They do not use any
common registers so how can there be a dependency?
A If we look at the original C code in lesson_c.c, we see that the LDHs
correspond to loading values from xptr and yptr, and the STH corresponds to
storing values into the w_sum array.
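To see how a store and a later load can be related without sharing a register, here is a minimal sketch. The loop body is a guess at the shape of the C source, inferred from the LDH, MPY, ADD, SHR, and STH instructions in the listing; it is not the actual lesson_c.c. The names weighted_sum, w1, w2, and the two demo functions are hypothetical.

```c
/* Assumed loop shape: two loads, two multiplies, add, shift, store. */
static void weighted_sum(short *xptr, short *yptr, short w1, short w2,
                         short *w_sum, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        int w_vec1 = xptr[i] * w1;                   /* LDH, MPY */
        int w_vec2 = yptr[i] * w2;                   /* LDH, MPY */
        w_sum[i] = (short)((w_vec1 + w_vec2) >> 15); /* ADD, SHR, STH */
    }
}

/* Output overlaps the input: w_sum[i] is the same memory as xptr[i + 1]. */
static short demo_overlapping(void)
{
    short buf[5] = { 1000, 0, 0, 0, 0 };
    short y[4] = { 0, 0, 0, 0 };
    weighted_sum(buf, y, 16384, 0, buf + 1, 4);
    return buf[2]; /* second value stored */
}

/* Output in its own array: no store can feed a later load. */
static short demo_separate(void)
{
    short x[4] = { 1000, 0, 0, 0 };
    short y[4] = { 0, 0, 0, 0 };
    short out[4] = { 0, 0, 0, 0 };
    weighted_sum(x, y, 16384, 0, out, 4);
    return out[1]; /* second value stored */
}
```

In demo_overlapping each store writes the value that the next iteration loads, so the second result is 250; in demo_separate it is 0. The function is the same in both cases. Only the pointers passed in differ, and the loop body alone does not reveal which case applies.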
Q Is there any dependency between xptr, yptr, and w_sum?