Software Pipelining
Table 6–9. Software Pipeline Accumulation Staggered Results Due to Three-Cycle Delay

Cycle #   Pseudoinstruction         Current value of                        Written expected result
                                    pseudoregister sum
0         ADDSP x(0), sum, sum      0                                       ; cycle 4 sum = x(0)
1         ADDSP x(1), sum, sum      0                                       ; cycle 5 sum = x(1)
2         ADDSP x(2), sum, sum      0                                       ; cycle 6 sum = x(2)
3         ADDSP x(3), sum, sum      0                                       ; cycle 7 sum = x(3)
4         ADDSP x(4), sum, sum      x(0)                                    ; cycle 8 sum = x(0) + x(4)
5         ADDSP x(5), sum, sum      x(1)                                    ; cycle 9 sum = x(1) + x(5)
6         ADDSP x(6), sum, sum      x(2)                                    ; cycle 10 sum = x(2) + x(6)
7         ADDSP x(7), sum, sum      x(3)                                    ; cycle 11 sum = x(3) + x(7)
8         ADDSP x(8), sum, sum      x(0) + x(4)                             ; cycle 12 sum = x(0) + x(4) + x(8)
…
i + j†    ADDSP x(i+j), sum, sum    x(j) + x(j+4) + x(j+8) + … + x(i–4+j)   ; cycle i + j + 4 sum = x(j) + x(j+4) + x(j+8) + … + x(i–4+j) + x(i+j)
…

† where i is a multiple of 4
The first value of the array x, x(0), is added to the accumulator (sum) on cycle
0, but the result is not ready until cycle 4. This means that on cycle 1, when x(1)
is added to the accumulator (sum), sum does not yet contain x(0). Thus,
when this result is ready on cycle 5, sum holds the value x(1) instead
of the value x(0) + x(1). When you reach cycle 4, sum holds the value x(0),
and the value x(4) is added to that, giving sum = x(0) + x(4) on
cycle 8. This pattern repeats continuously, resulting in four separate
accumulations (using the register “sum”).
The current value in the accumulator “sum” depends on which iteration is
being done. After the completion of the loop, the last four sums should be written
into separate registers and then added together to give the final result. This
is shown in Example 6–27 on page 6-43.
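
The staggering described above can be modeled in plain C. The following is only a minimal sketch, not device code and not a substitute for Example 6–27: it assumes one ADDSP is issued per cycle and that a result issued on cycle c becomes visible in sum on cycle c + 4, as in Table 6–9. The names ADDSP_LATENCY, staggered_accumulate, and combine_partials are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption taken from Table 6-9: a result issued on cycle c
   is visible in sum on cycle c + 4. */
#define ADDSP_LATENCY 4

/* Model "ADDSP x(c), sum, sum" issued once per cycle for c = 0..n-1.
   On return, partial[j] holds x(j) + x(j+4) + x(j+8) + ...,
   i.e. the four results still in flight when the loop ends. */
void staggered_accumulate(const float *x, size_t n,
                          float partial[ADDSP_LATENCY])
{
    float in_flight[ADDSP_LATENCY] = {0.0f, 0.0f, 0.0f, 0.0f};
    float sum = 0.0f;   /* value of the pseudoregister on the current cycle */
    size_t c, j;

    assert(x != NULL || n == 0);

    for (c = 0; c < n; c++) {
        size_t slot = c % ADDSP_LATENCY;

        /* The result issued on cycle c - 4 lands in sum now. */
        if (c >= ADDSP_LATENCY) {
            sum = in_flight[slot];
        }

        /* Issue the add; its result lands on cycle c + 4. */
        in_flight[slot] = x[c] + sum;
    }

    /* Drain: the last four results are the four separate accumulations. */
    for (j = 0; j < ADDSP_LATENCY; j++) {
        partial[j] = in_flight[j];
    }
}

/* Add the four partial sums together to give the final result. */
float combine_partials(const float partial[ADDSP_LATENCY])
{
    return (partial[0] + partial[1]) + (partial[2] + partial[3]);
}
```

With nine inputs x(0) through x(8), partial[0] ends up as x(0) + x(4) + x(8), matching the cycle 12 row of the table, and combine_partials returns the sum of all nine values. Note that the four-way split changes the order of the floating-point additions, so the result can differ in the last bits from a strictly sequential sum.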