Software Pipelining
Optimizing Assembly Code via Linear Assembly
6.5.3.1 Fixed-Point Example
Multiple branch instructions are in the pipe. The first branch in the fixed-point
dot product is issued on cycle 2 but does not actually branch until the end of
cycle 7 (after five delay slots). The branch target is the execute packet defined
by the label LOOP. On cycle 7, the first branch returns to the same execute
packet, resulting in a single-cycle loop. On every cycle after cycle 7, a branch
executes back to LOOP until the loop counter decrements to 0. Even after
the counter reaches 0, five more branches execute because they are already
in the pipe.
Executing the dot product code with software pipelining, as shown in
Example 6–26, requires a total of 58 cycles (7 + 50 + 1), a significant
improvement over the 402 cycles required by the code in Example 6–19.
Note:
The code created by the assembly optimizer will not exactly match the
final assembly code shown in this and later sections, because different
versions of the tool produce slightly different code. However, the inner-loop
performance (number of cycles per iteration) should be similar.