Software Pipelining
Optimizing Assembly Code via Linear Assembly
6.5.4 Comparing Performance
Table 6–10 compares the performance of all versions of the fixed-point dot
product code. Table 6–11 compares the performance of all versions of the
floating-point dot product code.
Table 6–10. Comparison of Fixed-Point Dot Product Code Examples

Code Example                                                                         100 Iterations      Cycle Count
Example 6–9   Fixed-point dot product linear assembly                                2 + 100 × 16        1602
Example 6–10  Fixed-point dot product parallel assembly                              1 + 100 × 8         801
Example 6–19  Fixed-point dot product parallel assembly with LDW                     1 + (50 × 8) + 1    402
Example 6–26  Fixed-point software-pipelined dot product                             7 + 50 + 1          58
Example 6–28  Fixed-point software-pipelined dot product with no extraneous loads    7 + 43 + 7 + 1      58
Example 6–30  Fixed-point software-pipelined dot product with no prolog or epilog    7 + 57 + 1          65
Example 6–32  Fixed-point software-pipelined dot product with smallest code size     5 + 57 + 1          63
Table 6–11. Comparison of Floating-Point Dot Product Code Examples

Code Example                                                                            100 Iterations       Cycle Count
Example 6–11  Floating-point dot product nonparallel assembly                           2 + 100 × 21         2102
Example 6–12  Floating-point dot product parallel assembly                              1 + 100 × 10         1001
Example 6–20  Floating-point dot product parallel assembly with LDDW                    1 + (50 × 10) + 7    508
Example 6–27  Floating-point software-pipelined dot product                             9 + 50 + 15          74
Example 6–29  Floating-point software-pipelined dot product with no extraneous loads    9 + 41 + 9 + 15      74
Example 6–31  Floating-point software-pipelined dot product with no prolog or epilog    7 + 59 + 15          81
Example 6–33  Floating-point software-pipelined dot product with smallest code size     5 + 59 + 15          79
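
The cycle counts in both tables can be checked, and the relative gains made explicit, with a short script. The sketch below is illustrative only and is written in Python for convenience; it is not part of the original examples. It assumes the per-iteration entries are products (for example, 2 + 100 × 16), which is the reading that reproduces the listed totals. The speedup figures it prints are derived here from the table values and are not stated in the tables themselves; the names FIXED_POINT, FLOATING_POINT, totals_match, and speedups are hypothetical helpers.

```python
# Each row: (example number, "100 Iterations" expression, listed "Cycle Count").
# Assumption: per-iteration terms are products, e.g. 2 + 100 * 16.
FIXED_POINT = [
    ("6-9",  2 + 100 * 16,     1602),
    ("6-10", 1 + 100 * 8,      801),
    ("6-19", 1 + (50 * 8) + 1, 402),
    ("6-26", 7 + 50 + 1,       58),
    ("6-28", 7 + 43 + 7 + 1,   58),
    ("6-30", 7 + 57 + 1,       65),
    ("6-32", 5 + 57 + 1,       63),
]

FLOATING_POINT = [
    ("6-11", 2 + 100 * 21,      2102),
    ("6-12", 1 + 100 * 10,      1001),
    ("6-20", 1 + (50 * 10) + 7, 508),
    ("6-27", 9 + 50 + 15,       74),
    ("6-29", 9 + 41 + 9 + 15,   74),
    ("6-31", 7 + 59 + 15,       81),
    ("6-33", 5 + 59 + 15,       79),
]


def totals_match(rows):
    """True if every evaluated expression equals the listed cycle count."""
    return all(computed == listed for _, computed, listed in rows)


def speedups(rows):
    """Map each example to (first-row cycles) / (its cycles)."""
    baseline = rows[0][1]
    return {name: baseline / cycles for name, cycles, _ in rows}


if __name__ == "__main__":
    for title, rows in (("Fixed-point", FIXED_POINT),
                        ("Floating-point", FLOATING_POINT)):
        print(f"{title}: totals consistent = {totals_match(rows)}")
        ratio = speedups(rows)
        for name, cycles, _ in rows:
            print(f"  Example {name}: {cycles:5d} cycles, "
                  f"{ratio[name]:.1f}x vs. Example {rows[0][0]}")
```

Under these assumptions, the listing shows that the variants that remove the prolog and epilog or minimize code size cost only a few cycles more than the fastest software-pipelined versions (65 and 63 versus 58 cycles for fixed point, 81 and 79 versus 74 for floating point).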