Using Word Access for Short Data and Doubleword Access for Floating-Point Data
6.4.6 Comparing Performance
Executing the fixed-point dot product with the optimizations in Example 6–19
requires only 50 iterations, because each iteration operates in parallel on
both an even and an odd array element. With the setup code and the final ADD
instruction, the equivalent of 100 iterations of the original loop requires a
total of 402 cycles (1 + 8 × 50 + 1).
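The even/odd split described above can be sketched in C. This is only an illustration of the idea, not the assembly of Example 6–19: the function name `dotp_even_odd` and its parameters are hypothetical, the array length is assumed to be even, and the sketch does not show the LDW word loads themselves. It keeps two running sums, one for even elements and one for odd elements, and combines them once at the end, which corresponds to the final ADD instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the guide): dot product of two short
 * arrays that handles one even and one odd element per loop pass.
 * Assumes n is even, so the loop body runs n / 2 times. */
int dotp_even_odd(const short *a, const short *b, size_t n)
{
    int sum_even = 0;   /* accumulates products of even-indexed elements */
    int sum_odd  = 0;   /* accumulates products of odd-indexed elements  */

    assert(n % 2 == 0); /* assumption: the arrays hold an even count */

    for (size_t i = 0; i < n; i += 2) {
        sum_even += a[i] * b[i];
        sum_odd  += a[i + 1] * b[i + 1];
    }

    /* One extra addition combines the two partial sums. */
    return sum_even + sum_odd;
}
```

For 100-element arrays, the loop body in this sketch runs 50 times, which matches the iteration count used in the cycle estimate.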
Table 6–3 compares the performance of the different versions of the
fixed-point dot product code discussed so far.
Table 6–3. Comparison of Fixed-Point Dot Product Code With Use of LDW

Code Example                                                          100 Iterations     Cycle Count
Example 6–9    Fixed-point dot product nonparallel assembly           2 + 100 × 16       1602
Example 6–10   Fixed-point dot product parallel assembly              1 + 100 × 8        801
Example 6–19   Fixed-point dot product parallel assembly with LDW     1 + (50 × 8) + 1   402
Executing the floating-point dot product with the optimizations in
Example 6–20 requires only 50 iterations, because each iteration operates in
parallel on both an even and an odd array element. With the setup code and the
final ADDSP instruction, the equivalent of 100 iterations of the original loop
requires a total of 508 cycles (1 + 10 × 50 + 7).
Table 6–4 compares the performance of the different versions of the
floating-point dot product code discussed so far.
Table 6–4. Comparison of Floating-Point Dot Product Code With Use of LDDW

Code Example                                                              100 Iterations      Cycle Count
Example 6–11   Floating-point dot product nonparallel assembly            2 + 100 × 21        2102
Example 6–12   Floating-point dot product parallel assembly               1 + 100 × 10        1001
Example 6–20   Floating-point dot product parallel assembly with LDDW     1 + (50 × 10) + 7   508
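
The cycle counts in Tables 6–3 and 6–4 all follow the same pattern: setup cycles, plus loop iterations times cycles per iteration, plus any cycles for the final instruction. As a quick way to check the arithmetic, the pattern can be written as a small C function. The name `total_cycles` and its parameter names are assumptions made for this sketch; they do not appear in the examples.

```c
/* Hypothetical helper (not from the guide): estimated total cycles for a
 * loop, following the pattern used in Tables 6-3 and 6-4.
 *   setup          cycles spent before the loop starts
 *   iterations     number of times the loop body executes
 *   cycles_per_it  cycles taken by one loop iteration
 *   final          cycles for the instruction that follows the loop
 *                  (0 when the table lists none) */
int total_cycles(int setup, int iterations, int cycles_per_it, int final)
{
    return setup + iterations * cycles_per_it + final;
}
```

For instance, `total_cycles(1, 50, 8, 1)` gives the 402 cycles listed for Example 6–19, and `total_cycles(1, 50, 10, 7)` gives the 508 cycles listed for Example 6–20. Compared with the nonparallel versions (1602 and 2102 cycles), both are roughly a fourfold reduction.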