6.4 Using Word Access for Short Data and Doubleword Access for Floating-Point Data
The parallel code for the fixed-point example in section 6.3 uses an LDH
(load halfword) instruction to read a[i]. Because a[i] and a[i + 1] are
adjacent in memory, you can optimize the code further by using the load word
(LDW) instruction to read a[i] and a[i + 1] at the same time and load both
into a single 32-bit register. (The data must be word-aligned in memory.)
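The effect of word access can be sketched in portable C. This is only an illustration of the idea, not the assembly the tools generate: the names load_pair and dotp_word_access are hypothetical, the element count n is assumed to be even, and memcpy stands in for the single 32-bit load that LDW performs on word-aligned data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fetch two adjacent 16-bit elements with one 32-bit
 * access, standing in for one LDW in place of two LDH loads. memcpy keeps
 * the sketch portable and free of aliasing problems. */
static uint32_t load_pair(const int16_t *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Dot product that reads a[i], a[i + 1] as one word and b[i], b[i + 1] as
 * another. Assumption: n is even. */
int dotp_word_access(const int16_t *a, const int16_t *b, int n)
{
    int sum0 = 0, sum1 = 0;
    int i;

    assert(n % 2 == 0);
    for (i = 0; i < n; i += 2) {
        uint32_t wa = load_pair(&a[i]);   /* holds a[i] and a[i + 1] */
        uint32_t wb = load_pair(&b[i]);   /* holds b[i] and b[i + 1] */
        int16_t a_pair[2], b_pair[2];

        /* Split each word back into its two halfwords. Copying into an
         * array keeps the element order correct on either endianness. */
        memcpy(a_pair, &wa, sizeof a_pair);
        memcpy(b_pair, &wb, sizeof b_pair);

        sum0 += a_pair[0] * b_pair[0];    /* even elements */
        sum1 += a_pair[1] * b_pair[1];    /* odd elements  */
    }
    return sum0 + sum1;
}
```

Each iteration issues two word-sized reads instead of four halfword reads, which is the saving the LDW form is after.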
In the floating-point example, the parallel code uses an LDW instruction to
read a[i]. Because a[i] and a[i + 1] are adjacent in memory, you can optimize
the code further by using the load doubleword (LDDW) instruction to read a[i]
and a[i + 1] at the same time and load both into a register pair. (The data
must be doubleword-aligned in memory.) See the TMS320C6000 CPU and
Instruction Set User’s Guide for more specific information on the LDDW
instruction.
Note:
The load doubleword (LDDW) instruction is available on the ’C64x
(fixed-point) and ’C67x (floating-point) devices.
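The floating-point case can be sketched the same way. The function name dotp_doubleword_access is hypothetical, float is assumed to be 32 bits wide, n is assumed to be even, and memcpy into a 64-bit integer stands in for the register pair that LDDW fills from doubleword-aligned data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Dot product that reads a[i], a[i + 1] with one 64-bit access and
 * b[i], b[i + 1] with another. Assumptions: n is even and float is
 * 32 bits, so two floats fill one doubleword exactly. */
float dotp_doubleword_access(const float *a, const float *b, int n)
{
    float sum0 = 0.0f, sum1 = 0.0f;
    int i;

    assert(n % 2 == 0);
    assert(2 * sizeof(float) == sizeof(uint64_t));

    for (i = 0; i < n; i += 2) {
        uint64_t da, db;
        float a_pair[2], b_pair[2];

        memcpy(&da, &a[i], sizeof da);    /* one read: a[i], a[i + 1] */
        memcpy(&db, &b[i], sizeof db);    /* one read: b[i], b[i + 1] */

        /* Split each doubleword back into its two single-precision values. */
        memcpy(a_pair, &da, sizeof a_pair);
        memcpy(b_pair, &db, sizeof b_pair);

        sum0 += a_pair[0] * b_pair[0];    /* even elements */
        sum1 += a_pair[1] * b_pair[1];    /* odd elements  */
    }
    return sum0 + sum1;
}
```

Because the two partial sums are added in a different order than in a single-accumulator loop, the floating-point result can differ from that loop's result in the last bits.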
6.4.1 Unrolled Dot Product C Code
The fixed-point C code in Example 6–13 unrolls the loop by a factor of two.
Each iteration accumulates the product of the even-indexed elements, a[i] and
b[i], into sum0 and the product of the odd-indexed elements, a[i + 1] and
b[i + 1], into sum1. After the loop, sum0 and sum1 are added to produce the
final sum. The same is true for the floating-point C code in Example 6–14.
(For another example of loop unrolling, see section 6.9 on page 6-94.)
Example 6–13. Fixed-Point Dot Product C Code (Unrolled)
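int dotp(short a[], short b[])
{
    int sum0, sum1, sum, i;

    sum0 = 0;                           /* accumulates even-indexed products */
    sum1 = 0;                           /* accumulates odd-indexed products  */
    for (i = 0; i < 100; i += 2) {      /* two elements per iteration        */
        sum0 += a[i] * b[i];
        sum1 += a[i + 1] * b[i + 1];
    }
    sum = sum0 + sum1;                  /* combine the two partial sums      */
    return(sum);
}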
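As a quick sanity check, the unrolled form can be compared against a plain single-accumulator loop. The sketch below is illustrative only: DOTP_LEN, dotp_rolled, dotp_unrolled, and check_dotp_forms are hypothetical names, and the test data is arbitrary. For integer data the two forms return the same value as long as no partial sum overflows.

```c
#include <assert.h>

#define DOTP_LEN 100   /* element count used by the loop in Example 6-13 */

/* Reference form: one accumulator, one element per iteration. */
int dotp_rolled(const short a[], const short b[])
{
    int sum = 0, i;

    for (i = 0; i < DOTP_LEN; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Unrolled form: even products go to sum0, odd products to sum1. */
int dotp_unrolled(const short a[], const short b[])
{
    int sum0 = 0, sum1 = 0, i;

    for (i = 0; i < DOTP_LEN; i += 2) {
        sum0 += a[i] * b[i];
        sum1 += a[i + 1] * b[i + 1];
    }
    return sum0 + sum1;
}

/* Fill two arrays with arbitrary small values, assert that both forms
 * agree, and return the common result. */
int check_dotp_forms(void)
{
    short a[DOTP_LEN], b[DOTP_LEN];
    int i, rolled, unrolled;

    for (i = 0; i < DOTP_LEN; i++) {
        a[i] = (short)(i - 50);
        b[i] = (short)(3 * i + 1);
    }
    rolled = dotp_rolled(a, b);
    unrolled = dotp_unrolled(a, b);
    assert(rolled == unrolled);
    return unrolled;
}
```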