Linear Assembly Considerations
’C64x Programming Considerations
It is possible to avoid the cross path stall by scheduling instructions so that
a cross path operand is not read until at least one clock cycle after the operand
has been updated. With appropriate scheduling, the ’C64x can provide one
cross path operand per data path per clock cycle with no stalls. In many cases,
the TMS320C6000 Optimizing C Compiler and Assembly Optimizer perform this
scheduling automatically, as demonstrated in Example 8–24.
Below is a C implementation of a weighted vector sum. Each value of input
array a is multiplied by a constant, m, and the product is shifted right by 15 bits.
The weighted value is then added to the corresponding element of a second input
array, b, and the sum is stored in the output array, c.
Example 8–24. Avoiding Cross Path Stalls: Weighted Vector Sum Example
void w_vec(short a[], short b[], short c[], short m, int n)
{
    int i;
    /* Scale each a[i] by m, shift right 15 bits, then add b[i] */
    for (i = 0; i < n; i++) {
        c[i] = ((m * a[i]) >> 15) + b[i];
    }
}
This algorithm requires two loads, a multiply, a shift, an add, and a store per
result. Only the .D units on the C6000 architecture can load values from and
store values to memory. Because three .D operations (two loads and one store)
are needed for each result and only two .D units are available, the algorithm
would appear to require two cycles to produce one result. Note, however, that
the input and output arrays hold short (16-bit) values. Both the ’C62x and the
’C64x can load or store 32 bits per .D unit (the ’C64x can also load or store
64 bits per .D unit). By unrolling the loop once, so that each load or store
moves a pair of 16-bit values, it may be possible to produce two 16-bit results
every two clock cycles.
Now examine a partitioned linear assembly version of the weighted vector sum,
in which data values are brought in 32 bits at a time. With linear assembly, it
is not necessary to specify registers, functional units, or delay slots. In
partitioned linear assembly, the programmer also has the option to specify on
which side of the machine each instruction executes. We can further specify
the functional unit, as shown in Example 8–25.