Using Word Access for Short Data and Doubleword Access for Floating-Point Data
Optimizing Assembly Code via Linear Assembly
Two multiply instructions are now necessary to multiply the second set of array
elements:
- The MPY instruction multiplies the 16 least significant bits (LSBs) in
  each source register: a[i] * b[i].
- The MPYH instruction multiplies the 16 most significant bits (MSBs) of
  each source register: a[i+1] * b[i+1].
The two ADD instructions accumulate the sums of the even and odd elements:
sum0 and sum1.
Note:
This pairing of instructions and elements holds only when the ’C6x is in
little-endian mode. In big-endian mode, MPY operates on a[i+1] and b[i+1] and
MPYH operates on a[i] and b[i]. See the TMS320C6000 Peripherals Reference
Guide for more information.
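The word-access scheme described above can be modeled in C to check the arithmetic. This is only a sketch under stated assumptions: mpy, mpyh, pack_le, pack_be, and dotp_word are hypothetical helper names, not ’C6x intrinsics or code from this guide, and they imitate the signed 16 x 16 multiplies on a host machine. The final addition of sum0 and sum1 is assumed to happen after the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of MPY: signed multiply of the 16 LSBs of each operand. */
static int32_t mpy(uint32_t x, uint32_t y)
{
    return (int32_t)(int16_t)(x & 0xFFFFu) * (int32_t)(int16_t)(y & 0xFFFFu);
}

/* Model of MPYH: signed multiply of the 16 MSBs of each operand. */
static int32_t mpyh(uint32_t x, uint32_t y)
{
    return (int32_t)(int16_t)(x >> 16) * (int32_t)(int16_t)(y >> 16);
}

/* Little-endian word load: the even element (lower address) lands in the LSBs. */
static uint32_t pack_le(int16_t even, int16_t odd)
{
    return ((uint32_t)(uint16_t)odd << 16) | (uint16_t)even;
}

/* Big-endian word load: the even element lands in the MSBs instead. */
static uint32_t pack_be(int16_t even, int16_t odd)
{
    return ((uint32_t)(uint16_t)even << 16) | (uint16_t)odd;
}

/* Dot product of n 16-bit elements (n even), two elements per word. */
static int32_t dotp_word(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t sum0 = 0, sum1 = 0;
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        uint32_t aw = pack_le(a[i], a[i + 1]);  /* stands in for one word load of a */
        uint32_t bw = pack_le(b[i], b[i + 1]);  /* stands in for one word load of b */
        sum0 += mpy(aw, bw);                    /* even elements: a[i]   * b[i]   */
        sum1 += mpyh(aw, bw);                   /* odd elements:  a[i+1] * b[i+1] */
    }
    return sum0 + sum1;                         /* combine the partial sums */
}
```

Swapping pack_le for pack_be in the loop exchanges the roles of mpy and mpyh, which is the big-endian behavior the note describes. The total is unchanged, because only the assignment of elements to sum0 and sum1 differs.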
6.4.2.2 Floating-Point Dot Product
Example 6–16 shows the list of ’C6x instructions that execute the unrolled
floating-point dot product loop. Symbolic variable names are used instead of
actual registers. Using symbolic names for data and pointers makes code easier
to write and allows the optimizer to allocate registers. However, you must
declare the symbolic names with the .reg assembly optimizer directive. See the
TMS320C6000 Optimizing C/C++ Compiler User’s Guide for more information on
writing linear assembly code.
Example 6–16. Linear Assembly for Floating-Point Dot Product Inner Loop with LDDW
        LDDW    *a++,ai1:ai0    ; load a[i+0] & a[i+1] from memory
        LDDW    *b++,bi1:bi0    ; load b[i+0] & b[i+1] from memory
        MPYSP   ai0,bi0,pi0     ; a[i+0] * b[i+0]
        MPYSP   ai1,bi1,pi1     ; a[i+1] * b[i+1]
        ADDSP   pi0,sum0,sum0   ; sum0 += (a[i+0] * b[i+0])
        ADDSP   pi1,sum1,sum1   ; sum1 += (a[i+1] * b[i+1])
[cntr]  SUB     cntr,1,cntr     ; decrement loop counter
[cntr]  B       LOOP            ; branch to loop
The two load doubleword (LDDW) instructions load a[i], a[i+1], b[i], and b[i+1]
on each iteration.
Two MPYSP instructions are now necessary to multiply the second set of array
elements.
The two ADDSP instructions accumulate the sums of the even and odd
elements: sum0 and sum1.
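For comparison, the same unrolled loop can be sketched in C. This is an illustrative model, not code from this guide: dotp_float2 is a hypothetical function name, the local variables reuse the symbolic names from Example 6–16, and the final sum0 + sum1 is assumed to happen after the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Unrolled-by-two single-precision dot product; n must be even. */
static float dotp_float2(const float *a, const float *b, size_t n)
{
    float sum0 = 0.0f, sum1 = 0.0f;
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        float ai0 = a[i], ai1 = a[i + 1];  /* one LDDW fetches both values */
        float bi0 = b[i], bi1 = b[i + 1];  /* second LDDW */
        float pi0 = ai0 * bi0;             /* MPYSP: a[i+0] * b[i+0] */
        float pi1 = ai1 * bi1;             /* MPYSP: a[i+1] * b[i+1] */
        sum0 += pi0;                       /* ADDSP: even elements */
        sum1 += pi1;                       /* ADDSP: odd elements */
    }
    return sum0 + sum1;                    /* combine the partial sums */
}
```

Because floating-point addition is not associative, splitting the accumulation into sum0 and sum1 can give a result that differs in the last bits from a loop with a single accumulator.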