Writing Parallel Code
6.3.2 Translating C Code to Linear Assembly
The first step in optimizing your code is to translate the C code to linear assembly.
6.3.2.1 Fixed-Point Dot Product
Example 6–7 shows the linear assembly instructions used for the inner loop
of the fixed-point dot product C code.
Example 6–7. List of Assembly Instructions for Fixed-Point Dot Product
        LDH     .D1     *A4++,A2        ; load ai from memory
        LDH     .D1     *A3++,A5        ; load bi from memory
        MPY     .M1     A2,A5,A6        ; ai * bi
        ADD     .L1     A6,A7,A7        ; sum += (ai * bi)
        SUB     .S1     A1,1,A1         ; decrement loop counter
[A1]    B       .S2     LOOP            ; branch to loop
The load halfword (LDH) instructions step through the a and b arrays. Each LDH
postincrements its pointer, so after each iteration the pointer addresses the
next halfword (16 bits) in the array. The ADD instruction accumulates the
products from the multiply (MPY) instruction. The subtract (SUB) instruction
decrements the loop counter.
An additional instruction executes the branch back to the top of the loop. The
branch (B) instruction is conditional on the loop counter, A1, and executes
only while A1 is nonzero.
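As a rough illustration, the loop in Example 6–7 can be read back into C with one statement per instruction. This is only a sketch: the function name dotp_fixed, the variable names, and the do-while form are assumptions made here for illustration, not the C source used in this guide. The do-while shape reflects the fact that the loop body runs once before the counter is tested, so the count must be at least 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: one C statement per instruction in Example 6-7. */
int32_t dotp_fixed(const int16_t *a, const int16_t *b, int count)
{
    int32_t sum = 0;                      /* A7: running sum */

    assert(count > 0);                    /* loop body always runs at least once */
    do {
        int16_t ai = *a++;                /* LDH *A4++,A2 : load ai, postincrement */
        int16_t bi = *b++;                /* LDH *A3++,A5 : load bi, postincrement */
        int32_t prod = (int32_t)ai * bi;  /* MPY A2,A5,A6 : 16 x 16 -> 32-bit product */
        sum += prod;                      /* ADD A6,A7,A7 : sum += (ai * bi) */
        count--;                          /* SUB A1,1,A1  : decrement loop counter */
    } while (count != 0);                 /* [A1] B LOOP  : branch while A1 is nonzero */

    return sum;
}
```

Each pointer advances by one int16_t element per iteration, which matches the halfword postincrement performed by LDH.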
6.3.2.2 Floating-Point Dot Product
Example 6–8 shows the linear assembly instructions used for the inner loop
of the floating-point dot product C code.
Example 6–8. List of Assembly Instructions for Floating-Point Dot Product
        LDW     .D1     *A4++,A2        ; load ai from memory
        LDW     .D2     *A3++,A5        ; load bi from memory
        MPYSP†  .M1     A2,A5,A6        ; ai * bi
        ADDSP†  .L1     A6,A7,A7        ; sum += (ai * bi)
        SUB     .S1     A1,1,A1         ; decrement loop counter
[A1]    B       .S2     LOOP            ; branch to loop
† ADDSP and MPYSP are ’C67x (floating-point) instructions only.
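For comparison, the loop in Example 6–8 can be read back into C in the same one-statement-per-instruction style. This is a sketch under assumptions: the name dotp_float and the variable names are hypothetical, and the do-while form is chosen here only to mirror the decrement-and-branch structure, which requires a count of at least 1.

```c
#include <assert.h>

/* Hypothetical helper: one C statement per instruction in Example 6-8. */
float dotp_float(const float *a, const float *b, int count)
{
    float sum = 0.0f;         /* A7: running sum */

    assert(count > 0);        /* loop body always runs at least once */
    do {
        float ai = *a++;      /* LDW *A4++,A2   : load ai, postincrement */
        float bi = *b++;      /* LDW *A3++,A5   : load bi, postincrement */
        float prod = ai * bi; /* MPYSP A2,A5,A6 : single-precision multiply */
        sum += prod;          /* ADDSP A6,A7,A7 : sum += (ai * bi) */
        count--;              /* SUB A1,1,A1    : decrement loop counter */
    } while (count != 0);     /* [A1] B LOOP    : branch while A1 is nonzero */

    return sum;
}
```

Here each pointer advances by one 32-bit float per iteration, which corresponds to the word-sized postincrement of LDW.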
The load word (LDW) instructions increment through the a and b arrays. Each
LDW does a postincrement on the pointer. Each iteration of these instructions
sets the pointer to the next word (32 bits) in the array. The ADDSP instruction