What Kind of Optimization Is Being Performed?
Example 3-1. Repeat Blocks, Autoincrement Addressing, Parallel Instructions, Strength
Reduction, Induction Variable Elimination, Register Variables, and Loop
Test Replacement
int a[10], b[10];
scale(int k)
{
    int i;
    for (i = 0; i < 10; ++i)
        a[i] = b[i] * k;
    . . .
TMS320C2x/C2xx/C5x C Compiler Output:
_scale:
        . . .
        LRLK    AR6,_a          ; AR6 = &a[0]
        LRLK    AR5,_b          ; AR5 = &b[0]
        LACK    9
        SAMM    BRCR            ; BRCR = 9
        LARK    AR2,-3+LF1      ; AR2 = &k
        MAR     *0+,AR5
        RPTB    L4-1            ; repeat block 10 times
        LT      *+,AR2          ; t = *AR5++
        MPY     * ,AR6          ; p = t * *AR2
        SPL     *+,AR5          ; *AR6++ = p
L4:
        . . .
Induction variable elimination and loop test replacement allow the compiler to
recognize the loop as a simple counting loop and then generate a repeat block.
Strength reduction turns the array references into efficient pointer autoincrements.
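The effect of these optimizations can be approximated at the source level. The following is a minimal sketch, not compiler output: it restates the loop from Example 3-1 in indexed form and in a hand-written pointer form that mirrors the generated assembly. The names scale_indexed, scale_pointer, scale_forms_agree, and N are hypothetical and chosen only for illustration.

```c
#include <assert.h>

#define N 10                    /* array length used in Example 3-1 */

int a[N], b[N];

/* Source form: indexed array references driven by the counter i. */
void scale_indexed(int k)
{
    int i;
    for (i = 0; i < N; ++i)
        a[i] = b[i] * k;
}

/* Hand-written approximation of the optimized form (hypothetical). */
void scale_pointer(int k)
{
    int *dst = a;               /* plays the role of AR6 = &a[0] */
    const int *src = b;         /* plays the role of AR5 = &b[0] */
    int count = N - 1;          /* plays the role of BRCR = 9    */

    assert(N >= 1);             /* this loop form always runs at least once */

    do {
        *dst++ = *src++ * k;    /* autoincrement replaces a[i] and b[i] */
    } while (count-- > 0);      /* body executes count + 1 = 10 times   */
}

/* Returns 1 if both forms leave the same values in a[] for a given k. */
int scale_forms_agree(int k)
{
    int expected[N];
    int i;

    for (i = 0; i < N; ++i)
        b[i] = i - 3;           /* arbitrary test data */

    scale_indexed(k);
    for (i = 0; i < N; ++i)
        expected[i] = a[i];

    for (i = 0; i < N; ++i)
        a[i] = 0;
    scale_pointer(k);

    for (i = 0; i < N; ++i)
        if (a[i] != expected[i])
            return 0;
    return 1;
}
```

In the pointer form the index i no longer appears (induction variable elimination), the exit test is a countdown rather than a comparison against 10 (loop test replacement), and each array access is a pointer step rather than an address computation (strength reduction). Calling scale_forms_agree with any k is a quick way to confirm that the two forms compute the same result.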
3.8.4 Delays, Branches, Calls, and Returns
The TMS320C5x provides a number of delayed branch, call, and return
instructions. The compiler uses three of these: unconditional branch
(BD), call to a named function (CALLD), and simple return (RETD). These
instructions execute in two fewer cycles than their nondelayed counterparts.
They execute two instruction words after they enter the instruction stream.
Sometimes it is necessary to insert a NOP after a delayed instruction to ensure
proper operation of the sequence. The resulting sequence is one word of code
longer than the nondelayed sequence, but it is still one cycle faster. Note that
the compiler emits a comment in the instruction sequence where the delayed
instruction executes. See Example 3
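The trade-off described in this section can be worked through with a small model. The sketch below is illustrative only: the two-cycle saving comes from the text above, while the one-word, one-cycle cost of a NOP is an assumption made to match the "one word longer, one cycle faster" statement. The function and constant names are hypothetical.

```c
#include <assert.h>

/* From the text: a delayed instruction saves two cycles. */
enum { CYCLES_SAVED_BY_DELAY = 2 };

/* Assumption: each inserted NOP costs one word and one cycle. */
enum { NOP_WORDS = 1, NOP_CYCLES = 1 };

/* Change in cycle count relative to the nondelayed sequence.
   A negative result means the delayed sequence is faster. */
int delayed_cycle_delta(int nops)
{
    assert(nops >= 0 && nops <= 2);   /* two instruction words follow the delayed instruction */
    return nops * NOP_CYCLES - CYCLES_SAVED_BY_DELAY;
}

/* Change in code size, in words, relative to the nondelayed sequence. */
int delayed_word_delta(int nops)
{
    assert(nops >= 0 && nops <= 2);
    return nops * NOP_WORDS;
}
```

With no NOP the model gives two cycles saved and no extra code. With one NOP it gives one extra word and a net saving of one cycle, which is the case described above.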