6.1 Assembly Code
The source that you write for the assembly optimizer, called linear assembly, is similar to assembly source code. However, linear assembly does not include information about parallel instructions, instruction latencies, or register usage. The assembly optimizer takes care of the difficulties of streamlining your code by:
- Finding instructions that can be executed in parallel
- Handling pipeline latencies during software pipelining
- Assigning register usage
- Defining which functional unit to use
Although the ’C6000 lets you specify the functional unit or register to use, doing so may restrict the compiler’s ability to fully optimize your code. See the TMS320C6000 Optimizing C/C++ Compiler User’s Guide for more information.
This chapter takes you through the optimization process by hand to show you how the assembly optimizer works and to help you understand when you might want to perform some of the optimizations manually. The sections introduce optimization techniques in order of increasing complexity:
- Sections 6.3 and 6.4 begin with a dot product algorithm to show you how to translate the C code to assembly code and then how to optimize the linear assembly code with several simple techniques.
- Sections 6.5 and 6.6 introduce techniques for the more complex algorithms associated with software pipelining, such as modulo iteration interval scheduling for both single-cycle loops and multicycle loops.
- Section 6.7 uses an IIR filter algorithm to discuss the problems with loop carry paths.
- Sections 6.8 and 6.9 discuss the problems encountered with if-then-else statements in a loop and how loop unrolling can be used to resolve them.
- Section 6.10 introduces live-too-long issues in your code, where a value must stay in a register for longer than the loop's iteration interval.
- Section 6.11 uses a simple FIR filter algorithm to discuss redundant load elimination.
- Section 6.12 discusses the same FIR filter in terms of the interleaved memory bank scheme used by ’C6000 devices.
- Sections 6.13 and 6.14 show you how to execute the outer loop of the FIR filter conditionally and in parallel with the inner loop.
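For orientation, Sections 6.3 and 6.4 start from a dot product written in C. The sketch below shows that kind of loop; the function name `dotp`, the use of `short` inputs with an `int` accumulator, and the length parameter `n` are assumptions made for illustration, not necessarily the exact code those sections use.

```c
/* Hypothetical dot product of the kind Sections 6.3 and 6.4 optimize.
 * Assumed: 16-bit inputs, 32-bit accumulator, caller-supplied length n. */
int dotp(const short *a, const short *b, int n)
{
    int sum = 0;   /* running 32-bit sum */
    int i;

    for (i = 0; i < n; i++) {
        /* Each iteration: load a[i] and b[i], multiply, accumulate. */
        sum += a[i] * b[i];
    }
    return sum;
}
```

When a loop like this is written in linear assembly, the two loads, the multiply, and the add in each iteration are the operations the assembly optimizer must schedule in parallel and assign to functional units, and the accumulation in `sum` carries from one iteration to the next.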