Multicore Fixed and Floating-Point System-on-Chip
Copyright 2012 Texas Instruments Incorporated
Device Overview
19
SPRS689D—March 2012
TMS320C6670
C66x CPU improves on the double-precision multiply performance of the C674x by adding an instruction that allows
one double-precision multiply per cycle and by reducing the number of delay slots from ten to four. Each C66x .M
unit can also perform one of the following floating-point operations each clock cycle: one, two, or four single-precision
multiplies, or a complex single-precision multiply.
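As a point of reference for the operations listed above, a complex single-precision multiply expands arithmetically into four real multiplies plus one subtraction and one addition. The following is a minimal sketch in portable C, not C66x-specific code: the type `cplx_f32` and the functions `cplx_mul` and `cplx_mul_array` are hypothetical names introduced here for illustration, and no claim is made about how the compiler maps them onto .M unit instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-precision complex type (illustration only). */
typedef struct {
    float re;
    float im;
} cplx_f32;

/* (a + jb)(c + jd) = (ac - bd) + j(ad + bc):
 * four real multiplies, one subtraction, one addition. */
static cplx_f32 cplx_mul(cplx_f32 x, cplx_f32 y)
{
    cplx_f32 r;
    r.re = x.re * y.re - x.im * y.im;
    r.im = x.re * y.im + x.im * y.re;
    return r;
}

/* Element-wise complex multiply of two arrays of length n. */
static void cplx_mul_array(const cplx_f32 *x, const cplx_f32 *y,
                           cplx_f32 *out, size_t n)
{
    size_t i;
    assert(x != NULL && y != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        out[i] = cplx_mul(x[i], y[i]);
    }
}
```

For example, multiplying 1 + j2 by 3 + j4 with `cplx_mul` gives -5 + j10.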
The .L and .S units now support operands of up to 64 bits. This enables new versions of many arithmetic,
logical, and data-packing instructions that perform more parallel operations per cycle. Additional instructions
improve the performance of floating-point addition and subtraction, including the
ability to perform one double-precision addition or subtraction per cycle. Conversions between integer and
single-precision values can now be done on both the .L and .S units on the C66x. Instructions that take advantage
of the larger operands were also added to double the number of these conversions that can be done. The .L unit also
has additional logical AND and OR instructions, as well as instructions for 90-degree or 270-degree rotation of
complex numbers (up to two per cycle). Instructions that compute the conjugate
of a complex number have also been added.
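The rotation and conjugate operations mentioned above need no multiplies: rotating a + jb by 90 degrees gives -b + ja, rotating by 270 degrees gives b - ja, and the conjugate is a - jb. The sketch below shows this in portable C. It is only an illustration of the arithmetic: the names `cplx_f32`, `rot90`, `rot270`, `conj_f32`, and `conj_inplace` are hypothetical, the use of `float` is an assumption (this section does not state the data format the actual instructions operate on), and rotation is taken as counterclockwise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical complex type; float is assumed for simplicity. */
typedef struct {
    float re;
    float im;
} cplx_f32;

/* Multiply by j: (a + jb) -> (-b + ja), a 90-degree rotation. */
static cplx_f32 rot90(cplx_f32 z)
{
    cplx_f32 r;
    r.re = -z.im;
    r.im = z.re;
    return r;
}

/* Multiply by -j: (a + jb) -> (b - ja), a 270-degree rotation. */
static cplx_f32 rot270(cplx_f32 z)
{
    cplx_f32 r;
    r.re = z.im;
    r.im = -z.re;
    return r;
}

/* Complex conjugate: (a + jb) -> (a - jb). */
static cplx_f32 conj_f32(cplx_f32 z)
{
    cplx_f32 r;
    r.re = z.re;
    r.im = -z.im;
    return r;
}

/* Conjugate every element of an array in place. */
static void conj_inplace(cplx_f32 *z, size_t n)
{
    size_t i;
    assert(z != NULL || n == 0);
    for (i = 0; i < n; i++) {
        z[i] = conj_f32(z[i]);
    }
}
```

Each operation is a swap and/or sign change of the real and imaginary parts, which is consistent with these being cheap enough to issue more than one per cycle.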
MFENCE is a new instruction introduced with the C66x DSP. It stalls the CPU
until all CPU-triggered memory transactions have completed, including:
• Cache line fills
• Writes from L1D to L2 or from the CorePac to MSMC and/or other system endpoints
• Victim write backs
• Block or global coherence operations
• Cache mode changes
• Outstanding XMC prefetch requests
This is useful as a simple mechanism for programs to wait for these requests to reach their endpoint. It also provides
ordering guarantees for writes arriving at a single endpoint via multiple paths, multiprocessor algorithms that
depend on ordering, and manual coherence operations.
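The ordering use case described above can be sketched as a producer that writes a payload, waits for outstanding memory transactions, and only then sets a flag that another core polls. This is a hedged illustration, not reference code: the `mailbox_t` layout, the `publish` function, and the `WAIT_FOR_MEMORY` macro are hypothetical. It assumes the TI C6000 compiler defines `_TMS320C6600` for C66x targets and exposes MFENCE as the `_mfence()` intrinsic via `c6x.h`. On any other compiler the sketch falls back to a C11 fence, which only constrains ordering and is not equivalent to the MFENCE stall. Any cache writeback the shared memory may require is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(_TMS320C6600)
/* Assumption: TI compiler for C66x, intrinsic declared in c6x.h. */
#include <c6x.h>
#define WAIT_FOR_MEMORY() _mfence()
#else
/* Host stand-in so the sketch compiles elsewhere; ordering only. */
#include <stdatomic.h>
#define WAIT_FOR_MEMORY() atomic_thread_fence(memory_order_seq_cst)
#endif

/* Hypothetical mailbox placed in memory shared between cores. */
typedef struct {
    volatile uint32_t payload;
    volatile uint32_t ready;
} mailbox_t;

/* Write the payload, wait for it to reach its endpoint,
 * then raise the flag that the consumer polls. */
static void publish(mailbox_t *mb, uint32_t value)
{
    assert(mb != NULL);
    mb->payload = value;
    WAIT_FOR_MEMORY();   /* payload write completes before the flag */
    mb->ready = 1u;
    WAIT_FOR_MEMORY();   /* flag write completes before returning */
}
```

The second fence is a design choice in this sketch: it lets the caller assume the flag write has also landed when `publish` returns, at the cost of a second stall.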
For more details on the C66x CPU and its enhancements over the C64x+ and C674x architectures, see the following
documents (see 2.9 "Related Documentation from Texas Instruments" on page 66):
• C66x CPU and Instruction Set Reference Guide
• C66x DSP Cache User Guide
• C66x CorePac User Guide