Linear Assembly Considerations
units from one data path to access a 32-bit operand from the opposite side’s
register file. The 1X cross path allows data path A’s functional units to read their
source from register file B. Similarly, the 2X cross path allows data path B’s
functional units to read their source from register file A. Figure 8–25 illustrates
how these register file cross paths work.
Figure 8–25. C64x Data Cross Paths
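[Figure: block diagram of data path A (functional units L1, S1, M1, and D1 with
register file A0–A31) and data path B (functional units D2, M2, S2, and L2 with
register file B0–B31). Each unit has S1 and S2 source ports and a D destination
port. The 1X and 2X cross paths connect each side to the opposite register file;
DA1 and DA2 are the address paths.]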
On the ’C64x, all eight functional units can access the opposite side’s register
file via a cross path. Because the C6000 architecture has only two cross paths, 1X
and 2X, each data path can read one source from the opposite register file per
clock cycle, for a total of two cross-path source reads per clock cycle. The ’C64x
pipelines data cross-path accesses, which allows multiple functional units on a
side to read the same cross-path source simultaneously. The cross-path operand for
one side can therefore be used by up to two of the functional units on that side
in an execute packet. On the ’C62x/’C67x, only one functional unit per data path
per execute packet can get an operand from the opposite register file.
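The cross-path limits above can be written down as a small checker. The following C sketch is only an illustrative model, not part of any TI tool: the type and function names (Side, Device, Instr, cross_paths_ok) are hypothetical, and the model assumes that each instruction reads at most one operand over a cross path and that the only constraints are the ones stated in the preceding paragraph.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these names come from TI tools. */
typedef enum { SIDE_A = 0, SIDE_B = 1 } Side;
typedef enum { DEV_C62X_C67X, DEV_C64X } Device;

typedef struct {
    Side side;        /* data path of the functional unit (A uses 1X, B uses 2X) */
    bool uses_cross;  /* true if one source comes from the opposite register file */
    int  cross_reg;   /* number (0-31) of the register read over the cross path */
} Instr;

/* Returns true if one execute packet respects the cross-path limits
 * described in the text: one source register per cross path per cycle,
 * shared by up to two units per side on the 'C64x and by one unit per
 * side on the 'C62x/'C67x. */
bool cross_paths_ok(Device dev, const Instr *pkt, size_t n)
{
    int users[2] = { 0, 0 };    /* units using 1X (index 0) and 2X (index 1) */
    int reg[2]   = { -1, -1 };  /* register currently carried by each cross path */
    int limit    = (dev == DEV_C64X) ? 2 : 1;

    assert(pkt != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!pkt[i].uses_cross)
            continue;
        Side s = pkt[i].side;
        if (users[s] > 0 && reg[s] != pkt[i].cross_reg)
            return false;       /* one cross path cannot carry two registers */
        reg[s] = pkt[i].cross_reg;
        if (++users[s] > limit)
            return false;       /* too many units sharing this cross path */
    }
    return true;
}
```

Under this model, two side-A units that both read B1 over 1X form a valid execute packet on the ’C64x but not on the ’C62x/’C67x, while two side-A units reading B1 and B2 are rejected on every device because 1X can deliver only one source per cycle.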
On the ’C64x, a delay clock cycle is introduced whenever an instruction attempts
to read a source register via a cross path and that register was updated in the
previous cycle. This is known as a cross path stall. The hardware inserts the
stall automatically; no NOP instruction is needed. For more information, see the
TMS320C6000 CPU and Instruction Set Reference Guide (SPRU189). The cross path
stall does not occur on the ’C62x/’C67x.
The cross path stall is necessary so that the ’C64x can achieve clock rate goals
beyond 1 GHz. Code written for the ’C62x/’C67x that reads a cross-path source
register updated in the previous cycle incurs a one-cycle stall at each such read
when it runs on the ’C64x. The code still runs correctly, but each stall costs one
additional clock cycle.
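The cost of cross path stalls can be estimated with a simple cycle count. The C sketch below is a toy model under stated assumptions, not a description of the actual pipeline: it assumes one instruction per cycle, every result written in the cycle the instruction issues, and at most one cross-path source per instruction. The names Op and total_cycles are hypothetical, and the assembly mnemonics in the comments are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one single-cycle instruction. */
typedef struct {
    char dst_file;    /* 'A' or 'B': register file written by this instruction */
    int  dst_reg;     /* register number written, 0-31 */
    bool uses_cross;  /* true if a source is read over 1X or 2X */
    char src_file;    /* register file read over the cross path (if used) */
    int  src_reg;     /* register number read over the cross path (if used) */
} Op;

/* Counts the cycles taken by a straight-line sequence of n instructions.
 * On the 'C64x (is_c64x == true), one stall cycle is added whenever an
 * instruction reads, via a cross path, the register that the previous
 * instruction wrote. On the 'C62x/'C67x no stall is added. */
size_t total_cycles(bool is_c64x, const Op *ops, size_t n)
{
    size_t stalls = 0;

    assert(ops != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        const Op *prev = &ops[i - 1];
        const Op *cur  = &ops[i];
        if (is_c64x && cur->uses_cross &&
            cur->src_file == prev->dst_file &&
            cur->src_reg  == prev->dst_reg)
            stalls++;   /* cross-path source was updated in the previous cycle */
    }
    return n + stalls;
}
```

For a two-instruction sequence in which the first writes A1 (for example, ADD .S1 A0,A0,A1) and the second reads A1 over the 2X cross path (for example, ADD .S2X A1,B0,B1), this model gives three cycles on the ’C64x and two on the ’C62x/’C67x. Placing one unrelated instruction between the two removes the stall in the model, which reflects the usual way to avoid it when scheduling code by hand.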