Modulo Scheduling of Multicycle Loops
6.6.5 Linear Assembly Resource Allocation
Using the dependency graph, you can allocate functional units and registers
as shown in Example 6–38. This code is based on the following assumptions:
- The pointers are initialized outside the loop.
- m resides in B6 and ai & ai+1 are loaded into A2, so each .M unit reads one of its operands through a cross path.
- The mask in the AND instruction resides in B10.
Example 6–38. Linear Assembly for Weighted Vector Sum With Resources Allocated
        LDW     .D1     *A4++,A2        ; ai & ai+1
        LDW     .D2     *B4++,B2        ; bi & bi+1
        MPY     .M1X    A2,B6,A5        ; pi = m * ai
        MPYHL   .M2X    A2,B6,B5        ; pi+1 = m * ai+1
        SHR     .S1     A5,15,A7        ; pi_scaled = (m * ai) >> 15
        SHR     .S2     B5,15,B7        ; pi+1_scaled = (m * ai+1) >> 15
        AND     .L2     B2,B10,B8       ; bi
        SHR     .S2     B2,16,B1        ; bi+1
        ADD     .L1X    A7,B8,A9        ; ci = ((m * ai) >> 15) + bi
        ADD     .L2     B7,B1,B9        ; ci+1 = ((m * ai+1) >> 15) + bi+1
        STH     .D1     A9,*A6++[2]     ; store ci
        STH     .D2     B9,*B0++[2]     ; store ci+1
[A1]    SUB     .L1     A1,1,A1         ; decrement loop counter
[A1]    B       .S1     LOOP            ; branch to loop
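To check the data flow of Example 6–38 on a host machine, it can help to model one loop iteration in C, with the register names kept in the comments. The sketch below is a hypothetical reference model, not code from this guide: `wvs_model` and `sext16` are illustrative names, and it assumes little-endian packing (ai in the low half of the word loaded by LDW), that B10 holds the mask 0xFFFF, and that `>>` on a negative `int32_t` is an arithmetic shift, as it is for the C6000 SHR instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of u; well defined for every input. */
static int32_t sext16(uint32_t u)
{
    u &= 0xFFFFu;
    return u >= 0x8000u ? (int32_t)u - 0x10000 : (int32_t)u;
}

/* Hypothetical C reference model of Example 6-38. Each iteration reads
 * one packed word of a and one of b and writes c[2k] and c[2k+1].
 * Assumes little-endian packing (element i in the low half), a mask of
 * 0xFFFF in B10, and an arithmetic >> on negative values (like SHR). */
void wvs_model(const uint32_t *a_words, const uint32_t *b_words,
               int16_t m, int16_t *c, int n_words)
{
    assert(n_words >= 0);
    for (int k = 0; k < n_words; k++) {
        uint32_t a2 = a_words[k];                   /* LDW: ai & ai+1     */
        uint32_t b2 = b_words[k];                   /* LDW: bi & bi+1     */
        int32_t p0 = (int32_t)m * sext16(a2);       /* MPY:   m * ai      */
        int32_t p1 = (int32_t)m * sext16(a2 >> 16); /* MPYHL: m * ai+1    */
        int32_t s0 = p0 >> 15;                      /* SHR: pi_scaled     */
        int32_t s1 = p1 >> 15;                      /* SHR: pi+1_scaled   */
        int32_t b0 = (int32_t)(b2 & 0xFFFFu);       /* AND: assumed mask 0xFFFF */
        int32_t b1 = sext16(b2 >> 16);              /* SHR by 16: bi+1    */
        /* STH keeps only the low 16 bits of each 32-bit sum. */
        c[2 * k]     = (int16_t)sext16((uint32_t)(s0 + b0));
        c[2 * k + 1] = (int16_t)sext16((uint32_t)(s1 + b1));
    }
}
```

Because STH stores only the low 16 bits, zero-extending bi with the AND (rather than sign-extending it) does not change the stored ci. For example, with a zero product and bi = -5, the 32-bit sum is 0xFFFB and the stored halfword is still -5.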
6.6.6 Modulo Iteration Interval Scheduling
Table 6–12 provides a method for keeping track of resource usage across cycles that are a multiple of the iteration interval apart. In the single-cycle dot product example, every instruction executed every cycle and therefore required only one set of resources. Table 6–12 includes two groups of resources because you are scheduling a two-cycle loop.
- Instructions that execute on cycle k also execute on cycles k + 2, k + 4, and so on. Instructions scheduled on these even cycles cannot share a resource with one another.
- Instructions that execute on cycle k + 1 also execute on cycles k + 3, k + 5, and so on. Instructions scheduled on these odd cycles cannot share a resource with one another.
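- Cross paths must be tracked separately from functional units. MPY (on .M1) and ADD (on .L1) use different functional units, but both read a B-side operand through the 1X path, so the functional unit rows alone would not reveal a conflict between them. Table 6–12 therefore includes two rows (1X and 2X) that help you keep track of the cross path resources.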
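The even/odd bookkeeping described above can be sketched as a small modulo reservation table in C. This is a minimal sketch under stated assumptions, not the guide's or any tool's implementation: the names `mrt`, `mrt_init`, `mrt_reserve`, and the `resource` enumeration are hypothetical, the iteration interval is fixed at 2, and only resource conflicts are checked (data dependencies and latencies are ignored). A cycle maps to its column by `cycle % II`, and each cross path is treated as a resource of its own, just like a functional unit.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical resource IDs: the eight functional units plus the two
 * cross paths, one row each in the modulo iteration interval table. */
enum resource { D1, D2, M1, M2, L1, L2, S1, S2, X1, X2, NUM_RESOURCES };

#define II 2 /* iteration interval of the two-cycle loop */

/* used[r][s] is true once resource r is taken on cycles congruent to s mod II. */
typedef struct { bool used[NUM_RESOURCES][II]; } mrt;

void mrt_init(mrt *t) { memset(t, 0, sizeof *t); }

/* Try to schedule an instruction that needs res[0..n-1] on `cycle`.
 * Succeeds only if every needed resource is free in that modulo slot;
 * on failure the table is left unchanged. */
bool mrt_reserve(mrt *t, int cycle, const enum resource *res, int n)
{
    assert(cycle >= 0);
    int slot = cycle % II;
    for (int i = 0; i < n; i++)
        if (t->used[res[i]][slot])
            return false;           /* conflict in this even/odd slot */
    for (int i = 0; i < n; i++)
        t->used[res[i]][slot] = true;
    return true;
}
```

For example, after MPY ({M1, X1}) is placed on cycle 2, placing ADD ({L1, X1}) on cycle 4 fails on the 1X row even though the two instructions use different functional units, while placing it on cycle 5 succeeds.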