Lesson 2: Balancing Resources With Dual-Data Paths
Example 2–9. lesson2_c.asm
;*––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––*
;* SOFTWARE PIPELINE INFORMATION
;*
;* Loop Unroll Multiple : 2x
;* Known Minimum Trip Count : 10
;* Known Maximum Trip Count : 1073741823
;* Known Max Trip Count Factor : 1
;* Loop Carried Dependency Bound(^) : 0
;* Unpartitioned Resource Bound : 3
;* Partitioned Resource Bound(*) : 3
;* Resource Partition:
;* A–side B–side
;* .L units 0 0
;* .S units 2 1
;* .D units 3* 3*
;* .M units 2 2
;* .X cross paths 1 1
;* .T address paths 3* 3*
;* Long read paths 1 1
;* Long write paths 0 0
;* Logical ops (.LS) 1 1 (.L or .S unit)
;* Addition ops (.LSD) 0 1 (.L or .S or .D unit)
;* Bound(.L .S .LS) 2 1
;* Bound(.L .S .D .LS .LSD) 2 2
;*
;* Searching for software pipeline schedule at ...
;* ii = 3 Schedule found with 5 iterations in parallel
;* done
;*
;* Epilog not entirely removed
;* Collapsed epilog stages : 2
;*
;* Prolog not entirely removed
;* Collapsed prolog stages : 3
;*
;* Minimum required memory pad : 8 bytes
;*
;* Minimum safe trip count : 4
Notice the following things in the feedback:
A schedule with three cycles (ii = 3): You can tell by looking at the .D units and
.T address paths that this 3-cycle loop is the loop after it has been unrolled,
because the resources show a total of six memory accesses (three per side) evenly
balanced between the A side and B side. Therefore, our new effective loop iteration
interval is 3/2, or 1.5 cycles, per iteration of the original loop.
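As a rough check of this arithmetic, the sketch below derives ii from the .D counts in the feedback and divides by the unroll factor. The helper names (resource_bound, partitioned_bound, effective_ii) are hypothetical, and the sketch assumes one .D unit per data path. It reproduces only the partitioned resource bound. The compiler's schedule search also weighs cross paths, .T address paths, and loop-carried dependencies, so this is not a model of how the compiler actually picks ii.

```c
#include <assert.h>

/* Cycles needed by one resource class on one side: ceil(ops / units).
 * Assumption: 'units' is the number of such units per data path
 * (one .D unit per side on the C6000). */
int resource_bound(int ops, int units)
{
    assert(units > 0);
    return (ops + units - 1) / units;
}

/* The partitioned bound is set by the busier of the A and B sides. */
int partitioned_bound(int ops_a, int ops_b, int units_per_side)
{
    int a = resource_bound(ops_a, units_per_side);
    int b = resource_bound(ops_b, units_per_side);
    return a > b ? a : b;
}

/* Cycles spent per iteration of the original (rolled) loop when the
 * unrolled loop body runs every 'ii' cycles. */
double effective_ii(int ii, int unroll)
{
    assert(unroll > 0);
    return (double)ii / unroll;
}
```

With the values from the feedback (.D units: 3 on the A side, 3 on the B side, Loop Unroll Multiple 2x), partitioned_bound(3, 3, 1) returns 3, which matches ii = 3, and effective_ii(3, 2) returns 1.5.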
A Known Minimum Trip Count of 10: This is because we specified that the trip count
of the original loop is greater than or equal to twenty and a multiple of two;
after unrolling by 2x, this minimum is cut in half. Also, a new line, Known Maximum Trip