Lesson 3: Packed Data Optimization of Memory Bandwidth
Compiler Optimization Tutorial
Example 2–12. lesson3_c.asm
;*----------------------------------------------------------------------------*
;* SOFTWARE PIPELINE INFORMATION
;*
;* Loop Unroll Multiple : 2x
;* Known Minimum Trip Count : 10
;* Known Maximum Trip Count : 1073741823
;* Known Max Trip Count Factor : 1
;* Loop Carried Dependency Bound(^) : 0
;* Unpartitioned Resource Bound : 2
;* Partitioned Resource Bound(*) : 2
;* Resource Partition:
;*                              A-side   B-side
;* .L units                        0        0
;* .S units                        2*       1
;* .D units                        2*       2*
;* .M units                        2*       2*
;* .X cross paths                  1        1
;* .T address paths                2*       2*
;* Long read paths                 1        1
;* Long write paths                0        0
;* Logical ops (.LS)               1        1     (.L or .S unit)
;* Addition ops (.LSD)             0        1     (.L or .S or .D unit)
;* Bound(.L .S .LS)                2*       1
;* Bound(.L .S .D .LS .LSD)        2*       2*
;*
;* Searching for software pipeline schedule at ...
;* ii = 2 Schedule found with 6 iterations in parallel
;* done
;*
;* Epilog not entirely removed
;* Collapsed epilog stages : 2
;*
;* Prolog not removed
;* Collapsed prolog stages : 0
;*
;* Minimum required memory pad : 8 bytes
;*
;* Minimum safe trip count : 8
;*
Success! The compiler has fully optimized this loop. The kernel now completes two
iterations of the original loop every two cycles (ii = 2 with a 2x unroll), for a
throughput of one cycle per iteration. The .D and .T resources now each show four
in total, two on the A side and two on the B side: two LDWs and two STHs for every
two iterations of the loop.
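
The arithmetic behind these figures can be checked with a small C sketch. It is only an illustration: the constants are copied from the software pipeline information above, the function names are hypothetical and belong to no TI tool or library, and it assumes each .D unit issues at most one load or store per cycle.

```c
#include <assert.h>

/* Values copied from the software pipeline information above. */
enum {
    II              = 2, /* "ii = 2": cycles per pass of the pipelined kernel */
    UNROLL_MULTIPLE = 2, /* "Loop Unroll Multiple : 2x"                       */
    D_UNITS         = 2, /* one .D unit on the A side, one on the B side      */
    LOADS           = 2, /* two LDWs per kernel pass                          */
    STORES          = 2  /* two STHs per kernel pass                          */
};

/* Hypothetical helper: cycles per iteration of the original (non-unrolled)
 * loop, given the iteration interval and the unroll multiple. */
static double cycles_per_source_iteration(int ii, int unroll)
{
    assert(ii > 0 && unroll > 0);
    return (double)ii / unroll;
}

/* Hypothetical helper: load/store slots available in one kernel pass,
 * assuming each .D unit issues at most one memory operation per cycle. */
static int d_slots_per_kernel(int ii, int d_units)
{
    assert(ii > 0 && d_units > 0);
    return ii * d_units;
}

/* Hypothetical helper: fraction of the available .D slots that the
 * kernel's loads and stores occupy (1.0 means fully used). */
static double d_unit_utilization(int loads, int stores, int ii, int d_units)
{
    assert(loads >= 0 && stores >= 0);
    return (double)(loads + stores) / d_slots_per_kernel(ii, d_units);
}
```

With the values above, cycles_per_source_iteration(II, UNROLL_MULTIPLE) is 1.0, d_slots_per_kernel(II, D_UNITS) is 4, and d_unit_utilization(LOADS, STORES, II, D_UNITS) is 1.0. Under the stated assumption, the four memory operations fill every available .D slot, which is consistent with the .D units being marked as a bound (2*) on both sides.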