
68
Unrolling Loops
AMD Athlon™ Processor x86 Code Optimization
22007E/0—November 1999
unrolling reduces register pressure by removing the loop
counter. To completely unroll a loop, remove the loop control
and replicate the loop body N times. In addition, completely
unrolling a loop increases scheduling opportunities.
Only unrolling very large code loops can result in the inefficient
use of the L1 instruction cache. Loops can be unrolled
completely, if all of the following conditions are true:
■
The loop is in a frequently executed piece of code.
■
The loop count is known at compile time.
■
The loop body, once unrolled, is less than 100 instructions,
which is approximately 400 bytes of code.
Partial Loop Unrolling
Partial loop unrolling can increase register pressure, which can
make it inefficient due to the small number of registers in the
x86 architecture. However, in certain situations, partial
unrolling can be efficient due to the performance gains
possible. Partial loop unrolling should be considered if the
following conditions are met:
■
Spare registers are available
■
Loop body is small, so that loop overhead is significant
■
Number of loop iterations is likely > 10
Consider the following piece of C code:
double a[MAX_LENGTH], b[MAX_LENGTH];
for (i=0; i< MAX_LENGTH; i++) {
a[i] = a[i] + b[i];
}
Without loop unrolling, the code looks like the following:
Содержание Athlon Processor x86
Страница 1: ...AMD Athlon Processor x86 Code Optimization Guide TM...
Страница 12: ...xii List of Figures AMD Athlon Processor x86 Code Optimization 22007E 0 November 1999...
Страница 16: ...xvi Revision History AMD Athlon Processor x86 Code Optimization 22007E 0 November 1999...
Страница 60: ...44 Code Padding Using Neutral Code Fillers AMD Athlon Processor x86 Code Optimization 22007E 0 November 1999...
Страница 92: ...76 Push Memory Data Carefully AMD Athlon Processor x86 Code Optimization 22007E 0 November 1999...
Страница 122: ...106 Take Advantage of the FSINCOS Instruction AMD Athlon Processor x86 Code Optimization 22007E 0 November 1999...
Страница 156: ...140 AMD Athlon Processor Microarchitecture AMD Athlon Processor x86 Code Optimization 22007E 0 November 1999...
Страница 176: ...160 Write Combining Operations AMD Athlon Processor x86 Code Optimization 22007E 0 November 1999...
Страница 202: ...186 Page Attribute Table PAT AMD Athlon Processor x86 Code Optimization 22007E 0 November 1999...
Страница 252: ...236 VectorPath Instructions AMD Athlon Processor x86 Code Optimization 22007E 0 November 1999...
Страница 256: ...240 Index AMD Athlon Processor x86 Code Optimization 22007E 0 November 1999...