
22007E/0—November 1999
AMD Athlon™ Processor x86 Code Optimization
Use Function Inlining
Overview
Make use of the AMD Athlon processor’s large 64-Kbyte
instruction cache by inlining small routines to avoid
procedure-call overhead. Consider the cost of possible
increased register usage, which can increase load/store
instructions for register spilling.
Function inlining has the advantage of eliminating function call
overhead and allowing better register allocation and
instruction scheduling at the site of the function call. The
disadvantage is decreasing code locality, which can increase
execution time due to instruction cache misses. Therefore,
function inlining is an optimization that has to be used
judiciously.
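As an illustration of the trade-off described above, the following is a minimal C sketch of a small routine called from a loop. The names clamp_to_byte and sum_clamped are hypothetical and do not come from this guide. Note that the inline keyword is only a request; whether the call is actually expanded in place depends on the compiler and its optimization settings.

```c
#include <assert.h>
#include <stddef.h>

/* Small leaf routine (hypothetical example). Its body is only a few
   instructions, so call/return and argument passing would make up a
   large share of its cost if it were called out of line. */
static inline int clamp_to_byte(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return v;
}

/* Hot loop. If the compiler expands clamp_to_byte() here, it can keep
   'total' and the loop index in registers across the whole loop and
   schedule the comparisons together with the loads. */
int sum_clamped(const int *samples, size_t count)
{
    int total = 0;

    assert(samples != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        total += clamp_to_byte(samples[i]);
    return total;
}
```

Each additional call site that is inlined adds another copy of the routine body, which is the code-size cost weighed against the saved call overhead.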
In general, due to its very large instruction cache, the
AMD Athlon processor is less susceptible than other processors
to the negative side effect of function inlining. Function call
overhead on the AMD Athlon processor can be low because
calls and returns are executed at high speed due to the use of
prediction mechanisms. However, there is still overhead due to
passing function arguments through memory, which creates
STLF (store-to-load-forwarding) dependencies. Some compilers
reduce this overhead by offering a calling convention that
passes arguments in registers, which has the drawback of
constraining register allocation in the function and at the
site of the function call.
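One way a register-based calling convention can be requested is sketched below, assuming GCC or Clang targeting 32-bit x86, where the regparm function attribute passes up to three integer arguments in EAX, EDX, and ECX. The macro REG_ARGS and the function blend are hypothetical names introduced for this sketch. Other compilers expose similar conventions under different spellings, so treat this as one possible form rather than a recommended one.

```c
#include <assert.h>

/* REG_ARGS is a hypothetical helper macro for this sketch. With GCC or
   Clang on 32-bit x86 it expands to the regparm attribute; on other
   targets it expands to nothing and the default convention applies. */
#if defined(__GNUC__) && defined(__i386__)
#define REG_ARGS __attribute__((regparm(3)))
#else
#define REG_ARGS
#endif

/* With regparm(3), a, b, and weight arrive in registers, so the callee
   does not have to load values the caller has just stored to the stack.
   weight is a fixed-point fraction in the range 0..256. */
REG_ARGS int blend(int a, int b, int weight)
{
    assert(weight >= 0 && weight <= 256);
    return (a * weight + b * (256 - weight)) / 256;
}
```

The cost named in the text is visible here: the three argument registers are committed at every call site, so the caller has fewer registers free for its own values around the call.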
In general, function inlining works best if the compiler can
utilize feedback from a profiler to identify the function call
sites most frequently executed. If such data is not available, a
reasonable heuristic is to concentrate on function calls inside
loops. Functions that are directly recursive should not be
considered candidates for inlining. However, if they are
end-recursive (tail-recursive), the compiler should convert them to an iterative
equivalent to avoid potential overflow of the AMD Athlon
processor return prediction mechanism (return stack) during
deep recursion. For best results, a compiler should support
function inlining across multiple source files. In addition, a
compiler should provide inline templates for commonly used
library functions, such as sin(), strcmp(), or memcpy().
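The conversion of an end-recursive (often called tail-recursive) function to an iterative equivalent can be sketched as follows. The functions sum_recursive and sum_iterative are hypothetical examples, not taken from this guide. They show by hand the transformation that the text asks the compiler to perform.

```c
#include <assert.h>
#include <stddef.h>

/* End-recursive form: the recursive call is the last action, so nothing
   remains to be done after it returns. Each element still costs one
   call and one return, and the recursion depth equals 'count'. */
long sum_recursive(const int *values, size_t count, long acc)
{
    if (count == 0)
        return acc;
    return sum_recursive(values + 1, count - 1, acc + values[0]);
}

/* Iterative equivalent: the parameters become loop variables and the
   recursive call becomes a jump back to the top of the loop. */
long sum_iterative(const int *values, size_t count)
{
    long acc = 0;

    assert(values != NULL || count == 0);
    while (count != 0) {
        acc += *values++;
        count--;
    }
    return acc;
}
```

For a large count, the recursive form produces a long run of nested calls followed by an equally long run of returns, which is the deep-recursion pattern the text warns about. The iterative form executes no calls at all.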