POWER7 and POWER7+ Optimization and Tuning Guide
Function inlining
FDPR inlines function bodies into their calling sites when a call site is selected by one of several user-selected filters:
Dominant callers (--selective-inlining (-si), -sidf f, and -siht f): The filter criterion here is that the site is dominant with respect to the other callers of the called function (the callee). It is controlled by two attributes. The -sidf option sets the domination percentage threshold (default 80). The -siht option further restricts the selection to functions hotter than the threshold, which is specified as a percentage relative to the average (default 100).
Hot functions (--inline-hot-functions f (-ihf f)): This filter selects for inlining all call sites where the call is hotter than the heat threshold (in percent, relative to the average).
Small functions (--inline-small-functions f (-isf f)): This filter selects for inlining all functions whose size, in bytes, is smaller than or equal to the parameter.
Selective hot code (--selective-hot-code-inline f (-shci f)): This filter computes how much of the execution count is saved if the function is inlined at a call site and selects those sites where the relative saving is above the given percentage.
De-virtualization
De-virtualization is addressed by the --ptrgl-optimization (-pto) option. The full-blown call-by-pointer mechanism (ptrgl) sets a new TOC anchor, loads the function address, moves it to the count register (CTR), and jumps indirectly through the CTR. The -pto option optimizes this mechanism in cases where a calling site has few hot targets. In C++ terms, it de-virtualizes virtual method calls by calling the actual targets directly. The optimized code compares the address of the function descriptor that is used for the indirect call against the address of a hot candidate, as identified in the profile, and conditionally calls such a target directly. If none of the hot targets match, the code invokes the original indirect call mechanism. The intent is that, most of the time, the conditional direct branches run instead of the ptrgl mechanism. The impact of the optimization on performance depends heavily on the function call profile.
The following thresholds can help tune the optimization and adapt it to different workloads:
Use -ptoht thres to set the frequency threshold for indirect calls that are to be optimized (thres ranges from 0 to 1; the default is 0.8).
Use -ptosl n to set the limit on the number of hot functions to optimize in a given indirect call site (the default for n is 3).
Loop-unrolling
Most programs spend most of their execution time in loops, regardless of the target architecture or application. FDPR has one option to control the unrolling optimization for loops: --loop-unrolling factor (-lu factor).
FDPR optimizes loops by using a technique called loop-unrolling. Unrolling a loop n times reduces the number of back branches by a factor of n, which can improve code prefetch efficiency. The downside of loop-unrolling is code inflation, which increases the code footprint and the number of i-cache misses. Unlike traditional loop-unrolling, FDPR mitigates this problem by unrolling only the hottest paths in the loop. The factor parameter determines the aggressiveness of the optimization. With -O3, the optimization is invoked with -lu 9.
By default, loops are unrolled two times. Use -lu factor to change that default.