Appendix B. Performance tooling and empirical performance analysis
For more information, see emstat Command, available at:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.cmds/doc/aixcmds2/emstat.htm
hpmstat, hpmcount, and tprof -E
The POWER7 processor provides a powerful on-chip PMU that can be used to count the
number of occurrences of performance-critical processor events. A rich set of events is
countable; examples include level 2 and level 3 d-cache misses, and cache reloads from
local, remote, and distant memory. Local memory is memory that is attached to the same
POWER7 processor chip that the software thread is running on. Remote memory is memory
that is attached to a different POWER7 processor chip in the same CEC as the one the
software thread is running on (that is, the same node or building block in the case of a
multi-CEC system, such as a Power 780). Distant memory is memory that is attached to a
POWER7 processor that is in a different CEC from the CEC the software thread is running on.
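The three memory classes defined above can be expressed as a small decision rule. The following Python sketch is illustrative only: the `Location` type and its `cec` and `chip` fields are hypothetical names introduced here, not an AIX or PMU data structure, and the sketch assumes each thread and each memory region can be tagged with the CEC and chip it belongs to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Hypothetical (CEC, chip) coordinates, introduced for illustration."""
    cec: int   # index of the CEC (node or building block)
    chip: int  # index of the processor chip within that CEC


def classify_memory(thread: Location, memory: Location) -> str:
    """Classify memory relative to the chip a software thread runs on."""
    if thread.cec != memory.cec:
        return "distant"  # attached to a processor in a different CEC
    if thread.chip != memory.chip:
        return "remote"   # same CEC, but a different processor chip
    return "local"        # attached to the chip the thread is running on
```

For example, a thread on chip 0 of CEC 0 that reloads a cache line from memory attached to chip 1 of CEC 0 would see that reload counted as coming from remote memory under these definitions.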
Two commands exist to count PMU events: hpmcount and hpmstat. The hpmcount command is
a command-line utility that runs a command and collects statistics from the PMU while the
command runs. The hpmstat command is similar to hpmcount, except that it collects
performance data on a system-wide basis, rather than just for the execution of a command.
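Because hpmcount takes the command to measure as its argument, it can be driven from a script. The following Python sketch assumes an AIX system with hpmcount on the PATH and uses only the plain `hpmcount command` form described above; the helper names `hpmcount_argv` and `run_under_hpmcount` are hypothetical, and no hpmcount options are assumed.

```python
import shutil
import subprocess


def hpmcount_argv(command):
    """Build the argument vector that runs `command` under hpmcount."""
    if not command:
        raise ValueError("command must contain at least the program name")
    return ["hpmcount", *command]


def run_under_hpmcount(command):
    """Run `command` under hpmcount and return its captured text output.

    Assumption: hpmcount is installed and prints its PMU statistics to
    standard output or standard error once the command finishes.
    """
    if shutil.which("hpmcount") is None:
        raise RuntimeError("hpmcount not found on PATH (is this an AIX system?)")
    result = subprocess.run(hpmcount_argv(command), capture_output=True, text=True)
    return result.stdout + result.stderr
```

A call such as `run_under_hpmcount(["./myapp", "input.dat"])` (a placeholder program and argument) would measure only that one command, which is the case hpmcount covers; system-wide collection is what hpmstat is for.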
Further documentation about hpmcount and hpmstat can be found at:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.cmds/doc/aixcmds2/hpmcount.htm
http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.cmds/doc/aixcmds2/hpmstat.htm
In addition to simply counting processor events, the PMU can be configured to sample
instructions based on processor events. With this capability, profiles can be generated that
show which parts of an application are experiencing specified processor events. For example,
you can show which subroutines of an application are generating level 2 or level 3 cache
misses. The tprof profiler includes this functionality through the -E flag, which allows a PMU
event name to be provided to tprof as the sampled event. The list of PMU events can be
generated by running pmlist -c -1. Whenever possible, perform profiling using marked
events, as profiling using marked events is more accurate than using unmarked events. The
marked events begin with the prefix PM_MRK_.
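Selecting marked events from a list of event names is a simple prefix test. The following Python sketch assumes the event names have already been extracted into a list of strings, one name per entry; it does not parse pmlist output, whose exact layout is not described here. The function name `split_events` is hypothetical, and the sample names in the usage note are placeholders for illustration.

```python
MARKED_PREFIX = "PM_MRK_"  # prefix that identifies marked events


def split_events(names):
    """Split PMU event names into (marked, unmarked) lists, keeping order."""
    marked, unmarked = [], []
    for name in names:
        name = name.strip()
        if not name:
            continue  # ignore blank entries
        if name.startswith(MARKED_PREFIX):
            marked.append(name)
        else:
            unmarked.append(name)
    return marked, unmarked
```

Given the placeholder list `["PM_MRK_DATA_FROM_L3", "PM_DATA_FROM_L3", "PM_CYC"]`, the function returns `PM_MRK_DATA_FROM_L3` as the only marked event, which would be the preferred candidate to pass to tprof with the -E flag.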
For more information about using the -E flag of tprof, go to:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.cmds/doc/aixcmds5/tprof.htm
Linux
This section introduces tools and techniques for optimizing software on the combination
of Power Systems and Linux. The intended audience for this section is software
development teams.