Appendix B. Performance tooling and empirical performance analysis
CPU profiling
A CPU profiler is a performance tool that shows in which code CPU resources are being
consumed.
Tprof is a powerful CPU profiler that encompasses a broad spectrum of profiling functionality:
- It can profile any program, library, or kernel extension that is compiled with C, C++, Fortran, or Java compilers. It can also profile machine code that is generated at run time by the JIT compiler.
- It can attribute time to processes, threads, subroutines (user mode, kernel mode, shared library, and Java methods), source statements, and even individual machine instructions.
- In most cases, no recompilation of object files is required.
Usage of tprof typically focuses on generating subroutine-level profiles to pinpoint code hotspots and on examining the impact of an attempted code optimization. A common way to invoke tprof is as follows:
$ tprof -E -skeuz -x sleep 10
The -E flag instructs tprof to use the PMU as the sampling mechanism to generate the profile. Using the PMU as the sampling mechanism provides a more accurate profile than the default time-based sampling mechanism, because PMU sampling can accurately sample regions of kernel code where interrupts are disabled. The s, k, e, and u flags instruct tprof to generate subroutine-level profiles for shared library, kernel, kernel extension, and user-level activity, respectively. The z flag instructs tprof to report CPU time as a number of ticks (that is, samples) instead of as percentages. The -x sleep 10 argument instructs tprof to collect profiling data while the sleep 10 command runs. The result is a profile of the entire system (including all running processes) over a period of 10 seconds.
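Because the flags compose into a single argument string, it can help to see how the example invocation is assembled. The following is a minimal Python sketch under stated assumptions: the letter for each profile category follows the text above, but the function name build_tprof_argv and the category labels are hypothetical, and the function only builds the argument list. It never runs tprof.

```python
"""Assemble the tprof command line described in the text (sketch only; does not run tprof)."""
from typing import List

# Hypothetical category labels mapped to the tprof flag letters described in the text.
PROFILE_FLAGS = {
    "shared_library": "s",
    "kernel": "k",
    "kernel_extension": "e",
    "user": "u",
}


def build_tprof_argv(categories: List[str], seconds: int,
                     use_pmu: bool = True, ticks: bool = True) -> List[str]:
    """Return an argument list such as ['tprof', '-E', '-skeuz', '-x', 'sleep', '10']."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    argv = ["tprof"]
    if use_pmu:
        argv.append("-E")  # PMU-based sampling instead of the default time-based sampling
    # Subroutine-level profile letters, in the order the caller lists the categories.
    letters = "".join(PROFILE_FLAGS[c] for c in categories)
    if ticks:
        letters += "z"  # report ticks (samples) instead of percentages
    if letters:
        argv.append("-" + letters)
    # -x runs the given command; profile data is collected while it runs.
    argv += ["-x", "sleep", str(seconds)]
    return argv


if __name__ == "__main__":
    argv = build_tprof_argv(["shared_library", "kernel", "kernel_extension", "user"], 10)
    print(" ".join(argv))
```

Listing the categories as shared library, kernel, kernel extension, and user reproduces the command shown above. On an AIX system, the returned list could be passed to Python's subprocess.run; keeping it as a list avoids shell quoting issues.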
Excerpts from a tprof report are shown in Example B-1, Example B-2 on page 164, and Example B-3 on page 164.
Example B-1 is a breakdown of samples of the processes that are running on the system. When multiple processes have the same name, they share a single line in this report, and the number of processes with that name is shown in the “Freq” column. “Total” is the total number of samples accumulated by the process, and “Kernel”, “User”, and “Shared” are the numbers of samples accumulated by the processes in the kernel (including kernel extensions), in user space, and in shared libraries, respectively. “Other” is a catchall for samples that do not fall into the other categories. The most common reason for samples to end up in “Other” is CPU time consumed by machine code that the JIT compiler generates at run time. The -j flag of tprof can be used to attribute these samples to Java methods.
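To make the column semantics concrete, the following minimal Python sketch folds individual samples into one summary row per process name, in the spirit of the report described above. The input format, a stream of (process name, PID, category) tuples, is a hypothetical simplification and not how tprof stores its data. Freq here counts only the distinct PIDs that appear in the samples, and Total is the sum of the four categories because “Other” catches everything else.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

# Sample categories, named after the report columns.
CATEGORIES = ("kernel", "user", "shared", "other")


@dataclass
class ProcessSummary:
    name: str
    freq: int = 0  # number of distinct processes (PIDs) seen with this name
    kernel: int = 0
    user: int = 0
    shared: int = 0
    other: int = 0

    @property
    def total(self) -> int:
        # "Other" is the catchall, so the four categories partition the samples.
        return self.kernel + self.user + self.shared + self.other


def summarize(samples: Iterable[Tuple[str, int, str]]) -> List[ProcessSummary]:
    """Fold hypothetical (name, pid, category) samples into one row per process name."""
    rows: Dict[str, ProcessSummary] = {}
    pids: Dict[str, Set[int]] = {}
    for name, pid, category in samples:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        row = rows.setdefault(name, ProcessSummary(name))
        setattr(row, category, getattr(row, category) + 1)
        pids.setdefault(name, set()).add(pid)
    for name, row in rows.items():
        row.freq = len(pids[name])
    # Highest sample count first; ties keep first-seen order because sorted() is stable.
    return sorted(rows.values(), key=lambda r: r.total, reverse=True)


if __name__ == "__main__":
    # Made-up samples for illustration; not taken from Example B-1.
    demo = [
        ("wait", 1, "kernel"), ("wait", 2, "kernel"),
        ("./version1", 10, "user"), ("./version1", 10, "kernel"),
        ("/usr/bin/tprof", 20, "shared"),
    ]
    for r in summarize(demo):
        print(f"{r.name:<18}{r.freq:>4}{r.total:>7}{r.kernel:>8}{r.user:>6}{r.shared:>8}{r.other:>7}")
```

Two wait processes with different PIDs collapse into a single row with a Freq of 2, which mirrors how same-named processes share one line in the report.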
Example B-1 Excerpt from a tprof report - breakdown of samples of processes running on the system
Process            Freq  Total  Kernel  User  Shared  Other
=======            ====  =====  ======  ====  ======  =====
wait                  4   5810    5810     0       0      0
./version1            1   1672      35  1637       0      0
/usr/bin/tprof        2     15      13     0       2      0
/etc/syncd            1      2       2     0       0      0
/usr/bin/sh           2      2       2     0       0      0
swapper               1      1       1     0       0      0
/usr/bin/trcstop      1      1       1     0       0      0
rmcd                  1      1       1     0       0      0
=======            ====  =====  ======  ====  ======  =====
Total                13   7504    5865  1637       2      0
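Because the -z flag reports ticks rather than percentages, a quick post-processing step can recover each process's share of the samples. The following minimal Python sketch assumes the layout of Example B-1 (whitespace-separated columns, process names without embedded spaces, and a final “Total” row). The helper names parse_process_summary and tick_percentages are hypothetical and not part of tprof. The sketch also checks that the rows add up to the Total row and that each row's categories add up to its Total.

```python
from typing import Dict

# Example B-1 excerpt, copied into a string for illustration.
REPORT = """\
Process            Freq  Total  Kernel  User  Shared  Other
=======            ====  =====  ======  ====  ======  =====
wait                  4   5810    5810     0       0      0
./version1            1   1672      35  1637       0      0
/usr/bin/tprof        2     15      13     0       2      0
/etc/syncd            1      2       2     0       0      0
/usr/bin/sh           2      2       2     0       0      0
swapper               1      1       1     0       0      0
/usr/bin/trcstop      1      1       1     0       0      0
rmcd                  1      1       1     0       0      0
=======            ====  =====  ======  ====  ======  =====
Total                13   7504    5865  1637       2      0
"""

COLUMNS = ("freq", "total", "kernel", "user", "shared", "other")


def parse_process_summary(text: str) -> Dict[str, Dict[str, int]]:
    """Parse 'name freq total kernel user shared other' rows (names must not contain spaces)."""
    rows: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, the ===== separator lines, and lines without exactly seven fields.
        if len(fields) != 7 or fields[0] == "Process" or fields[0].startswith("="):
            continue
        rows[fields[0]] = dict(zip(COLUMNS, (int(f) for f in fields[1:])))
    return rows


def tick_percentages(rows: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Check the report's arithmetic, then convert each process's ticks to a percentage."""
    rows = dict(rows)  # avoid mutating the caller's dictionary
    total_row = rows.pop("Total")
    for col in COLUMNS:
        column_sum = sum(r[col] for r in rows.values())
        if column_sum != total_row[col]:
            raise ValueError(f"column {col}: rows sum to {column_sum}, Total row says {total_row[col]}")
    for name, r in rows.items():
        if r["total"] != r["kernel"] + r["user"] + r["shared"] + r["other"]:
            raise ValueError(f"row {name}: categories do not add up to its Total")
    return {name: 100.0 * r["total"] / total_row["total"] for name, r in rows.items()}


if __name__ == "__main__":
    shares = tick_percentages(parse_process_summary(REPORT))
    for name, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<18}{pct:6.2f}%")
```

Applied to the excerpt above, ./version1 accounts for about 22.28% of all samples (1672 of 7504) and wait for about 77.43% (5810 of 7504). Validating the Total row first is a cheap way to catch a mis-parsed or truncated report before drawing conclusions from the percentages.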