POWER7 and POWER7+ Optimization and Tuning Guide
A RISC-based, superscalar, out-of-order execution processor chip such as POWER7
requires more aggressive inlining and loop unrolling to capitalize on the larger register set
and superscalar design point. Also, automatic vectorization is not enabled at this lower
(-O2) optimization level, so the vector registers and vector ISA features go unused.
In GCC, you must specify the -O3 optimization level and inform the compiler that you are
running on a newer processor chip with the Vector ISA extensions. In fact, with GCC, you
need both -O3 and -mcpu=power7 for the compiler to generate code that capitalizes on the
new VSX feature of POWER7.
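As a rough illustration of the kind of loop that benefits from these options, the following sketch shows a simple double-precision loop that GCC may be able to vectorize when built with -O3 and -mcpu=power7. The function name daxpy and the file name in the comment are illustrative and do not come from this guide. Whether the loop is actually vectorized depends on the compiler release. Newer GCC releases can report this with the -fopt-info-vec option.

```c
/* Possible build line on a POWER7 system (file name is hypothetical):
 *   gcc -O3 -mcpu=power7 -c daxpy.c
 */
#include <assert.h>
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
 * The restrict qualifiers promise that x and y do not overlap, which
 * removes one common obstacle to automatic vectorization. */
void daxpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

Double-precision data is used here because vector support for double-precision floating point is one of the additions that VSX brings. The same source compiles unchanged at -O2, but there the loop is expected to stay scalar.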
One source of optimized libraries is the IBM Advance Toolchain for PowerLinux. The Advance
Toolchain provides alternative runtime libraries for all the common POSIX C language, Math,
and pthread libraries that are highly optimized (-O3 and -mcpu=) for multiple Power platforms
(including POWER7). The Advance Toolchain runtime RPM provides multiple CPU-tuned
library instances and automatically selects the library version that is optimized for the
specific POWER5, POWER6, or POWER7 machine.
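The guide does not say how the library version is selected. On Linux, one input that the dynamic loader can use for this kind of choice is the platform string that the kernel passes to each process. The following sketch, which assumes glibc 2.16 or later for getauxval, reads that string so that you can see what your machine reports. The function name platform_name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/auxv.h>   /* getauxval and AT_PLATFORM */

/* Copies the kernel-reported platform string into buf and returns its
 * length. Falls back to "unknown" if the kernel supplies no string.
 * Assumption: this string is only one possible input to library
 * selection; the guide does not describe the actual mechanism. */
size_t platform_name(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    const char *p = (const char *)getauxval(AT_PLATFORM);
    if (p == NULL || p[0] == '\0') {
        p = "unknown";
    }
    snprintf(buf, len, "%s", p);   /* truncates safely if buf is small */
    return strlen(buf);
}
```

The value returned differs by machine and kernel, so treat it as a diagnostic aid and not as a documented interface of the Advance Toolchain.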
If there are specific open source or third-party libraries that are dominating the execution
profile of your application, you must ask the distribution or library product owner to provide a
build using higher optimization. Alternatively, for open source library packages, you can build
your own optimized binary version of those packages.
Deeper empirical analysis
If simple recompilation with higher optimization options or even a more capable compiler does
not provide acceptable performance, then deeper analysis is required. The IBM SDK for
PowerLinux integrates the following analysis tools:
Migration Assistant analysis of non-portable code and data types
Application-specific hotspot profiling
Source Code Advisor (SCA) analysis for non-performing code idioms and induced
execution hazards
The Migration Assistant analyzes the source code directly and does not require a running
binary application for analysis. Profiling and the SCA do require compiled application binary
files and an application-specific benchmark or repeatable workload for analysis.
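A repeatable workload can be as simple as a driver that performs a fixed amount of work from a fixed starting state, so that successive profiling runs exercise the same code paths. The following is a minimal sketch of that idea. The step function is a hypothetical stand-in for the hot path of an application, and none of the names come from the SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the application's hot path: one step of a
 * linear congruential generator. Unsigned arithmetic wraps, so the
 * result is well defined. */
static uint32_t step(uint32_t state)
{
    return state * 1664525u + 1013904223u;
}

/* Runs a fixed number of iterations from a fixed seed. Identical
 * arguments always produce identical work and an identical result,
 * which makes profiles from separate runs comparable. */
uint32_t run_workload(uint32_t seed, unsigned long iterations)
{
    assert(iterations > 0);
    uint32_t state = seed;
    for (unsigned long i = 0; i < iterations; i++) {
        state = step(state);
    }
    return state;
}
```

Returning the final state gives the caller a checksum to compare across builds, and it also keeps the compiler from discarding the loop as dead code.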
The Migration Assistant
For applications that originate on another platform, the Migration Assistant (MA) can identify
non-portable code that must be addressed for a successful port to Power Systems. The MA
uses the Eclipse infrastructure to analyze:
Data endian dependent unions and structures
Casts with potential endian issues
Non-portable data types
Non-portable inline assembler code
Non-portable or arch dependent compiler built-ins
Proprietary or architectural-specific APIs
Use of non-portable data types and inline assembler code can cause poor
performance on the POWER processor and must always be investigated and addressed.
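To make the first two items in the list concrete, the following sketch contrasts an endian-dependent union with a portable alternative. All names are illustrative. The union version returns a different value on a little-endian host (such as x86) than on a big-endian host, while the shift-based version returns the same value everywhere.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host and 0 on a big-endian host. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Endian-dependent: the same four bytes are reinterpreted as a word,
 * so the result depends on the byte order of the host. */
union word_view {
    uint32_t word;
    unsigned char bytes[4];
};

uint32_t load_u32_union(const unsigned char *bytes)
{
    assert(bytes != NULL);
    union word_view v;
    memcpy(v.bytes, bytes, sizeof v.bytes);
    return v.word;
}

/* Portable: decodes a big-endian byte sequence with shifts, so the
 * result does not depend on the byte order of the host. */
uint32_t load_u32_be(const unsigned char *bytes)
{
    assert(bytes != NULL);
    return ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8)  |  (uint32_t)bytes[3];
}
```

Code in the style of load_u32_union is the kind of construct that needs review during a port. Rewriting it in the style of load_u32_be removes the dependency on host byte order.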