Chapter 6. Compilers and optimization tools for C, C++, and Fortran
OpenMP
The OpenMP API is an industry specification for shared-memory parallel programming. The
current GCC compilers, starting with GCC 4.4 (Advance Toolchain 4.0+), provide a full
implementation of the OpenMP 3.0 specification in C, C++, and Fortran. With OpenMP, you can
introduce parallelism into an existing application incrementally by adding pragmas (C and
C++) or directives (Fortran) that specify how the application can be parallelized.
For applications with available parallelism, OpenMP can provide a simple solution for parallel
programming, without requiring low-level thread manipulation. The GNU OpenMP
implementation in the GCC compilers is enabled by the -fopenmp option. GCC also
provides auto-parallelization of loops under the -ftree-parallelize-loops=n option, where n
is the number of threads to use.
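As a minimal sketch of the incremental approach described above, the following C function
parallelizes an existing serial loop by adding a single pragma. The function name
sum_of_squares and its parameters are hypothetical and are not taken from this guide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: returns the sum of the squares of n doubles. */
double sum_of_squares(const double *x, size_t n)
{
    double total = 0.0;
    long i;  /* signed loop index, accepted by all OpenMP versions */

    assert(x != NULL || n == 0);

    /* Each thread accumulates a private partial sum; the reduction clause
     * combines the partial sums when the loop ends. If the file is
     * compiled without -fopenmp, the pragma is ignored and the loop
     * runs serially with the same result. */
    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < (long)n; i++)
        total += x[i] * x[i];

    return total;
}
```

Compiling the file with gcc -fopenmp enables the pragma. The rest of the application needs no
changes, which is what makes it practical to parallelize one loop at a time.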
Whole-program analysis
Traditional compiler optimizations operate independently on each application source file.
Inter-procedural optimizations operate at the whole-program scope, using the interactions
between parts of the application in different source files. They are often effective for
large-scale applications that are composed of hundreds or thousands of source files.
Starting with GCC 4.6 (Advance Toolchain 5.0), GCC provides the Link Time Optimization (LTO)
feature. LTO still compiles each source file separately, but saves additional information (an
abstract program description) in the resulting object file. Then, at application link time, the
linker collects all the objects (with the additional information) and passes them back to the
compiler (GCC) for whole-program inter-procedural analysis (IPA) and final code generation.
The GCC LTO feature is enabled on the compile and link phases by the -flto option. A
simple example follows:
gcc -flto -O3 -c a.c
gcc -flto -O3 -c b.c
gcc -flto -o program a.o b.o
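To illustrate the kind of code that can gain from this build, here is a minimal sketch of what
the two source files might contain. The function names scale and scaled_sum are hypothetical.
The two files are shown in one listing so that the sketch compiles as a single unit.

```c
#include <assert.h>

/* Sketch only: in a real build the two parts below would be separate
 * files (a.c and b.c in the commands above). */

/* ---- a.c: a small helper defined in one source file ---- */
int scale(int value, int factor)
{
    return value * factor;
}

/* ---- b.c: a caller in a different source file ---- */
int scale(int value, int factor);   /* normally declared in a shared header */

int scaled_sum(const int *data, int count)
{
    int sum = 0;
    int i;

    assert(count >= 0);
    for (i = 0; i < count; i++)
        sum += scale(data[i], 3);   /* cross-file call inside a loop */
    return sum;
}
```

When b.c is compiled on its own, the compiler sees only the declaration of scale and must emit
a call. With -flto, the body of scale is visible at link time, so the compiler may inline it into
the loop. Whether it does so depends on its own heuristics.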
Additional options that can be used with -flto include:
-flto-partition={1to1|balanced|none}
-flto-compression-level=n
Detailed descriptions of -flto and its related options are in Options That Control
Optimization, available at:
http://gcc.gnu.org/onlinedocs/gcc-4.6.3/gcc/Optimize-Options.html#Optimize-Options
Profile-based optimization
Profile-based optimization allows the compiler to collect information about the program
behavior and use that information when it makes code generation decisions. It involves
compiling the program twice: first, to generate an instrumented version of the application
that collects program behavior data when it runs, and a second time to generate an optimized
binary by using the information that was collected by running the instrumented binary through
a set of typical inputs for the application.
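As a hypothetical illustration of the program behavior that such a profile captures, the
sketch below contains a branch whose direction depends entirely on the input data. The
function name sum_valid and its parameters are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sums the non-negative samples and counts the
 * negative ones through *invalid. How often the "invalid" branch is
 * taken depends on the data, which is unknown at compile time. */
long sum_valid(const int *samples, size_t n, size_t *invalid)
{
    long sum = 0;
    size_t i;

    assert(invalid != NULL);
    *invalid = 0;

    for (i = 0; i < n; i++) {
        if (samples[i] < 0) {
            (*invalid)++;           /* possibly rare path */
        } else {
            sum += samples[i];      /* possibly common path */
        }
    }
    return sum;
}
```

An instrumented run over typical inputs records how often each side of the branch is taken.
The second compile could then, for instance, arrange the code so that the more frequent side
is the fall-through path.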
Profile-based optimization in the GCC compiler is accessed through the -fprofile-generate
and -fprofile-use options, used together with an optimization level such as -O2. The
instrumented binary is generated by using -fprofile-generate on top of all other options. When
the resulting binary runs, it writes the profile data to files with the .gcda extension, one
for each object file, by default. For example:
gcc -fprofile-generate -O3 -c a.c
gcc -fprofile-generate -O3 -c b.c