Intel® Xeon Phi™ Coprocessor DEVELOPER'S QUICK START GUIDE
switch, which provides an alternative to editing your source files in some situations (applies to the pragma-based offload methods). Finally, -no-offload provides a way to make the compiler ignore the _Cilk_offload and #pragma offload constructs (which by default cause it to build a heterogeneous binary).
Debugging During Runtime
To debug offload activity, the following environment variables are available:

To learn whether offload portions of the program are running on the host or coprocessor:
    For csh: setenv H_TRACE 1
    For sh: export H_TRACE=1

For more complete debug information:
    For csh: setenv H_TRACE 2
    For sh: export H_TRACE=2

To print the compiler's internal offload timers, set OFFLOAD_REPORT. A value of 1 reports only the offload time as measured by the host and the computation time spent on the coprocessor. A value of 2 adds information on how much data was transferred in each direction.
    For csh: setenv OFFLOAD_REPORT <1 or 2>
    For sh: export OFFLOAD_REPORT=<1 or 2>

Details can be found in the compiler documentation in the “Compilation/Setting Environment Variables” section.
Where to Get More Help
You can post questions on the Intel® Xeon Phi™ Coprocessor forum at http://software.intel.com/en-us/forums/intel-many-integrated-core.
Using the Offload Compiler – Explicit Memory Copy Model
In this section, a reduction is used as an example to show a step-by-step approach for developing applications for the Intel® Xeon Phi™ Coprocessor using the offload compiler. The offload compiler is a heterogeneous² compiler, with both host CPU and target compilation environments. Code for both the host CPU and Intel® Xeon Phi™ coprocessor is compiled within the host environment, and offloaded code is automatically run within the target environment. The offload behavior is controlled by compiler directives: pragmas in C/C++, and directives in Fortran.
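
As a rough illustration of the pragma-controlled offload described above, the following is a minimal sketch of a sum reduction in C. The function name `reduction` and its parameters are chosen here for illustration. The sketch assumes the Intel compiler's `#pragma offload target(mic)` directive with an `in(...)` clause, and it guards the pragma with the `__INTEL_OFFLOAD` macro so that a compiler without offload support builds a plain host-only loop.

```c
#include <assert.h>
#include <stddef.h>

/* Sum `size` floats from `data`.
 * With an offload-enabled Intel compiler, the loop is marked for offload:
 * `in(data : length(size))` asks the runtime to copy `size` elements of
 * `data` to the coprocessor before the loop runs (explicit memory copy).
 * The scalar `ret` is not named in a clause; this sketch assumes it is
 * copied in and out by default.
 * Without offload support, the guard removes the pragma and the loop
 * simply runs on the host. */
float reduction(float *data, int size)
{
    assert(data != NULL && size > 0);

    float ret = 0.0f;
#ifdef __INTEL_OFFLOAD
    #pragma offload target(mic) in(data : length(size))
#endif
    for (int i = 0; i < size; ++i) {
        ret += data[i];
    }
    return ret;
}
```

The same source serves both cases: built with offload enabled, the loop is a candidate for the coprocessor; built with -no-offload or with another compiler, it is an ordinary host function.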
Some common libraries, such as the Intel® Math Kernel Library (Intel® MKL), are available in host versions as
well as target versions. When an application executes its first offload and the target is available, the runtime
loads the target executable onto the Intel® Xeon Phi™ Coprocessor. At this time, it also initializes the libraries
linked with the target code. The loaded target executable remains in the target memory until the host
program terminates. Thus, any global state that the library maintains persists across offload instances.
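
The persistence of target-side state can be pictured with a user-defined global standing in for a library's internal state. This is only a sketch under stated assumptions: the names `ON_TARGET`, `offload_calls` and `count_offload` are hypothetical, and the code assumes the Intel compiler's `__attribute__((target(mic)))` declaration attribute, the `nocopy` and `out` clauses of `#pragma offload`, and the `__INTEL_OFFLOAD` macro. Without offload support it falls back to an ordinary host counter.

```c
#include <assert.h>

/* Mark the global for the target build only when offload is enabled
 * (assumption: __INTEL_OFFLOAD is defined by the offload compiler). */
#ifdef __INTEL_OFFLOAD
#define ON_TARGET __attribute__((target(mic)))
#else
#define ON_TARGET
#endif

/* Stand-in for library state that lives in the loaded target executable. */
ON_TARGET int offload_calls = 0;

/* Increment the counter inside an offload region and return its new value.
 * `nocopy(offload_calls)` asks the runtime not to transfer the global, so
 * the value seen on the target is whatever the previous offload left there.
 * `out(seen)` copies only the result back to the host. */
int count_offload(void)
{
    int seen = 0;
#ifdef __INTEL_OFFLOAD
    #pragma offload target(mic) nocopy(offload_calls) out(seen)
#endif
    {
        offload_calls += 1;
        seen = offload_calls;
    }
    assert(seen > 0); /* the region ran at least once */
    return seen;
}
```

If the target executable stays loaded as described, successive calls would be expected to return increasing values, since the counter is never reset from the host.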
² http://dictionary.reference.com/browse/heterogeneous