Intel® Xeon Phi™ Coprocessor DEVELOPER’S QUICK START GUIDE
Note: Although the user may specify the region of code to run on the target, execution on the
Intel® Xeon Phi™ Coprocessor is not guaranteed. Depending on whether the target hardware is present and
whether resources are available on the Intel® Xeon Phi™ Coprocessor when execution reaches the region of
code marked for offload, the code may run on the coprocessor or on the host instead.
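One way to observe the behavior described in the note is sketched below. It assumes the compiler defines the __MIC__ macro only when compiling code for the coprocessor, as the Intel compiler does. The function name ran_on_coprocessor is hypothetical and does not come from this guide. A compiler without offload support will typically ignore the unknown pragma (possibly with a warning), in which case the block simply runs on the host.

```c
/* Hypothetical helper (not part of the guide): reports where the
   offloaded block actually executed. */
int ran_on_coprocessor(void)
{
    int on_target = 0;

    /* Ask for the block to run on the coprocessor. If no coprocessor
       is available, the host version of the block may run instead. */
    #pragma offload target(mic)
    {
#ifdef __MIC__
        /* Assumption: __MIC__ is defined only in the coprocessor build. */
        on_target = 1;
#else
        on_target = 0;
#endif
    }

    /* on_target is a scalar declared outside the construct, so it is
       copied back to the host when the offload completes. */
    return on_target;
}
```

A caller could log the return value at startup to confirm whether offloaded regions are reaching the coprocessor on a given system.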
The following code samples show several versions of porting reduction code to the Intel® Xeon Phi™
Coprocessor using the offload pragma directive.
Reduction
The reduction operation computes the expression:
ans = a[0] + a[1] + … + a[n-1]
Host Version:
The following sample shows C code that implements the host version of the reduction.
float reduction(float *data, int size)
{
    float ret = 0.f;
    for (int i=0; i<size; ++i)
    {
        ret += data[i];
    }
    return ret;
}
Code Example 1: Implementing Reduction Code in C/C++
Creating the Offload Version
Serial Reduction with Offload
The programmer uses #pragma offload target(mic), as shown in the example below, to mark statements
(offload constructs) that should execute on the Intel® Xeon Phi™ Coprocessor. The offloaded region consists
of the offload construct plus any additional code that runs on the target as a result of function calls.
Execution on the host resumes once the statements on the target have finished and the results are available
on the host; that is, the offload blocks, although a variant of this pragma allows asynchronous execution.
The in, out, and inout clauses specify the direction of data transfer between the host and the target.
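To illustrate the direction clauses, here is a minimal sketch that uses in and inout on a simple element-wise update rather than the reduction. The function name scale_add_offload and its parameters are hypothetical and chosen only for this illustration. The length(n) modifier gives the number of elements to transfer for each pointer. On a compiler without offload support, the pragma is typically ignored and the loop runs on the host.

```c
/* Hypothetical helper (not part of the guide):
   computes y[i] = a * x[i] + y[i] for i in [0, n). */
void scale_add_offload(float a, float *x, float *y, int n)
{
    /* x is only read on the target, so it is copied host -> target (in).
       y is read and then overwritten, so it is copied in both
       directions (inout). The scalars a and n are declared outside the
       construct and are copied by default. */
    #pragma offload target(mic) in(x : length(n)) inout(y : length(n))
    for (int i = 0; i < n; ++i)
    {
        y[i] = a * x[i] + y[i];
    }
}
```

If the results were written to a separate array that the target never reads, out would be the natural clause for that array, since it avoids the host-to-target copy.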
By default, variables that are used within an offload construct but declared outside its scope (including
at file scope) are copied to the target before execution on the target begins and copied back to the host
on completion.
For example, in the code below, the variable ret is automatically copied to the target before execution on the
target and copied back to the host on completion. The offloaded code below is executed by a single thread on
a single Intel® MIC Architecture core.