Intel® Xeon Phi™ Coprocessor DEVELOPER'S QUICK START GUIDE
Parallel Programming on Intel® Xeon Phi™ Coprocessor: Intel® Threading Building Blocks (Intel® TBB)
As with Intel Cilk Plus, the Intel TBB header files are not available in the target environment by default. They are
made available to the Intel® MIC Architecture target environment with a similar technique, by wrapping the includes in offload_attribute push and pop pragmas:
#pragma offload_attribute (push,target(mic))
#include "tbb/task_scheduler_init.h"
#include "tbb/blocked_range.h"
#include "tbb/parallel_reduce.h"
#include "tbb/task.h"
#pragma offload_attribute (pop)
using namespace tbb;
Code Example 10: Wrapping the Intel TBB Header Files in C/C++
Functions called from within the offloaded construct, and global data required on the Intel® Xeon Phi™
Coprocessor, must be marked with the special attribute __attribute__((target(mic))).
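The marking described above might look like the following minimal sketch. It assumes the Intel compiler's offload extensions; the names MIC_TARGET, scale, scaled_square, and run_offload are hypothetical and not part of this guide. The macro and the pragma are guarded by __INTEL_OFFLOAD (defined by the Intel compiler when offload is enabled), so on any other compiler the same code simply runs on the host.

```cpp
// Hypothetical helper macro: expands to the offload attribute only when the
// Intel compiler has offload enabled; otherwise it expands to nothing.
#ifdef __INTEL_OFFLOAD
#define MIC_TARGET __attribute__((target(mic)))
#else
#define MIC_TARGET
#endif

// Global data required on the coprocessor carries the attribute.
MIC_TARGET float scale = 2.0f;

// A function called from inside the offloaded construct carries it too.
MIC_TARGET float scaled_square(float x) {
    return scale * x * x;
}

// Hypothetical driver: offloads one call when offload is available,
// and falls back to running the same block on the host otherwise.
float run_offload(float x) {
    float result = 0.0f;
#ifdef __INTEL_OFFLOAD
#pragma offload target(mic) in(x) out(result)
#endif
    {
        result = scaled_square(x);
    }
    return result;
}
```

For example, run_offload(3.0f) evaluates 2.0f * 3.0f * 3.0f and returns 18.0f whether or not the block was actually offloaded.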
As an example, parallel_reduce recursively splits an array into subranges for each thread to work on. It
uses the body's splitting constructor to make one or more copies of the body for each thread. For each split, the
join method is invoked to accumulate the results.
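To make the split and join protocol concrete, here is a sketch in standard C++ that imitates the pattern serially. It is not Intel TBB and uses no threads; SplitTag, SumBody, reduce_sketch, sum_array, and the grain parameter are hypothetical stand-ins chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tag type standing in for tbb::split.
struct SplitTag {};

// Body with the three pieces the text describes: operator() to process a
// subrange, a splitting constructor, and join to merge partial results.
struct SumBody {
    const float* data;
    float sum;

    explicit SumBody(const float* d) : data(d), sum(0.0f) {}

    // Splitting constructor: shares the data pointer, starts a fresh sum.
    SumBody(SumBody& other, SplitTag) : data(other.data), sum(0.0f) {}

    void operator()(std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i != end; ++i) {
            sum += data[i];
        }
    }

    void join(const SumBody& rhs) { sum += rhs.sum; }
};

// Serial imitation of the recursive split: halve the range until it holds
// at most `grain` elements, process each half with its own body, then join.
void reduce_sketch(SumBody& body, std::size_t begin, std::size_t end,
                   std::size_t grain) {
    if (end - begin <= grain) {
        body(begin, end);
        return;
    }
    std::size_t mid = begin + (end - begin) / 2;
    SumBody right(body, SplitTag());   // one copy per split
    reduce_sketch(body, begin, mid, grain);
    reduce_sketch(right, mid, end, grain);
    body.join(right);                  // accumulate the right half's result
}

float sum_array(const std::vector<float>& v, std::size_t grain) {
    if (grain == 0) grain = 1;         // avoid splitting a one-element range forever
    SumBody body(v.data());
    reduce_sketch(body, 0, v.size(), grain);
    return body.sum;
}
```

In a real parallel_reduce the two halves may run on different threads, which is why each split gets its own body copy and the partial sums are combined only in join.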
1. Guard the class with the __MIC__ macro and prefix the class name with __attribute__((target(mic))) if
you want the class to be generated for the coprocessor.
#ifdef __MIC__
class __attribute__((target(mic))) ReduceTBB
{
private:
    float *my_data;
public:
    float sum;
    void operator()( const blocked_range<size_t>& r )
    {
        float *data = my_data;
        for( size_t i=r.begin(); i!=r.end(); ++i)
        {
            sum += data[i];
        }
    }
    ReduceTBB( ReduceTBB& x, split) : my_data(x.my_data), sum(0) {}
    void join( const ReduceTBB& y) { sum += y.sum; }
    ReduceTBB( float data[] ) : my_data(data), sum(0) {}
};
#endif