Intel® Xeon Phi™ Coprocessor Developer's Quick Start Guide
scp /opt/intel/composerxe/lib/mic/libiomp5.so mic0:/tmp/libiomp5.so
5. Connect to the coprocessor with ssh and set LD_LIBRARY_PATH to the directory that holds the copied files, so
that the application can find any shared libraries it uses (in this case the OpenMP* runtime library):
ssh mic0
export LD_LIBRARY_PATH=/tmp
6. This application may generate a segmentation fault if the stack size is not set correctly. To modify the
stack size, use:
ulimit -s unlimited
7. Go to /tmp and run a.out:
cd /tmp
./a.out
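The stack size limit from step 6 can also be checked from inside a program before it allocates large stack arrays. The following is a minimal sketch, assuming a POSIX environment where getrlimit() is available; the helper name stack_is_unlimited is hypothetical and is not part of this guide's sample code.

```c
#include <sys/resource.h>

/* Hypothetical helper: reports the state of the soft stack size limit.
 * Returns 1 if the limit is unlimited (as after "ulimit -s unlimited"),
 * 0 if it is finite, and -1 if the limit could not be queried. */
int stack_is_unlimited(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        return -1;                      /* query failed */
    }
    return (rl.rlim_cur == RLIM_INFINITY) ? 1 : 0;
}
```

A program could call stack_is_unlimited() at startup and print a warning when it returns 0, rather than failing later with a segmentation fault.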
Parallel Programming Options on the Intel® Xeon Phi™ Coprocessor
Most of the parallel programming options available on the host systems are available for the Intel® Xeon Phi™
Coprocessor. These include the following:
1. Intel® Threading Building Blocks (Intel® TBB)
2. OpenMP*
3. Intel® Cilk Plus
4. pthreads*
The following sections will discuss the use of these parallel programming models in code using the offload
extensions. Code that runs natively on the Intel® Xeon Phi™ Coprocessor can use these parallel programming
models just as they would on the host, with no unusual complications beyond the larger number of threads.
Parallel Programming on the Intel® Xeon Phi™ Coprocessor: OpenMP*
There is no correspondence between OpenMP* threads on the host CPU and on the Intel® Xeon Phi™
Coprocessor. Because an OpenMP parallel region within an offload pragma is offloaded as a unit, the offload
compiler creates a team of threads based on the resources available on the Intel® Xeon Phi™ Coprocessor. Since
the entire OpenMP construct is executed on the coprocessor, the usual OpenMP semantics of shared and private
data apply within the construct.
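As an illustration of an OpenMP parallel region offloaded as a unit, the sketch below wraps a reduction loop in an offload pragma. The function name sum_of_squares and the explicit in/inout clauses are illustrative choices, not code from this guide. A compiler that does not recognize the offload pragma ignores it, in which case the loop runs on the host with the same result.

```c
/* Illustrative sketch: computes 0^2 + 1^2 + ... + (n-1)^2 inside an
 * offloaded OpenMP parallel region. */
double sum_of_squares(int n)
{
    double sum = 0.0;
    int i;

    /* The whole block, including the OpenMP region, is offloaded as a unit.
     * n is copied to the coprocessor; sum is copied in and back out. */
    #pragma offload target(mic) in(n) inout(sum)
    {
        /* Usual OpenMP data semantics apply inside the construct:
         * the loop index i is private to each thread, and each thread
         * keeps its own partial sum that is combined at the end. */
        #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < n; i++) {
            double x = (double)i;   /* private: declared inside the loop */
            sum += x * x;
        }
    }
    return sum;
}
```

The size of the thread team is not specified in the code; it is chosen at run time from the resources available where the region executes.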
Multiple host CPU threads can offload to the Intel® Xeon Phi™ Coprocessor at any time. If a CPU thread
attempts to offload to the coprocessor and resources are not available on the coprocessor, the code meant to be
offloaded may be executed on the host. When a thread on the coprocessor reaches the “omp parallel” directive,
it creates a team of threads based on the resources available on the coprocessor. The theoretical maximum
number of hardware threads that can be created is four times the number of cores in your Intel® Xeon Phi™
Coprocessor. The practical limit for offloaded code is four less than this, because the first core (and its four
hardware threads) is reserved for the uOS and its services.
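The thread limits described above reduce to simple arithmetic. The sketch below uses hypothetical helper names, and the core count in the comment (60) is only an example value; substitute the number of cores in your own coprocessor.

```c
/* Four hardware threads per core, as stated in the text. */
#define HW_THREADS_PER_CORE 4

/* Theoretical maximum number of hardware threads: 4 x cores. */
int max_hw_threads(int cores)
{
    return HW_THREADS_PER_CORE * cores;
}

/* Practical limit for offloaded code: one core (four hardware threads)
 * is reserved for the uOS and its services. */
int max_offload_threads(int cores)
{
    return max_hw_threads(cores) - HW_THREADS_PER_CORE;
}

/* Example with an assumed 60-core coprocessor:
 *   max_hw_threads(60)      is 240
 *   max_offload_threads(60) is 236 */
```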