IBM Power Systems S822LC for High Performance Computing
Applications can implement customized functions in FPGAs and enqueue work requests to the
FPGA directly in shared memory queues, using the same effective addresses (pointers) that
they use for any of their threads running on a host processor. From a practical perspective, CAPI
allows a specialized hardware accelerator to be seen as an additional processor in the
system, with access to the main system memory, and coherent communication with other
processors in the system.
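The queueing model described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not CAPI code: a bytearray stands in for system memory, byte offsets stand in for effective addresses, and a Python thread stands in for the accelerator. The names WorkElement, accelerator, and run_demo are hypothetical and exist only for this illustration.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class WorkElement:
    """Hypothetical work request: it carries addresses, not copies of the data."""
    src: int     # address (offset into the shared memory) of the input bytes
    length: int  # number of bytes to process
    dst: int     # address where the result is written


def accelerator(memory: bytearray, work_queue: queue.Queue) -> None:
    """Stand-in for the accelerator: consumes work elements and reads and
    writes the same memory that the application uses."""
    while True:
        item = work_queue.get()
        if item is None:  # sentinel: no more work
            break
        data = memory[item.src:item.src + item.length]
        # The "customized function" in this toy model is upper-casing.
        memory[item.dst:item.dst + item.length] = data.upper()


def run_demo() -> bytes:
    memory = bytearray(64)        # assumption: models shared system memory
    work_queue = queue.Queue()    # assumption: models the shared work queue
    memory[0:5] = b"power"        # the application writes its input in place

    worker = threading.Thread(target=accelerator, args=(memory, work_queue))
    worker.start()
    # Enqueue a request that refers to the data by address only.
    work_queue.put(WorkElement(src=0, length=5, dst=32))
    work_queue.put(None)
    worker.join()
    return bytes(memory[32:37])   # the application reads the result in place


result = run_demo()
print(result)
```

The point of the sketch is that the work element carries only addresses. The application and the simulated accelerator both operate on the same memory, so no buffer is copied into a separate device-owned region.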
The benefits of using CAPI include the ability to access shared memory blocks directly from
the accelerator, perform memory transfers directly between the accelerator and processor
cache, and reduce the code path length between the adapter and the processors. This is
possible because the adapter does not operate as a traditional I/O device, and there is no
device driver layer to perform processing. It also presents a simpler programming model.
Figure 1-10 shows a high-level view of how an accelerator communicates with the POWER8
processor through CAPI. The POWER8 processor provides a Coherent Attached Processor
Proxy (CAPP), which is responsible for extending the coherence in the processor
communications to an external device. The coherency protocol is tunneled over standard
PCIe Gen3, effectively making the accelerator part of the coherency domain.
Figure 1-10 CAPI accelerator that is attached to the POWER8 processor
The accelerator adapter implements the Power Service Layer (PSL), which provides address
translation and a system memory cache for the accelerator functions. The custom processors
on the system board, consisting of an FPGA or an ASIC, use this layer to access shared
memory regions and cache areas as though they were a processor in the system. This ability
improves data access performance for the device and simplifies the programming
effort that is needed to use it. The hardware accelerator is treated as a processor rather
than as an I/O device, which eliminates the need for a device driver to perform
communication and for Direct Memory Access that requires system calls to the
operating system (OS) kernel. With these layers removed, a data transfer operation requires
far fewer processor clock cycles, which improves I/O performance.
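The address translation role of the PSL can be pictured with a toy model. This is a sketch under stated assumptions and does not describe the actual PSL design: the class ToyTranslator, its dictionary page table, its unbounded translation cache, and the 64 KB page size are all hypothetical choices made for illustration.

```python
PAGE_SIZE = 64 * 1024  # assumed page size for this illustration


class ToyTranslator:
    """Hypothetical model: maps effective addresses to real addresses and
    remembers recent page translations."""

    def __init__(self, page_table):
        self.page_table = page_table  # effective page number -> real page number
        self.cache = {}               # translations that were already looked up
        self.hits = 0
        self.misses = 0

    def translate(self, effective_address):
        page, offset = divmod(effective_address, PAGE_SIZE)
        if page in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            # A missing entry raises KeyError, which stands in for a fault.
            self.cache[page] = self.page_table[page]
        return self.cache[page] * PAGE_SIZE + offset


translator = ToyTranslator({0x10: 0x7, 0x11: 0x2})
first = translator.translate(0x10 * PAGE_SIZE + 0x123)   # miss: page table lookup
second = translator.translate(0x10 * PAGE_SIZE + 0x456)  # hit: same page, cached
print(hex(first), hex(second), translator.hits, translator.misses)
```

In this model the accelerator function works only with the pointer values that the application handed over, and the translation step resolves them to real addresses on its behalf.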
The implementation of CAPI on the POWER8 processor allows hardware companies to
develop solutions for specific application demands. These solutions use the performance of
the POWER8 processor for general applications and a hardware accelerator for the custom
acceleration of specific functions, with a simplified programming model and efficient
communication with the processor and memory resources.
[Figure 1-10 labels: POWER8 with Coherence Bus and CAPP; PCIe Gen3 as the transport for encapsulated messages; FPGA or ASIC with PSL and Custom Hardware Application]