Chapter 2. Architecture and technical overview
Draft Document for Review October 14, 2014 10:19 am
processor. From a practical perspective, CAPI allows a specialized hardware accelerator to
be seen as an additional processor in the system, with access to the main system memory
and coherent communication with the other processors in the system.
The benefits of using CAPI include the ability to access shared memory blocks directly from
the accelerator, the ability to perform memory transfers directly between the accelerator and
the processor cache, and a shorter code path between the adapter and the processors. The
code path is shorter because the adapter does not operate as a traditional I/O device, so no
device driver layer is needed to perform processing. CAPI also presents a simpler
programming model.
Figure 2-11 shows a high-level view of how an accelerator communicates with the POWER8
processor through CAPI. The POWER8 processor provides a Coherent Attached Processor
Proxy (CAPP), which is responsible for extending the coherence in the processor
communications to an external device. The coherency protocol is tunneled over standard
PCIe Gen3, effectively making the accelerator part of the coherency domain.
The accelerator adapter implements the Power Service Layer (PSL), which provides address
translation and system memory cache for the accelerator functions. The custom processors
on the board, consisting of an FPGA or an Application Specific Integrated Circuit (ASIC), use
this layer to access shared memory regions and cache areas as though they were a
processor in the system. This ability greatly enhances the performance of data access for
the device and simplifies the programming effort to use the device. Instead of being treated
as an I/O device, the hardware accelerator is treated as a processor. This eliminates the
requirement for a device driver to perform communication, and the need for Direct Memory
Access (DMA) transfers that require system calls to the operating system kernel. With these
layers removed, a data transfer operation requires fewer clock cycles in the processor,
which greatly improves I/O performance.
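The difference between the two models can be illustrated with a small conceptual sketch.
The following Python code is not CAPI code and does not use any CAPI or PSL interface. It is
only a toy model, and every name in it (driver_style_transfer, coherent_style_transfer) is
hypothetical. It assumes that the "accelerator" is a thread and that "system memory" is a
byte array. The first function copies data to a separate device buffer and back, as a
traditional I/O path might. The second function lets the accelerator work in place on the
memory region that the application already owns.

```python
import threading


def driver_style_transfer(app_data):
    """Toy model of a traditional I/O path: copy in, process, copy out."""
    copies = 0
    device_buf = bytearray(app_data)   # host-to-device copy
    copies += 1
    for i in range(len(device_buf)):   # the "device" inverts every byte
        device_buf[i] ^= 0xFF
    result = bytes(device_buf)         # device-to-host copy
    copies += 1
    return result, copies


def coherent_style_transfer(shared, offset, length):
    """Toy model of a shared-memory path: the accelerator works in place."""
    # A memoryview slice refers to the same bytes; nothing is copied.
    region = memoryview(shared)[offset:offset + length]

    def accelerator():
        for i in range(len(region)):   # same work, done on shared memory
            region[i] ^= 0xFF

    worker = threading.Thread(target=accelerator)
    worker.start()
    worker.join()
    return 0                           # no buffer copies were made


if __name__ == "__main__":
    data = bytes([0, 1, 2, 255])
    result, copies = driver_style_transfer(data)
    print(list(result), "copies:", copies)

    shared = bytearray(data)
    copies = coherent_style_transfer(shared, 0, len(shared))
    print(list(shared), "copies:", copies)
```

Both functions produce the same bytes. The sketch only shows where buffer copies occur in
each model. It says nothing about actual clock cycles or measured performance on POWER8.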
Figure 2-11 CAPI accelerator that is attached to the POWER8 processor
The implementation of CAPI on the POWER8 processor allows hardware companies to
develop solutions for specific application demands. These solutions use the POWER8
processor for general applications and a hardware accelerator for custom acceleration of
specific functions, with a simplified programming model and efficient communication with
the processor and memory resources.
[Figure 2-11 labels: Custom Hardware Application; PSL; FPGA or ASIC; PCIe Gen3 (transport for encapsulated messages); POWER8; CAPP; Coherence Bus]