Chapter 3: DPU Configuration
DPU IP Product Guide
20
PG338 (v1.2) March 26, 2019
DPU Core Number
Up to three DPU cores can be included in one IP. Using multiple DPU cores achieves higher performance, but it also consumes more programmable logic resources.
If you need to integrate more than three cores, send the request to a Xilinx® sales representative.
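As a minimal sketch, a configuration script might check the core count before generating the IP. The names `check_core_count`, `total_peak_ops_per_clk`, and `MAX_CORES_PER_IP` are hypothetical and not part of any Xilinx tool. The aggregate figure assumes ideal scaling: it simply adds up each core's peak operations per clock, so it is an upper bound, not a measured throughput.

```python
# Hypothetical helpers (not a Xilinx API): validate the DPU core count
# against the three-core limit stated in this guide.
MAX_CORES_PER_IP = 3


def check_core_count(num_cores: int) -> int:
    """Return num_cores if it fits in one DPU IP, otherwise raise ValueError."""
    if num_cores < 1:
        raise ValueError("at least one DPU core is required")
    if num_cores > MAX_CORES_PER_IP:
        raise ValueError(
            f"{num_cores} cores requested; more than {MAX_CORES_PER_IP} cores "
            "requires a request to a Xilinx sales representative"
        )
    return num_cores


def total_peak_ops_per_clk(num_cores: int, peak_ops_per_core: int) -> int:
    """Upper bound on aggregate peak ops per clock, assuming ideal scaling."""
    return check_core_count(num_cores) * peak_ops_per_core
```

For example, `total_peak_ops_per_clk(2, 4096)` gives 8192 peak operations per clock for two B4096 cores, while a request for four cores raises an error.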
DPU Convolution Architecture
The DPU IP can be configured with different convolution architectures, which determine the parallelism of the convolution unit. The available architectures for the DPU IP are B512, B800, B1024, B1152, B1600, B2304, B3136, and B4096.
The DPU convolution architecture has three dimensions of parallelism: pixel parallelism, input channel parallelism, and output channel parallelism. The input channel parallelism is always equal to the output channel parallelism. Each convolution architecture requires a different amount of programmable logic resources; larger architectures achieve higher performance but consume more resources. The parallelism for each convolution architecture is listed in Table 8.
Table 8: Parallelism for Different Convolution Architectures

Convolution   Pixel               Input Channel        Output Channel       Peak Ops
Architecture  Parallelism (PP)    Parallelism (ICP)    Parallelism (OCP)    (operations per clock)
B512          4                   8                    8                    512
B800          4                   10                   10                   800
B1024         8                   8                    8                    1024
B1152         4                   12                   12                   1152
B1600         8                   10                   10                   1600
B2304         8                   12                   12                   2304
B3136         8                   14                   14                   3136
B4096         8                   16                   16                   4096
Notes:
1. In each clock cycle, the convolution array performs a multiplication and an accumulation, which count as two operations. Therefore, the peak number of operations per clock cycle equals PP × ICP × OCP × 2.
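The peak-ops formula from the note can be checked against Table 8 with a short Python sketch. The dictionary copies the PP, ICP, and OCP values from the table; the helper names and the 300 MHz clock used in the usage example are illustrative assumptions, not values from this guide.

```python
# Parallelism per convolution architecture, copied from Table 8:
# architecture name -> (PP, ICP, OCP)
PARALLELISM = {
    "B512": (4, 8, 8),
    "B800": (4, 10, 10),
    "B1024": (8, 8, 8),
    "B1152": (4, 12, 12),
    "B1600": (8, 10, 10),
    "B2304": (8, 12, 12),
    "B3136": (8, 14, 14),
    "B4096": (8, 16, 16),
}


def peak_ops_per_clk(pp: int, icp: int, ocp: int) -> int:
    """Each multiply-accumulate counts as two operations (multiply + add)."""
    return pp * icp * ocp * 2


def peak_ops_per_second(arch: str, clock_hz: int) -> int:
    """Theoretical peak for one core at a given (hypothetical) clock rate."""
    pp, icp, ocp = PARALLELISM[arch]
    return peak_ops_per_clk(pp, icp, ocp) * clock_hz


if __name__ == "__main__":
    for name, (pp, icp, ocp) in PARALLELISM.items():
        ops = peak_ops_per_clk(pp, icp, ocp)
        # The number in each architecture name equals its peak ops per clock.
        assert ops == int(name[1:]), name
        print(f"{name}: {ops} ops/clk")
```

For example, `peak_ops_per_second("B4096", 300_000_000)` returns 1,228,800,000,000, about 1.23 TOPS of theoretical peak for a single core at that assumed clock frequency.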
DSP Cascade
You can select the maximum length of the DSP48E slice cascade chain. Typically, a longer cascade uses fewer logic resources but might lead to worse timing. A shorter cascade might use more fabric resources, which is not economical for small devices. Xilinx recommends selecting the mid-value, 4, in the first iteration and adjusting the value if timing is not met.
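The recommended tuning flow (start at 4, then adjust if timing fails) can be sketched as a simple downward search. `meets_timing` is a hypothetical callback standing in for an implementation run plus timing analysis at a given cascade length. Because longer chains tend to worsen timing, the sketch shortens the chain only after a failure. The lower bound of 1 is an assumption; use the range actually offered by the IP configuration.

```python
from typing import Callable, Optional

RECOMMENDED_START = 4  # mid-value recommended for the first iteration


def pick_cascade_length(
    meets_timing: Callable[[int], bool],
    start: int = RECOMMENDED_START,
    minimum: int = 1,  # assumed lower bound, not taken from this guide
) -> Optional[int]:
    """Return the longest cascade length <= start that meets timing, or None.

    meets_timing is a hypothetical stand-in for implementing the design
    with the given DSP cascade length and checking the timing report.
    """
    for length in range(start, minimum - 1, -1):
        if meets_timing(length):
            return length
    return None
```

For example, if timing closes only for chains of length 3 or less, `pick_cascade_length(lambda n: n <= 3)` returns 3. Starting from the longest acceptable value keeps fabric usage as low as possible while still meeting timing.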