DPU IP Product Guide
PG338 (v1.2) March 26, 2019
Chapter 2: Product Specification
Hardware Architecture
The detailed hardware architecture of the DPU is shown in Figure 6. After start-up, the DPU fetches instructions from off-chip memory and parses them to drive the computing engine. The instructions are generated by the DNNDK compiler, which performs substantial optimizations.
To improve efficiency, the abundant on-chip memory in Xilinx® devices is used to buffer input, intermediate, and output data, and the data is reused as much as possible to reduce memory bandwidth. A deeply pipelined design is used for the computing engine. As in other accelerators, the computational array of processing elements (PEs) takes full advantage of the fine-grained building blocks in Xilinx devices, such as multipliers, adders, and accumulators.
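The fetch, decode, and dispatch flow described above can be modeled conceptually in software. The sketch below is illustrative only: the instruction encoding, opcode names, and engine mapping are assumptions for demonstration and do not reflect the actual DPU instruction set.

```python
# Conceptual model of the DPU control flow: the instruction scheduler
# fetches instruction words from off-chip memory, decodes them, and
# dispatches each one to a compute engine. All names here (the opcodes,
# the engine map) are hypothetical, not the real DPU ISA.
from collections import namedtuple

Instruction = namedtuple("Instruction", ["opcode", "operands"])

# Dispatcher routes each decoded opcode to the engine that executes it
# (hypothetical mapping).
ENGINE_FOR_OPCODE = {
    "CONV": "Conv Engine",   # convolution work goes to the Conv Engine
    "POOL": "Misc Engine",   # pooling and other ops go to the Misc Engine
    "LOAD": "Data Mover",    # moves data from off-chip memory into BRAM
    "SAVE": "Data Mover",    # writes results back to off-chip memory
}

def fetch(off_chip_memory):
    """Fetcher: stream raw instruction words from off-chip memory."""
    yield from off_chip_memory

def decode(word):
    """Decoder: split a raw word into opcode and operands."""
    opcode, *operands = word.split()
    return Instruction(opcode, operands)

def dispatch(program):
    """Dispatcher: assign each decoded instruction to its target engine."""
    schedule = []
    for word in fetch(program):
        inst = decode(word)
        schedule.append((ENGINE_FOR_OPCODE[inst.opcode], inst))
    return schedule

# Example "program" as a compiler might emit it (hypothetical encoding).
program = ["LOAD fm0", "CONV fm0 w0 fm1", "POOL fm1 fm2", "SAVE fm2"]
for engine, inst in dispatch(program):
    print(engine, inst.opcode, *inst.operands)
```

In the real hardware these three stages run as a deep pipeline rather than a sequential loop, which is how the scheduler keeps the compute engines busy.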
[Figure: the Processing System (PS) contains the CPU running DNNDK and the memory controller, connected over a bus to the DPU in the Programmable Logic (PL). The DPU comprises an Instruction Scheduler (Fetcher, Decoder, Dispatcher), an On-Chip Buffer Controller (Data Mover, BRAM Reader/Writer, on-chip BRAM), and a Computing Engine (Conv Engine and Misc Engine built from PE arrays), with instructions and data held in off-chip memory.]
Figure 6: DPU Hardware Architecture