
 

DPU IP Product Guide, PG338 (v1.2), March 26, 2019, www.xilinx.com

 

Chapter 2: Product Specification 

Hardware Architecture 

The detailed hardware architecture of the DPU is shown in Figure 6. After start-up, the DPU fetches instructions from off-chip memory and parses them to operate the computing engine. The instructions are generated by the DNNDK compiler, which performs substantial optimizations.

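The fetch, parse, and dispatch sequence described above can be sketched in software. The following is a minimal illustration only, not the DPU's actual behavior: the 32-bit instruction layout, the opcode names (LOAD, CONV, MISC, SAVE, END), and the mapping of opcodes to units are all assumptions made for this example, since the guide does not describe the instruction encoding. Off-chip memory is modeled as a Python list.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical 32-bit instruction layout (NOT the real DPU encoding):
# bits 31..28 hold the opcode, bits 27..0 hold an operand.
OPCODES = {0x1: "LOAD", 0x2: "CONV", 0x3: "MISC", 0x4: "SAVE", 0xF: "END"}

# Hypothetical mapping from opcode to the unit that executes it.
TARGET_UNIT = {
    "LOAD": "data_mover",
    "SAVE": "data_mover",
    "CONV": "conv_engine",
    "MISC": "misc_engine",
}


@dataclass
class Instruction:
    op: str
    operand: int


def fetch(memory, pc):
    """Fetcher: read one instruction word from simulated off-chip memory."""
    return memory[pc]


def decode(word):
    """Decoder: split a raw word into its opcode and operand fields."""
    return Instruction(OPCODES[word >> 28], word & 0x0FFF_FFFF)


def run_scheduler(memory, start_addr=0):
    """Fetch, decode, and dispatch until an END instruction is reached."""
    queues = {
        "data_mover": deque(),
        "conv_engine": deque(),
        "misc_engine": deque(),
    }
    pc = start_addr
    while True:
        instr = decode(fetch(memory, pc))
        pc += 1
        if instr.op == "END":
            break
        # Dispatcher: route the instruction to the unit that executes it.
        queues[TARGET_UNIT[instr.op]].append(instr)
    return queues


# A five-word toy program standing in for compiler output.
program = [0x1000_0010, 0x2000_0003, 0x3000_0001, 0x4000_0020, 0xF000_0000]
queues = run_scheduler(program)
```

After the call, `queues["data_mover"]` holds the LOAD and SAVE instructions, while the convolution and miscellaneous queues hold one instruction each. Separate queues per unit are a design choice of this sketch; they simply make the dispatch step visible.
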
To improve efficiency, abundant on-chip memory in Xilinx® devices is used to buffer input, intermediate, and output data. The data is reused as much as possible to reduce the required memory bandwidth. The computing engine uses a deeply pipelined design. Like other accelerators, the computational arrays (PEs) take full advantage of the fine-grained building blocks in Xilinx devices, which include multipliers, adders, and accumulators.
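
The effect of on-chip buffering on memory traffic can be illustrated with a small model. The sketch below is an assumption-laden toy, not a description of the DPU data path: `OffChipMemory`, `pe_mac`, and both convolution functions are hypothetical names introduced here, and a 1-D convolution stands in for the real workload. Each PE step is modeled as a multiply followed by an add into an accumulator, matching the building blocks named above.

```python
class OffChipMemory:
    """Simulated off-chip memory that counts how many words are read."""

    def __init__(self, data):
        self.data = list(data)
        self.reads = 0

    def read(self, addr):
        self.reads += 1
        return self.data[addr]


def pe_mac(acc, a, b):
    """One PE step: multiply two operands and add into the accumulator."""
    return acc + a * b


def conv1d_no_reuse(mem, n_in, weights):
    """Read each input word from off-chip memory every time it is needed."""
    k = len(weights)
    out = []
    for i in range(n_in - k + 1):
        acc = 0
        for j in range(k):
            acc = pe_mac(acc, mem.read(i + j), weights[j])
        out.append(acc)
    return out


def conv1d_buffered(mem, n_in, weights):
    """Copy the input into an on-chip buffer once, then reuse it."""
    buffer = [mem.read(addr) for addr in range(n_in)]  # models on-chip BRAM
    k = len(weights)
    out = []
    for i in range(n_in - k + 1):
        acc = 0
        for j in range(k):
            acc = pe_mac(acc, buffer[i + j], weights[j])
        out.append(acc)
    return out


inputs = [1, 2, 3, 4, 5, 6, 7, 8]
weights = [1, 0, -1]
mem_a = OffChipMemory(inputs)
mem_b = OffChipMemory(inputs)
out_a = conv1d_no_reuse(mem_a, len(inputs), weights)
out_b = conv1d_buffered(mem_b, len(inputs), weights)
```

Both functions produce the same six outputs, but in this toy case the unbuffered version issues 18 off-chip reads (six outputs times three taps) while the buffered version issues 8 (one per input word). The gap widens with larger kernels, which is the motivation for reusing buffered data.
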
 

[Figure 6 block diagram (X22332-022019). Labeled blocks:
Processing System (PS): CPU (DNNDK), Memory Controller, Bus.
Programmable Logic (PL): Instruction Scheduler (Fetcher, Decoder, Dispatcher); On-Chip Buffer Controller (Data Mover, On-Chip BRAM, BRAM Reader/Writer); Computing Engine (Conv Engine, Misc Engine, PE array).
Off-Chip Memory.]

 

Figure 6: DPU Hardware Architecture 

 

 
