
 

DPU IP Product Guide

 

www.xilinx.com

 


PG338 (v1.2) March 26, 2019 

 

Chapter 6: Example Design  

 

Introduction 

The Xilinx® DPU targeted reference design (TRD) provides instructions on how to use the DPU with a Xilinx SoC platform to build and run deep neural network applications. The TRD uses the Vivado® IP integrator flow to build the hardware design and the Xilinx Yocto PetaLinux flow for the software design. The TRD is built on the Zynq® UltraScale+™ MPSoC platform. The same flow can also be used for a Zynq-7000 SoC platform.

This chapter describes the architecture of the reference design and provides a functional description of its components. It is organized as follows:

 

DPU TRD Overview provides a high-level overview of the Zynq UltraScale+ MPSoC device architecture, the reference design architecture, and a summary of key features.

Hardware Design Flow gives an overview of how to use the Xilinx Vivado Design Suite to generate the reference hardware design.

Software Design Flow describes the design flow of project creation in the PetaLinux environment.

Demo Execution describes how to run the application created by the TRD.

DPU TRD Overview 

The TRD creates an image classification application that runs a popular deep neural network model, ResNet50, on a Xilinx UltraScale+ MPSoC device. The functionality of the TRD is partitioned between the Processing System (PS) and the Programmable Logic (PL), where the DPU resides for optimal performance.

The following figure shows the TRD block diagram. The host communicates with the ZCU102 board through an Ethernet or UART port. The input images for the TRD are stored on an SD card. When the TRD is running, the input data is loaded into DDR memory. The DPU then reads the data from DDR memory and writes the results back to DDR memory. The APU sends the results to the host screen through the Ethernet or UART port.
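
The data path described above can be pictured with a small software model. The following Python sketch is illustrative only: it does not use the DNNDK or any DPU driver API, and the names run_trd_flow, fake_dpu, and send_to_host are hypothetical stand-ins introduced for this example. A dictionary plays the role of DDR memory and a plain function plays the role of the DPU.

```python
# Illustrative model of the TRD data flow. All names here are hypothetical;
# no real DPU, DNNDK, or board API is used.
from typing import Callable, Dict, List


def fake_dpu(pixels: List[int]) -> List[float]:
    """Stand-in for the DPU: turns input data into three made-up class scores."""
    return [float(len(pixels)), float(sum(pixels)), 0.0]


def run_trd_flow(
    sd_card: Dict[str, List[int]],
    dpu: Callable[[List[int]], List[float]],
    send_to_host: Callable[[str], None],
) -> None:
    """Move each image along the path SD card -> DDR -> DPU -> DDR -> host."""
    ddr: Dict[str, list] = {}  # models DDR memory shared by the PS and the PL
    for name, image in sd_card.items():
        ddr["input"] = list(image)          # input data is loaded into DDR
        ddr["output"] = dpu(ddr["input"])   # DPU reads DDR, writes results back
        scores = ddr["output"]
        best = max(range(len(scores)), key=scores.__getitem__)
        # The APU reports the result to the host (Ethernet or UART on the board).
        send_to_host(f"{name}: class {best}")


if __name__ == "__main__":
    run_trd_flow({"a.jpg": [1, 2, 3]}, fake_dpu, print)
```

In the real design, the DPU stage runs in the PL and the result is produced by ResNet50 rather than by the placeholder scoring used here. The sketch is only meant to show the order in which data moves between the SD card, DDR memory, the DPU, and the host.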

