Chapter 3: DPU Configuration
DPU IP Product Guide
PG338 (v1.2) March 26, 2019
Table 11: Performance of Different Models

Network Model     Workload (GOPs per image)   Input Image Resolution   Frames per second (FPS)
Inception-v1      3.2                         224×224                  405
ResNet50          7.7                         224×224                  175
SqueezeNet        0.698                       224×224                  1048
Tiny-YOLO         6.97                        448×448                  220
YOLO-V2           82.5                        640×640                  24
Pruned YOLO-V2    18.4                        640×640                  120
YOLO-V3           53.7                        512×256                  43
Pruned YOLO-V3    4                           512×256                  115
Notes:
1.
The pruned models were generated by the Xilinx pruning tool.
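The frame rates in Table 11 imply a sustained compute rate of roughly workload × FPS for each model. The following sketch illustrates that arithmetic using values copied from the table; the calculation itself is an illustration, not part of the product guide.

```python
# Estimate the effective throughput (GOPS) implied by Table 11:
# workload (GOPs per image) multiplied by measured frames per second.
# The figures below are copied from Table 11.

table_11 = {
    # model: (workload_gops, fps)
    "Inception-v1": (3.2, 405),
    "ResNet50": (7.7, 175),
    "Tiny-YOLO": (6.97, 220),
    "YOLO-V2": (82.5, 24),
}

def effective_gops(workload_gops: float, fps: float) -> float:
    """Sustained compute rate implied by the measured frame rate."""
    return workload_gops * fps

for model, (gops, fps) in table_11.items():
    print(f"{model}: {effective_gops(gops, fps):.0f} GOPS")
```

For example, Inception-v1 at 3.2 GOPs per image and 405 FPS corresponds to roughly 1296 GOPS of sustained compute.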
I/O Bandwidth Requirements
The I/O bandwidth requirement differs from one neural network to another, and even from layer to layer within a single network. The I/O bandwidth requirements of several neural networks, averaged over layers, were measured with one DPU core running at full speed. The peak and average I/O bandwidth requirements of four neural networks are shown in Table 12. The table lists only two commonly used DPU configurations (B1152 and B4096). Note that when multiple DPU cores run in parallel, each core might not be able to run at full speed due to limited I/O bandwidth.
Table 12: I/O Bandwidth Requirements for DPU-B1152 and DPU-B4096

                  DPU-B1152                              DPU-B4096
Network Model     Peak (MB/s)   Average (MB/s)           Peak (MB/s)   Average (MB/s)
Inception-v1      1704          890                      4626          2474
ResNet50          2052          1017                     5298          3132
SSD               1516          684                      5724          2049
Pruned YOLO-V3    2076          986                      6453          3290
If one DPU core needs to run at full speed, the peak I/O bandwidth requirement must be met. The I/O bandwidth is mainly used for accessing data through the AXI master interfaces (Dpu0_M_AXI_DATA0 and Dpu0_M_AXI_DATA1).
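The note above about multiple cores can be turned into a rough feasibility check: when several DPU cores share one memory subsystem, the sum of their peak I/O bandwidth demands may exceed what the platform can supply. The sketch below uses peak figures from Table 12; the available-bandwidth number in the example is a hypothetical platform budget, not from this guide.

```python
# Rough feasibility check: can N cores all sustain their peak I/O
# bandwidth at once? Peak figures are taken from Table 12.

PEAK_MB_S = {
    # model: (DPU-B1152 peak, DPU-B4096 peak)
    "Inception-v1": (1704, 4626),
    "ResNet50": (2052, 5298),
    "SSD": (1516, 5724),
    "Pruned YOLO-V3": (2076, 6453),
}

def cores_at_full_speed(model: str, arch: str, num_cores: int,
                        available_mb_s: float) -> bool:
    """True if num_cores cores can all run at peak I/O bandwidth
    within the given platform bandwidth budget."""
    idx = {"B1152": 0, "B4096": 1}[arch]
    return num_cores * PEAK_MB_S[model][idx] <= available_mb_s

# Example: can a hypothetical 12,800 MB/s DDR budget feed two B4096
# cores running Pruned YOLO-V3 at peak (2 * 6453 = 12,906 MB/s)?
print(cores_at_full_speed("Pruned YOLO-V3", "B4096", 2, 12800))  # False
```

When the check fails, the cores still run, but each may fall short of the full-speed frame rates listed in Table 11.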