A CNN Accelerator on FPGA Using Depthwise Separable Convolution

3 Sep 2018  ·  Lin Bai, Yiming Zhao, Xinming Huang ·

Convolutional neural networks (CNNs) have been widely deployed in the fields of computer vision and pattern recognition because of their high accuracy. However, large convolution operations are computing-intensive that often requires a powerful computing platform such as Graphics Processing Unit (GPU). This makes it difficult to apply CNNs to portable devices. The state-of-the-art CNNs, such as MobileNetV2 and Xception, adopt depthwise separable convolution to replace the standard convolution for embedded platforms. That significantly reduces operations and parameters with only limited loss in accuracy. This highly structured model is very suitable for Field-Programmable Gate Array (FPGA) implementation. In this paper, a scalable high performance depthwise separable convolution optimized CNN accelerator is proposed. The accelerator can be fit into an FPGA of different sizes, provided the balancing between hardware resources and processing speed. As an example, MobileNetV2 is implemented on Arria 10 SoC FPGA, and the results show this accelerator can classify each picture from ImageNet in 3.75ms, which is about 266.6 frames per second. This achieves 20x speedup if compared to CPU.

PDF Abstract
No code implementations yet. Submit your code now

Tasks


Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here