Efficient Neural Vision Systems Based on Convolutional Image Acquisition

Despite the substantial progress made in deep learning in recent years, advanced approaches remain computationally intensive. The trade-off between accuracy on the one hand and computation time and energy on the other limits their use in real-time applications on low-power and other resource-constrained systems. In this paper, we tackle this fundamental challenge by introducing a hybrid optical-digital implementation of a convolutional neural network (CNN) based on engineering the point spread function (PSF) of an optical imaging system. This is done by coding the imaging aperture such that its PSF replicates a large convolution kernel of the first layer of a pre-trained CNN. Because the convolution takes place in the optical domain, it consumes no energy and adds no latency, regardless of the kernel size. Experimental results on two datasets demonstrate that our approach reduces the computational cost by more than two orders of magnitude while achieving near-state-of-the-art accuracy or, equivalently, achieves better accuracy at the same computational cost.
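
The idea that a coded aperture can stand in for a first convolutional layer can be illustrated with a small numerical sketch. It is not the authors' implementation: it assumes an idealized, noise-free, shift-invariant imaging system, a single channel, and a non-negative kernel (so it sidesteps how signed weights would be handled), and the function names `digital_first_layer`, `optical_capture`, and `first_layer_macs` are hypothetical. Under those assumptions, the image formed on the sensor is the scene convolved with the PSF, which matches what a digital first layer would compute with the same kernel.

```python
import numpy as np


def digital_first_layer(image, kernel):
    """Direct 'valid' 2D convolution, as a digital first layer would compute it."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out


def optical_capture(scene, psf):
    """Idealized sensor image: the scene convolved with the PSF.

    Assumption: linear, shift-invariant, noise-free image formation.
    The FFT is only a convenient way to simulate it numerically.
    """
    h, w = scene.shape
    kh, kw = psf.shape
    shape = (h + kh - 1, w + kw - 1)  # size of the full linear convolution
    spectrum = np.fft.rfft2(scene, s=shape) * np.fft.rfft2(psf, s=shape)
    full = np.fft.irfft2(spectrum, s=shape)
    return full[kh - 1:h, kw - 1:w]  # crop to the 'valid' region


def first_layer_macs(image_shape, kernel_shape):
    """Multiply-accumulates a digital 'valid' convolution would need."""
    out_h = image_shape[0] - kernel_shape[0] + 1
    out_w = image_shape[1] - kernel_shape[1] + 1
    return out_h * out_w * kernel_shape[0] * kernel_shape[1]


rng = np.random.default_rng(0)
scene = rng.random((28, 28))  # stand-in for an input image
psf = rng.random((9, 9))      # stand-in for a non-negative first-layer kernel

same = np.allclose(digital_first_layer(scene, psf), optical_capture(scene, psf))
print("optical and digital outputs match:", same)
print("digital MACs avoided per kernel:", first_layer_macs(scene.shape, psf.shape))
```

The multiply-accumulate count grows with the kernel area in the digital case, whereas in the idealized optical model the convolution is already done when the image reaches the sensor. That is the sense in which the abstract describes the cost as independent of kernel size.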

Results from the Paper


| Task | Dataset | Model | Accuracy (%) | Global Rank |
|---|---|---|---|---|
| Image Classification | EMNIST-Balanced | OptConv+Log+Perc | 87.69 | #5 |
| Image Classification | EMNIST-Digits | OptConv+Log+Perc | 99.43 | #4 |
| Image Classification | EMNIST-Letters | OptConv+Log+Perc | 93.65 | #6 |
| Hand-Gesture Recognition | InAirGestures | OptConv+Log+Perc | 99.94 | #1 |
