WaveMix: A Resource-efficient Neural Network for Image Analysis

28 May 2022 · Pranav Jeevan, Kavitha Viswanathan, Anandu A S, Amit Sethi

We propose a novel neural architecture for computer vision -- WaveMix -- that is resource-efficient yet generalizable and scalable. While using fewer trainable parameters, less GPU RAM, and fewer computations, WaveMix networks achieve comparable or better accuracy than state-of-the-art convolutional neural networks, vision transformers, and token mixers on several tasks. This efficiency can translate to savings in time, cost, and energy. To achieve these gains, we use a multi-level two-dimensional discrete wavelet transform (2D-DWT) in WaveMix blocks, which has the following advantages: (1) it reorganizes spatial information based on three strong image priors (scale-invariance, shift-invariance, and sparseness of edges); (2) it does so in a lossless manner without adding parameters; (3) it reduces the spatial size of feature maps, which reduces the memory and time required for forward and backward passes; and (4) it expands the receptive field faster than convolutions do. The whole architecture is a stack of self-similar and resolution-preserving WaveMix blocks, which allows architectural flexibility for various tasks and levels of resource availability. WaveMix establishes new benchmarks for segmentation on Cityscapes and for classification on Galaxy10 DECals, Places-365, five EMNIST datasets, and iNAT-mini, and it performs competitively on other benchmarks. Our code and trained models are publicly available.
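To make the "lossless", "parameter-free", and "reduces spatial size" properties of the 2D-DWT concrete, here is a minimal sketch of a single-level transform and its inverse. It is not the authors' implementation. The choice of the Haar wavelet, the function names `haar_dwt2` and `haar_idwt2`, and the use of plain Python lists (to stay dependency-free) are all assumptions made for illustration; the abstract does not say which wavelet WaveMix uses.

```python
def haar_dwt2(x):
    """One level of a 2D Haar DWT (assumed wavelet) on a list-of-lists image.

    Height and width must be even. Returns four sub-bands, each with half
    the height and half the width of the input. No trainable parameters.
    """
    approx, d_cols, d_rows, d_diag = [], [], [], []
    for i in range(0, len(x), 2):
        ra, rc, rr, rd = [], [], [], []
        for j in range(0, len(x[0]), 2):
            a, b = x[i][j], x[i][j + 1]          # top row of the 2x2 block
            c, d = x[i + 1][j], x[i + 1][j + 1]  # bottom row of the block
            ra.append((a + b + c + d) / 2.0)     # local average (low-pass)
            rc.append((a - b + c - d) / 2.0)     # difference across columns
            rr.append((a + b - c - d) / 2.0)     # difference across rows
            rd.append((a - b - c + d) / 2.0)     # diagonal difference
        approx.append(ra)
        d_cols.append(rc)
        d_rows.append(rr)
        d_diag.append(rd)
    return approx, d_cols, d_rows, d_diag


def haar_idwt2(approx, d_cols, d_rows, d_diag):
    """Invert haar_dwt2 exactly, rebuilding the full-resolution image."""
    out = []
    for i in range(len(approx)):
        top, bottom = [], []
        for j in range(len(approx[0])):
            s, p = approx[i][j], d_cols[i][j]
            q, r = d_rows[i][j], d_diag[i][j]
            top += [(s + p + q + r) / 2.0, (s - p + q - r) / 2.0]
            bottom += [(s + p - q - r) / 2.0, (s - p - q + r) / 2.0]
        out.append(top)
        out.append(bottom)
    return out


# Toy 8x8 "image": each sub-band comes out 4x4, and the inverse restores it.
image = [[float(8 * i + j) for j in range(8)] for i in range(8)]
bands = haar_dwt2(image)
restored = haar_idwt2(*bands)
# A second level applied to the approximation band halves the size again.
level2 = haar_dwt2(bands[0])
print(len(bands[0]), len(bands[0][0]), restored == image, len(level2[0]))
```

Applying the transform again to the approximation band, as in the last lines, is what "multi-level" refers to: each level halves the spatial size, which is one way to see why the receptive field can grow quickly with depth.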

Results from the Paper


 Ranked #1 on Image Classification on Galaxy10 DECals (using extra training data)

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Image Classification | Caltech-256 | WaveMixLite-256/7 | Accuracy | 54.62 | # 5 |
| Image Classification | CIFAR-10 | WaveMixLite-144/7 | Percentage correct | 97.29 | # 83 |
| Image Classification | CIFAR-100 | WaveMixLite-256/7 | Percentage correct | 85.09 | # 69 |
| Image Classification | CIFAR-100 | WaveMix-Lite-256/7 | Percentage correct | 70.20 | # 164 |
| Semantic Segmentation | Cityscapes val | WaveMix | mIoU | 82.7 | # 28 |
| Semantic Segmentation | Cityscapes val | WaveMix-256/16 (Level-4) | mIoU | 82.60 | # 30 |
| Image Classification | EMNIST-Balanced | WaveMixLite-128/7 | Accuracy | 91.06 | # 1 |
| Image Classification | EMNIST-Byclass | WaveMixLite-128/7 | Accuracy | 88.43 | # 1 |
| Image Classification | EMNIST-Bymerge | WaveMixLite-128/16 | Accuracy | 91.80 | # 1 |
| Image Classification | EMNIST-Digits | WaveMixLite-112/16 | Accuracy (%) | 99.82 | # 1 |
| Image Classification | EMNIST-Letters | WaveMixLite-112/16 | Accuracy | 95.96 | # 1 |
| Image Classification | Fashion-MNIST | WaveMixLite | Percentage error | 5.68 | # 8 |
| Image Classification | Galaxy10 DECals | WaveMix | Top-1 Accuracy (%) | 95.42 | # 1 |
| Image Classification | Galaxy10 DECals | WaveMix | PARAMS (M) | 28 | # 1 |
| Image Classification | ImageNet | WaveMix-192/16 (level 3) | Top 1 Accuracy | 74.93% | # 892 |
| Image Classification | iNat2021-mini | WaveMix-256/16 (level 2) | Top 1 Accuracy | 61.75 | # 1 |
| Image Classification | MNIST | WaveMixLite | Percentage error | 0.25 | # 1 |
| Scene Classification | Places365-Standard | WaveMix | Top 1 Error | 43.55 | # 1 |
| Image Classification | Places365-Standard | WaveMix-240/12 (level 4) | Top 1 Accuracy | 56.45 | # 4 |
| Image Classification | STL-10 | WaveMixLite-256/7 | Percentage correct | 70.88 | # 90 |
| Image Classification | SVHN | WaveMixLite-144/15 | Percentage error | 1.27 | # 6 |
| Image Classification | Tiny ImageNet Classification | WaveMixLite-144/7 | Validation Acc | 77.47% | # 9 |

Methods