FSPEN: AN ULTRA-LIGHTWEIGHT NETWORK FOR REAL-TIME SPEECH ENHANCEMENT

Deep learning-based speech enhancement methods have shown promising results in recent years. In practical applications, however, model size and computational complexity are key factors that limit their use in end products. Products that must run real-time speech enhancement with limited resources, such as true wireless stereo (TWS) headsets, hearing aids, and IoT devices, therefore need ultra-lightweight models. In this paper, we propose FSPEN, an ultra-lightweight network for real-time speech enhancement. It uses a full-band and sub-band network structure to extract global and local features, respectively, and an inter-frame path extension method that increases the network's modeling capacity while preserving its low computational complexity. Experiments show that FSPEN achieves a PESQ of 2.97 on the VoiceBank+DEMAND dataset with 89M multiply-accumulate operations (MACs) per second and 79k parameters.
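To make the full-band versus sub-band idea concrete, the sketch below contrasts a global feature computed over an entire spectrum frame with local features computed per contiguous frequency band. It is a toy illustration, not FSPEN's actual network: the 257-bin frame (a 512-point FFT), the band widths in `SUBBAND_WIDTHS`, and the use of band energy in place of a learned encoder are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

# Assumption: a 512-point FFT, giving 512 // 2 + 1 = 257 frequency bins per frame.
NUM_BINS = 257

# Hypothetical sub-band widths (not from the paper): narrow bands at low
# frequencies, wider bands at high frequencies. They sum to 257 bins.
SUBBAND_WIDTHS: List[int] = [1] + [2] * 16 + [4] * 8 + [8] * 8 + [16] * 8


def split_subbands(frame: Sequence[float], widths: Sequence[int]) -> List[List[float]]:
    """Cut one magnitude-spectrum frame into contiguous, non-overlapping sub-bands."""
    if sum(widths) != len(frame):
        raise ValueError("sub-band widths must cover every frequency bin exactly once")
    bands: List[List[float]] = []
    start = 0
    for width in widths:
        bands.append(list(frame[start:start + width]))
        start += width
    return bands


def band_energy(values: Sequence[float]) -> float:
    """Sum of squared magnitudes; a toy stand-in for a learned band encoder."""
    return sum(v * v for v in values)


def extract_features(frame: Sequence[float]) -> Tuple[float, List[float]]:
    """Return (global full-band feature, local per-sub-band features) for one frame."""
    full_band = band_energy(frame)  # one value that sees the whole spectrum
    sub_bands = [band_energy(b) for b in split_subbands(frame, SUBBAND_WIDTHS)]
    return full_band, sub_bands


if __name__ == "__main__":
    frame = [1.0] * NUM_BINS
    full, subs = extract_features(frame)
    print(full, len(subs))
```

The full-band value summarizes the whole spectrum (global context), while each sub-band value depends only on its own slice of bins (local detail); in a real network both would be learned representations rather than energies.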

Benchmark results
Task: Speech Enhancement
Dataset: VoiceBank + DEMAND
Model: FSPEN

Metric        Value    Global Rank
PESQ          2.97     #17
STOI          0.942    #10
Params (M)    0.079    #1
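The reported budget can be translated into more tangible quantities with simple arithmetic. The sketch below converts 0.079M parameters into a raw parameter count and a float32 weight-memory estimate, and converts 89M MACs per second into MACs per processed frame. The 4-byte float32 storage and the 8 ms frame hop are assumptions for illustration only; the figures do not state FSPEN's numeric precision or frame hop.

```python
# Assumption: weights stored as 32-bit floats (4 bytes each).
BYTES_PER_FLOAT32 = 4


def params_from_millions(params_m: float) -> int:
    """Convert a parameter count given in millions to an integer count."""
    return round(params_m * 1_000_000)


def weight_memory_bytes(num_params: int, bytes_per_param: int = BYTES_PER_FLOAT32) -> int:
    """Raw storage needed for the weights alone (no activations or buffers)."""
    return num_params * bytes_per_param


def macs_per_frame(macs_per_second: float, hop_ms: float) -> float:
    """MACs spent on each frame when one frame is processed every hop_ms milliseconds."""
    return macs_per_second * hop_ms / 1000.0


if __name__ == "__main__":
    n = params_from_millions(0.079)
    print(n, weight_memory_bytes(n))
    # Hypothetical 8 ms hop, i.e. 125 frames per second.
    print(macs_per_frame(89e6, 8.0))
```

Under these assumptions, 79k float32 parameters occupy about 316 KB, and an 8 ms hop would leave roughly 0.71M MACs per frame; a different hop or quantized weights would change both numbers proportionally.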
