Deep Ad-hoc Beamforming

3 Nov 2018 · Xiao-Lei Zhang

Although deep-learning-based speech enhancement methods have demonstrated good performance in adverse acoustic environments, their performance is strongly affected by the distance between the speech source and the microphones, since speech signals attenuate quickly as they propagate. To address this problem, we propose deep ad-hoc beamforming, a deep-learning-based multichannel speech enhancement method built on an ad-hoc microphone array. It targets scenarios where the microphones are placed randomly in a room and work collaboratively, and it aims to pick up speech with uniformly high quality anywhere within the array's coverage. Its core idea is to reweight the estimated speech signals under a sparsity constraint when conducting adaptive beamforming. The weights, produced by a neural network, estimate the propagation cost from the speech source to each microphone of the ad-hoc array, e.g. the signal-to-noise ratio; the sparsity constraint filters out microphones that are too far from both the speech source and the majority of the array. We conducted extensive experiments in a scenario where the speech source is far-field and its location is random and unknown to the microphones. Results show that our method outperforms representative deep-learning-based speech enhancement methods by a large margin.
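As a rough, self-contained illustration of the reweighting idea (not the paper's actual implementation), the NumPy sketch below turns per-channel SNR estimates, which the paper obtains from a trained neural network, into sparse weights and combines the channel spectrograms. The plain weighted sum stands in for the adaptive beamformer used in the paper, and the function names, the number of retained channels, and the example SNR values are all hypothetical.

```python
import numpy as np

def sparse_channel_weights(snr_estimates, n_keep):
    """Turn per-channel SNR estimates into sparse combination weights:
    channels outside the top-`n_keep` (assumed too far from the source
    and from the rest of the array) are zeroed out."""
    snr = np.asarray(snr_estimates, dtype=float)
    weights = np.zeros_like(snr)
    keep = np.argsort(snr)[-n_keep:]          # indices of the best channels
    weights[keep] = snr[keep]
    total = weights.sum()
    return weights / total if total > 0 else weights

def weighted_combine(stft_channels, weights):
    """Combine channel spectrograms with the sparse weights.
    `stft_channels` has shape (n_channels, n_freq, n_frames); a weighted
    sum is used here as a simple stand-in for adaptive beamforming."""
    return np.tensordot(weights, stft_channels, axes=(0, 0))

# Toy example: random data standing in for 8 ad-hoc microphone recordings,
# and made-up network outputs standing in for the learned SNR estimates.
rng = np.random.default_rng(0)
stfts = rng.standard_normal((8, 257, 100))
snrs = [12.0, 3.5, -2.0, 8.1, 15.3, -6.4, 0.7, 9.9]
w = sparse_channel_weights(snrs, n_keep=4)
enhanced = weighted_combine(stfts, w)         # (257, 100) enhanced spectrogram
```

The sparsity step is what distinguishes this from ordinary channel weighting: distant microphones contribute mostly noise, so discarding them outright, rather than merely down-weighting them, keeps the combined signal clean.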

