1 code implementation • 21 Dec 2023 • Davide Berghi, Philip J. B. Jackson
The multichannel audio "student" network is trained to generate the same results.
1 code implementation • 14 Dec 2023 • Davide Berghi, Peipei Wu, Jinzheng Zhao, Wenwu Wang, Philip J. B. Jackson
Sound event localization and detection (SELD) combines two subtasks: sound event detection (SED) and direction of arrival (DOA) estimation.
no code implementations • 9 Aug 2023 • Faegheh Sardari, Armin Mustafa, Philip J. B. Jackson, Adrian Hilton
To address this issue, we (i) embed relative positional encoding in the self-attention mechanism and (ii) exploit multi-scale temporal relationships by designing a novel non-hierarchical network, in contrast to recent transformer-based approaches that use a hierarchical structure.
Ranked #1 on Action Detection on MultiTHUMOS
no code implementations • 27 Jul 2023 • Davide Berghi, Philip J. B. Jackson
This study considers the problem of detecting and locating an active talker's horizontal position from multichannel audio captured by a microphone array.
no code implementations • 4 Dec 2022 • Davide Berghi, Marco Volino, Philip J. B. Jackson
This is partly due to the lack of available datasets enabling audio-visual research in this direction.
no code implementations • 7 Mar 2022 • Davide Berghi, Adrian Hilton, Philip J. B. Jackson
We propose to generate weak labels using a pre-trained active speaker detector on pre-extracted face tracks.
1 code implementation • 14 Jun 2019 • Qiuqiang Kong, Yong Xu, Wenwu Wang, Philip J. B. Jackson, Mark D. Plumbley
Single-channel signal separation and deconvolution aims to separate and deconvolve individual sources from a single-channel mixture and is a challenging problem in which no prior knowledge of the mixing filters is available.
2 code implementations • 13 Jul 2016 • Yong Xu, Qiang Huang, Wenwu Wang, Peter Foster, Siddharth Sigtia, Philip J. B. Jackson, Mark D. Plumbley
For unsupervised feature learning, we propose to use a symmetric or asymmetric deep de-noising auto-encoder (sDAE or aDAE) to generate new data-driven features from Mel-filter bank (MFB) features.
no code implementations • 24 Jun 2016 • Yong Xu, Qiang Huang, Wenwu Wang, Philip J. B. Jackson, Mark D. Plumbley
Compared with conventional Gaussian mixture model (GMM) and support vector machine (SVM) methods, the proposed fully DNN-based method can make good use of long-term temporal information by taking the whole chunk as input.