no code implementations • 1 Jun 2023 • Sarthak Yadav, Sergios Theodoridis, Lars Kai Hansen, Zheng-Hua Tan
In this work, we propose a Multi-Window Masked Autoencoder (MW-MAE) fitted with a novel Multi-Window Multi-Head Attention (MW-MHA) module that facilitates the modelling of local-global interactions in every decoder transformer block, with attention heads attending over several distinct local and global windows.
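To make the idea concrete, here is a minimal NumPy sketch of attention heads that each attend over a different window (with `None` denoting a global head), in the spirit of the multi-window attention described above. The function names, window sizes, and head layout are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax; rows with -inf entries get zero weight there.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def windowed_attention(q, k, v, window=None):
    """Scaled dot-product attention for one head over (T, d) inputs.

    window=None -> global attention; otherwise each position only attends
    to positions within window // 2 steps of itself (a local head).
    """
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)                       # (T, T)
    if window is not None:
        idx = np.arange(T)
        too_far = np.abs(idx[None, :] - idx[:, None]) > window // 2
        scores = np.where(too_far, -np.inf, scores)     # mask distant frames
    return softmax(scores, axis=-1) @ v                 # (T, d)


def multi_window_attention(x, windows=(4, 8, None)):
    """Split channels into one head per window size, run each head with its
    own (local or global) window, and concatenate the head outputs."""
    T, D = x.shape
    assert D % len(windows) == 0, "channels must split evenly across heads"
    d = D // len(windows)
    heads = x.reshape(T, len(windows), d)
    outs = [windowed_attention(heads[:, h], heads[:, h], heads[:, h], w)
            for h, w in enumerate(windows)]
    return np.concatenate(outs, axis=-1)                # (T, D)
```

In this sketch, self-attention is used (queries, keys, and values all come from `x`), so the output keeps the input's shape while mixing local context in some heads and global context in others.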
no code implementations • 29 Mar 2022 • Sarthak Yadav, Neil Zeghidour
Deep audio classification, traditionally cast as training a deep neural network on top of mel-filterbanks in a supervised fashion, has recently benefited from two independent lines of work.
no code implementations • 29 Sep 2021 • Sarthak Yadav, Mary Ellen Foster
The majority of recent work on the interpretability of audio and speech processing deep neural networks (DNNs) interprets the spectral information modelled by the first layer, relying solely on visual inspection.
no code implementations • 16 Oct 2019 • Sarthak Yadav, Atul Rai
The majority of recent approaches for text-independent speaker recognition apply attention or similar techniques to aggregate the frame-level feature descriptors generated by a deep neural network (DNN) front-end.
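The aggregation step this abstract refers to can be sketched in a few lines: frame-level descriptors are scored, the scores are softmax-normalised over time, and the utterance embedding is the resulting weighted average. This is a generic attentive-pooling sketch under the assumption of a single learned scoring vector `w`; it is not the specific model proposed in the paper.

```python
import numpy as np


def attentive_pooling(frames, w):
    """Aggregate frame-level descriptors into one utterance embedding.

    frames: (T, d) array of per-frame features from a DNN front-end.
    w:      (d,) learned scoring vector (illustrative parameterisation).
    Returns a (d,) embedding: a softmax-weighted average over frames.
    """
    scores = frames @ w                       # (T,) one relevance score per frame
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()               # softmax over the time axis
    return alpha @ frames                     # (d,) attention-weighted mean
```

With `w` set to zeros the weights are uniform and the pooling reduces to a plain temporal average, which makes the mechanism easy to sanity-check.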