Search Results for author: Scott Wisdom

Found 26 papers, 5 papers with code

Objective and subjective evaluation of speech enhancement methods in the UDASE task of the 7th CHiME challenge

no code implementations • 2 Feb 2024 • Simon Leglaive, Matthieu Fraticelli, Hend ElGhazaly, Léonie Borne, Mostafa Sadeghi, Scott Wisdom, Manuel Pariente, John R. Hershey, Daniel Pressnitzer, Jon P. Barker

In this paper, we present the objective and subjective evaluations of the systems that were submitted to the CHiME-7 UDASE task, and we provide an analysis of the results.

Speech Enhancement Unsupervised Domain Adaptation

Paper
Add Code

TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition

no code implementations • 21 Aug 2023 • Hakan Erdogan, Scott Wisdom, Xuankai Chang, Zalán Borsos, Marco Tagliasacchi, Neil Zeghidour, John R. Hershey

The model operates on transcripts and audio token sequences and achieves multiple tasks through masking of inputs.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

AudioSlots: A slot-centric generative model for audio separation

no code implementations • 9 May 2023 • Pradyumna Reddy, Scott Wisdom, Klaus Greff, John R. Hershey, Thomas Kipf

We discuss the results and limitations of our approach in detail, and further outline potential ways to overcome the limitations and directions for future work.

blind source separation Speech Separation

Paper
Add Code

AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation

no code implementations • 20 Jul 2022 • Efthymios Tzinis, Scott Wisdom, Tal Remez, John R. Hershey

We identify several limitations of previous work on audio-visual on-screen sound separation, including the coarse resolution of spatio-temporal attention, poor convergence of the audio separation model, limited variety in training and evaluation data, and failure to account for the trade off between preservation of on-screen sounds and suppression of off-screen sounds.

Paper
Add Code

Text-Driven Separation of Arbitrary Sounds

no code implementations • 12 Apr 2022 • Kevin Kilgour, Beat Gfeller, Qingqing Huang, Aren Jansen, Scott Wisdom, Marco Tagliasacchi

The second model, SoundFilter, takes a mixed source audio clip as an input and separates it based on a conditioning vector from the shared text-audio representation defined by SoundWords, making the model agnostic to the conditioning modality.

Paper
Add Code

CycleGAN-Based Unpaired Speech Dereverberation

no code implementations • 29 Mar 2022 • Hannah Muckenhirn, Aleksandr Safin, Hakan Erdogan, Felix de Chaumont Quitry, Marco Tagliasacchi, Scott Wisdom, John R. Hershey

Typically, neural network-based speech dereverberation models are trained on paired data, composed of a dry utterance and its corresponding reverberant utterance.

Speech Dereverberation

Paper
Add Code

Improving Bird Classification with Unsupervised Sound Separation

no code implementations • 7 Oct 2021 • Tom Denton, Scott Wisdom, John R. Hershey

This paper addresses the problem of species classification in bird song recordings.

Classification

Paper
Add Code

DF-Conformer: Integrated architecture of Conv-TasNet and Conformer using linear complexity self-attention for speech enhancement

no code implementations • 30 Jun 2021 • Yuma Koizumi, Shigeki Karita, Scott Wisdom, Hakan Erdogan, John R. Hershey, Llion Jones, Michiel Bacchiani

To make the model computationally feasible, we extend the Conformer using linear complexity attention and stacked 1-D dilated depthwise convolution layers.

Computational Efficiency Denoising +1

Paper
Add Code

Improving On-Screen Sound Separation for Open-Domain Videos with Audio-Visual Self-Attention

no code implementations • 17 Jun 2021 • Efthymios Tzinis, Scott Wisdom, Tal Remez, John R. Hershey

We introduce a state-of-the-art audio-visual on-screen sound separation system which is capable of learning to separate sounds and associate them with on-screen objects by looking at in-the-wild videos.

Unsupervised Pre-training

Paper
Add Code

Sparse, Efficient, and Semantic Mixture Invariant Training: Taming In-the-Wild Unsupervised Sound Separation

no code implementations • 1 Jun 2021 • Scott Wisdom, Aren Jansen, Ron J. Weiss, Hakan Erdogan, John R. Hershey

The best performance is achieved using larger numbers of output sources, enabled by our efficient MixIT loss, combined with sparsity losses to prevent over-separation.

Paper
Add Code

Self-Supervised Learning from Automatically Separated Sound Scenes

1 code implementation • 5 May 2021 • Eduardo Fonseca, Aren Jansen, Daniel P. W. Ellis, Scott Wisdom, Marco Tagliasacchi, John R. Hershey, Manoj Plakal, Shawn Hershey, R. Channing Moore, Xavier Serra

Real-world sound scenes consist of time-varying collections of sound sources, each generating characteristic sound events that are mixed together in audio recordings.

Contrastive Learning Self-Supervised Learning

Paper
Code

End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings

no code implementations • 5 May 2021 • Soumi Maiti, Hakan Erdogan, Kevin Wilson, Scott Wisdom, Shinji Watanabe, John R. Hershey

We present an end-to-end deep network model that performs meeting diarization from single-channel audio recordings.

Clustering Speaker Identification +1

Paper
Add Code

Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis

no code implementations • 3 Nov 2020 • Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey

Multi-speaker speech recognition of unsegmented recordings has diverse applications such as meeting transcription and automatic subtitle generation.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Add Code

What's All the FUSS About Free Universal Sound Separation Data?

no code implementations • 2 Nov 2020 • Scott Wisdom, Hakan Erdogan, Daniel Ellis, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Justin Salamon, Prem Seetharaman, John Hershey

We introduce the Free Universal Sound Separation (FUSS) dataset, a new corpus for experiments in separating mixtures of an unknown number of sounds from an open domain of sound types.

Data Augmentation

Paper
Add Code

Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds

no code implementations • ICLR 2021 • Efthymios Tzinis, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Daniel P. W. Ellis, John R. Hershey

For evaluation and semi-supervised experiments, we collected human labels for presence of on-screen and off-screen sounds on a small subset of clips.

Scene Understanding

Paper
Add Code

Unsupervised Sound Separation Using Mixture Invariant Training

no code implementations • NeurIPS 2020 • Scott Wisdom, Efthymios Tzinis, Hakan Erdogan, Ron J. Weiss, Kevin Wilson, John R. Hershey

In such supervised approaches, a model is trained to predict the component sources from synthetic mixtures created by adding up isolated ground-truth sources.

Speech Enhancement Speech Separation +1

Paper
Add Code

Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement

no code implementations • 18 Nov 2019 • Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, John R. Hershey

This work introduces sequential neural beamforming, which alternates between neural network based spectral separation and beamforming based spatial separation.

Speaker Separation Speech Enhancement +3

Paper
Add Code

Improving Universal Sound Separation Using Sound Classification

no code implementations • 18 Nov 2019 • Efthymios Tzinis, Scott Wisdom, John R. Hershey, Aren Jansen, Daniel P. W. Ellis

Deep learning approaches have recently achieved impressive performance on both audio source separation and sound classification.

Audio Source Separation Classification +2

Paper
Add Code

Universal Sound Separation

no code implementations • 8 May 2019 • Ilya Kavalerov, Scott Wisdom, Hakan Erdogan, Brian Patton, Kevin Wilson, Jonathan Le Roux, John R. Hershey

For learnable bases, shorter windows (2. 5 ms) work best on all tasks.

Speech Enhancement Speech Separation

Paper
Add Code

Transfer Learning From Sound Representations For Anger Detection in Speech

no code implementations • 6 Feb 2019 • Mohamed Ezzeldin A. ElShaer, Scott Wisdom, Taniya Mishra

In this work, we train fully convolutional networks to detect anger in speech.

Transfer Learning

Paper
Add Code

Differentiable Consistency Constraints for Improved Deep Speech Enhancement

no code implementations • 20 Nov 2018 • Scott Wisdom, John R. Hershey, Kevin Wilson, Jeremy Thorpe, Michael Chinen, Brian Patton, Rif A. Saurous

Furthermore, the only previous approaches that apply mixture consistency use real-valued masks; mixture consistency has been ignored for complex-valued masks.

Sound Audio and Speech Processing

Paper
Add Code

SDR - half-baked or well done?

1 code implementation • 6 Nov 2018 • Jonathan Le Roux, Scott Wisdom, Hakan Erdogan, John R. Hershey

In speech enhancement and source separation, signal-to-noise ratio is a ubiquitous objective measure of denoising/separation quality.

Sound Audio and Speech Processing

Paper
Code

Deep Recurrent NMF for Speech Separation by Unfolding Iterative Thresholding

1 code implementation • 21 Sep 2017 • Scott Wisdom, Thomas Powers, James Pitton, Les Atlas

This interpretability also provides principled initializations that enable faster training and convergence to better solutions compared to conventional random initialization.

Speech Separation

Paper
Code

Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery

1 code implementation • 22 Nov 2016 • Scott Wisdom, Thomas Powers, James Pitton, Les Atlas

Recurrent neural networks (RNNs) are powerful and effective for processing sequential data.

Compressive Sensing

Paper
Code

Full-Capacity Unitary Recurrent Neural Networks

2 code implementations • NeurIPS 2016 • Scott Wisdom, Thomas Powers, John R. Hershey, Jonathan Le Roux, Les Atlas

To address this question, we propose full-capacity uRNNs that optimize their recurrence matrix over all unitary matrices, leading to significantly improved performance over uRNNs that use a restricted-capacity recurrence matrix.

Ranked #25 on Sequential Image Classification on Sequential MNIST

Open-Ended Question Answering Sequential Image Classification

Paper
Code

Enhancement and Recognition of Reverberant and Noisy Speech by Extending Its Coherence

no code implementations • 2 Sep 2015 • Scott Wisdom, Thomas Powers, Les Atlas, James Pitton

Our approach centers around using a single-channel minimum mean-square error log-spectral amplitude (MMSE-LSA) estimator proposed by Habets, which scales coefficients in a time-frequency domain to suppress noise and reverberation.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.