Search Results for author: Robin Scheibler

Found 19 papers, 10 papers with code

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

1 code implementation • 27 Oct 2023 • Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis

TorchAudio is an open-source audio and speech processing library built for PyTorch.

Self-Supervised Learning, Speech Enhancement +2

★ 2,379

Neural Diarization with Non-autoregressive Intermediate Attractors

1 code implementation • 13 Mar 2023 • Yusuke Fujita, Tatsuya Komatsu, Robin Scheibler, Yusuke Kida, Tetsuji Ogawa

The experiments with the two-speaker CALLHOME dataset show that the intermediate labels with the proposed non-autoregressive intermediate attractors boost the diarization performance.

Speaker Diarization

★ 348

Diffusion-based Generative Speech Source Separation

1 code implementation • 31 Oct 2022 • Robin Scheibler, Youna Ji, Soo-Whan Chung, Jaeuk Byun, Soyeon Choe, Min-Seok Choi

We propose DiffSep, a new single-channel source separation method based on score-matching of a stochastic differential equation (SDE).

Speech Enhancement

ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding

1 code implementation • 19 Jul 2022 • Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe

To showcase such integration, we performed experiments on carefully designed synthetic datasets for noisy-reverberant multi-channel ST and SLU tasks, which can be used as benchmark corpora for future research.

Automatic Speech Recognition (ASR) +5

★ 7,880

End-to-End Multi-speaker ASR with Independent Vector Analysis

no code implementations • 1 Apr 2022 • Robin Scheibler, Wangyou Zhang, Xuankai Chang, Shinji Watanabe, Yanmin Qian

We develop an end-to-end system for multi-channel, multi-speaker automatic speech recognition.

Automatic Speech Recognition (ASR) +1

Spatial Loss for Unsupervised Multi-channel Source Separation

no code implementations • 1 Apr 2022 • Kohei Saijo, Robin Scheibler

With the proposed loss, we train the neural separators based on minimum variance distortionless response (MVDR) beamforming and independent vector analysis (IVA).

Blind Source Separation

MLP-ASR: Sequence-length agnostic all-MLP architectures for speech recognition

no code implementations • 17 Feb 2022 • Jin Sakuma, Tatsuya Komatsu, Robin Scheibler

We propose multi-layer perceptron (MLP)-based architectures suitable for variable length input.

Automatic Speech Recognition (ASR) +2

Independence-based Joint Dereverberation and Separation with Neural Source Model

no code implementations • 13 Oct 2021 • Kohei Saijo, Robin Scheibler

We introduce a neural network in the framework of time-decorrelation iterative source steering, which is an extension of independent vector analysis to joint dereverberation and separation.

Speech Dereverberation

SDR -- Medium Rare with Fast Computations

1 code implementation • 13 Oct 2021 • Robin Scheibler

The complexity of this step is thus reduced by a factor quadratic in the distortion filter size used in bss_eval, usually 512.

★ 126

MLP-based architecture with variable length input for automatic speech recognition

no code implementations • 29 Sep 2021 • Jin Sakuma, Tatsuya Komatsu, Robin Scheibler

We propose three approaches to extend MLP-based architectures for use with sequences of arbitrary length.

Automatic Speech Recognition (ASR) +2

Refinement of Direction of Arrival Estimators by Majorization-Minimization Optimization on the Array Manifold

1 code implementation • 2 Jun 2021 • Robin Scheibler, Masahito Togami

We propose a generalized formulation of direction of arrival estimation that includes many existing methods such as steered response power, subspace, coherent and incoherent, as well as speech sparsity-based methods.

Direction of Arrival Estimation

Joint Dereverberation and Separation with Iterative Source Steering

no code implementations • 12 Feb 2021 • Taishi Nakashima, Robin Scheibler, Masahito Togami, Nobutaka Ono

In this case, we manage to reduce the number of matrix inversions to only one per iteration and source.

Blind Source Separation

Surrogate Source Model Learning for Determined Source Separation

no code implementations • 11 Nov 2020 • Robin Scheibler, Masahito Togami

We find that the learnt approximate surrogate generalizes well on mixtures of three and four speakers without any modification.

Speech Separation

Generalized Minimal Distortion Principle for Blind Source Separation

2 code implementations • 11 Sep 2020 • Robin Scheibler

The method thus provides a cheap and easy way to boost the performance of blind source separation.

Audio and Speech Processing, Sound, Signal Processing

Independent Vector Analysis via Log-Quadratically Penalized Quadratic Minimization

no code implementations • 23 Aug 2020 • Robin Scheibler

Numerical experiments demonstrate the effectiveness of the proposed method.

Blind Source Separation, Speech Separation

Independent Vector Analysis with more Microphones than Sources

1 code implementation • 20 May 2019 • Robin Scheibler, Nobutaka Ono

The performance of the algorithm is assessed on simulated signals.

Sound, Audio and Speech Processing

Multi-modal Blind Source Separation with Microphones and Blinkies

2 code implementations • 4 Apr 2019 • Robin Scheibler, Nobutaka Ono

We show that alternating updates similar to those of independent vector analysis and Itakura-Saito non-negative matrix factorization decrease the negative log-likelihood of the joint distribution.

Sound, Audio and Speech Processing

Pyroomacoustics: A Python package for audio room simulations and array processing algorithms

2 code implementations • 11 Oct 2017 • Robin Scheibler, Eric Bezzam, Ivan Dokmanić

We present pyroomacoustics, a software package aimed at the rapid development and testing of audio array processing algorithms.

Sound, Audio and Speech Processing

★ 1,326

A Fast Hadamard Transform for Signals with Sub-linear Sparsity in the Transform Domain

no code implementations • 7 Oct 2013 • Robin Scheibler, Saeid Haghighatshoar, Martin Vetterli

A new iterative low-complexity algorithm is presented for computing the Walsh-Hadamard transform (WHT) of an $N$-dimensional signal with a $K$-sparse WHT, where $N$ is a power of two and $K = O(N^\alpha)$ scales sub-linearly in $N$ for some $0 < \alpha < 1$.
