1 code implementation • 27 Oct 2023 • Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis
TorchAudio is an open-source audio and speech processing library built for PyTorch.
1 code implementation • 13 Mar 2023 • Yusuke Fujita, Tatsuya Komatsu, Robin Scheibler, Yusuke Kida, Tetsuji Ogawa
The experiments with the two-speaker CALLHOME dataset show that the intermediate labels with the proposed non-autoregressive intermediate attractors boost the diarization performance.
1 code implementation • 31 Oct 2022 • Robin Scheibler, Youna Ji, Soo-Whan Chung, Jaeuk Byun, Soyeon Choe, Min-Seok Choi
We propose DiffSep, a new single channel source separation method based on score-matching of a stochastic differential equation (SDE).
1 code implementation • 19 Jul 2022 • Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe
To showcase such integration, we performed experiments on carefully designed synthetic datasets for noisy-reverberant multi-channel ST and SLU tasks, which can be used as benchmark corpora for future research.
Automatic Speech Recognition (ASR) +5
no code implementations • 1 Apr 2022 • Robin Scheibler, Wangyou Zhang, Xuankai Chang, Shinji Watanabe, Yanmin Qian
We develop an end-to-end system for multi-channel, multi-speaker automatic speech recognition.
Automatic Speech Recognition (ASR) +1
no code implementations • 1 Apr 2022 • Kohei Saijo, Robin Scheibler
With the proposed loss, we train the neural separators based on minimum variance distortionless response (MVDR) beamforming and independent vector analysis (IVA).
no code implementations • 17 Feb 2022 • Jin Sakuma, Tatsuya Komatsu, Robin Scheibler
We propose multi-layer perceptron (MLP)-based architectures suitable for variable length input.
Automatic Speech Recognition (ASR) +2
1 code implementation • 13 Oct 2021 • Robin Scheibler
The complexity of this step is thus reduced by a factor quadratic in the distortion filter size used in bss_eval, usually 512.
no code implementations • 13 Oct 2021 • Kohei Saijo, Robin Scheibler
We introduce a neural network in the framework of time-decorrelation iterative source steering, which is an extension of independent vector analysis to joint dereverberation and separation.
no code implementations • 29 Sep 2021 • Jin Sakuma, Tatsuya Komatsu, Robin Scheibler
We propose three approaches to extend MLP-based architectures for use with sequences of arbitrary length.
Automatic Speech Recognition (ASR) +2
1 code implementation • 2 Jun 2021 • Robin Scheibler, Masahito Togami
We propose a generalized formulation of direction of arrival estimation that includes many existing methods such as steered response power, subspace, coherent and incoherent, as well as speech sparsity-based methods.
no code implementations • 12 Feb 2021 • Taishi Nakashima, Robin Scheibler, Masahito Togami, Nobutaka Ono
In this case, we manage to reduce the number of matrix inversions to only one per iteration and source.
no code implementations • 11 Nov 2020 • Robin Scheibler, Masahito Togami
We find that the learnt approximate surrogate generalizes well on mixtures of three and four speakers without any modification.
2 code implementations • 11 Sep 2020 • Robin Scheibler
The method thus provides a cheap and easy way to boost the performance of blind source separation.
Audio and Speech Processing Sound Signal Processing
no code implementations • 23 Aug 2020 • Robin Scheibler
Numerical experiments demonstrate the effectiveness of the proposed method.
1 code implementation • 20 May 2019 • Robin Scheibler, Nobutaka Ono
The performance of the algorithm is assessed on simulated signals.
Sound Audio and Speech Processing
2 code implementations • 4 Apr 2019 • Robin Scheibler, Nobutaka Ono
We show that alternating updates similar to those of independent vector analysis and Itakura-Saito non-negative matrix factorization decrease the negative log-likelihood of the joint distribution.
Sound Audio and Speech Processing
2 code implementations • 11 Oct 2017 • Robin Scheibler, Eric Bezzam, Ivan Dokmanić
We present pyroomacoustics, a software package aimed at the rapid development and testing of audio array processing algorithms.
Sound Audio and Speech Processing
no code implementations • 7 Oct 2013 • Robin Scheibler, Saeid Haghighatshoar, Martin Vetterli
A new iterative low-complexity algorithm has been presented for computing the Walsh-Hadamard transform (WHT) of an $N$-dimensional signal with a $K$-sparse WHT, where $N$ is a power of two and $K = O(N^\alpha)$ scales sub-linearly in $N$ for some $0 < \alpha < 1$.
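For context on the transform named in the last entry, the following is a minimal sketch of the standard dense fast Walsh-Hadamard transform, which costs $O(N \log N)$ operations. It is not the sparse algorithm from the paper; it only illustrates the baseline that a sub-linear method would improve on when the WHT is $K$-sparse. The function name `fwht` and the unnormalized, natural (Hadamard) ordering are assumptions made for this illustration.

```python
def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform in natural (Hadamard) order.

    Illustrative dense baseline only; the input length must be a power of two.
    """
    a = list(x)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly stage: combine entries that are h apart within blocks of size 2h.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                u, v = a[i], a[i + h]
                a[i], a[i + h] = u + v, u - v
        h *= 2
    return a


# A constant signal has a 1-sparse WHT: only the first coefficient is non-zero.
spectrum = fwht([1, 1, 1, 1])
```

Applying `fwht` twice returns the input scaled by $N$, so dividing by $N$ after the second pass gives the inverse transform under this normalization.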