no code implementations • 25 Dec 2023 • Apoorv Vyas, Bowen Shi, Matthew Le, Andros Tjandra, Yi-Chiao Wu, Baishan Guo, Jiemin Zhang, Xinyue Zhang, Robert Adkins, William Ngan, Jeff Wang, Ivan Cruz, Bapi Akula, Akinniyi Akinyemi, Brian Ellis, Rashel Moritz, Yael Yungster, Alice Rakotoarison, Liang Tan, Chris Summers, Carleigh Wood, Joshua Lane, Mary Williamson, Wei-Ning Hsu
Research communities have made great progress over the past year advancing the performance of large-scale audio generative models for a single modality (speech, sound, or music) by adopting more powerful generative models and scaling data.
Ranked #1 on Audio Generation on AudioCaps
no code implementations • 25 Oct 2023 • Alexander H. Liu, Matt Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu
Generative models have gained more and more attention in recent years for their remarkable success in tasks that require estimating and sampling data distributions to generate high-fidelity synthetic data.
no code implementations • 22 Sep 2023 • Jiamin Xie, Ke Li, Jinxi Guo, Andros Tjandra, Yuan Shangguan, Leda Sari, Chunyang Wu, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli
In this work, we propose an adaptive masking approach for efficiently pruning a multilingual ASR model in two scenarios, yielding either sparse monolingual models or a sparse multilingual model (named Dynamic ASR Pathways).
Automatic Speech Recognition (ASR) +2
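The adaptive-masking idea behind this kind of pruning rests on periodically re-computing a magnitude-based mask during training, so weights pruned early can be revived later if their magnitude grows back. A minimal numpy sketch of one such mask update (the function name and details are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def magnitude_mask(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Binary mask keeping the largest-magnitude weights.

    Re-computing this mask as training progresses is what makes the
    masking adaptive: a weight pruned in one round can re-enter the
    network in a later round.
    """
    k = int(weights.size * sparsity)            # number of weights to prune
    if k == 0:
        return np.ones_like(weights, dtype=bool)
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.abs(weights) > threshold

# Toy example: prune 50% of a weight matrix, then apply the mask.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
mask = magnitude_mask(w, sparsity=0.5)
sparse_w = w * mask
```

In the multilingual setting, one such mask per language (or one shared mask) distinguishes the sparse-monolingual and sparse-multilingual scenarios the abstract mentions.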
3 code implementations • arXiv 2023 • Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli
Expanding the language coverage of speech technology has the potential to improve access to information for many more people.
no code implementations • 8 Jan 2023 • Heli Qi, Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
This paper introduces SpeeChain, an open-source PyTorch-based toolkit designed to develop the machine speech chain for large-scale use.
no code implementations • 23 Nov 2022 • Mumin Jin, Prashant Serai, JiLong Wu, Andros Tjandra, Vimal Manohar, Qing He
Most people who have tried to learn a foreign language have experienced difficulty understanding or speaking with a native speaker's accent.
no code implementations • 10 Nov 2022 • Andros Tjandra, Nayan Singhal, David Zhang, Ozlem Kalinli, Abdelrahman Mohamed, Duc Le, Michael L. Seltzer
Later, we use our optimal tokenization strategy to train multiple embedding and output models to further improve our results.
no code implementations • 13 Sep 2022 • Mu Yang, Andros Tjandra, Chunxi Liu, David Zhang, Duc Le, Ozlem Kalinli
Neural network pruning compresses automatic speech recognition (ASR) models effectively.
Automatic Speech Recognition (ASR) +3
1 code implementation • 29 Mar 2022 • Rendi Chevi, Radityo Eko Prasojo, Alham Fikri Aji, Andros Tjandra, Sakriani Sakti
We present Nix-TTS, a lightweight TTS achieved via knowledge distillation to a high-quality yet large-sized, non-autoregressive, and end-to-end (vocoder-free) TTS teacher model.
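Knowledge distillation of the kind Nix-TTS uses trains a small student to match a large teacher's outputs. The paper distills a non-autoregressive TTS teacher; as a generic illustration only, here is the classic temperature-softened distillation loss (this is a standard formulation, not Nix-TTS's actual objective):

```python
import numpy as np

def softmax(x, T=1.0):
    """Temperature-softened softmax; higher T flattens the distribution."""
    z = x / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# A student that matches the teacher incurs zero loss; a mismatched one does not.
t = np.array([2.0, 0.5, -1.0])
loss_same = distillation_loss(t, t)
loss_diff = distillation_loss(np.array([-1.0, 0.5, 2.0]), t)
```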
2 code implementations • 17 Nov 2021 • Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli
On the CoVoST-2 speech translation benchmark, we improve the previous state of the art by an average of 7.4 BLEU over 21 translation directions into English.
Ranked #1 on Language Identification on VoxLingua107 (using extra training data)
no code implementations • 14 Oct 2021 • Sangeeta Srivastava, Yun Wang, Andros Tjandra, Anurag Kumar, Chunxi Liu, Kritika Singh, Yatharth Saraf
While self-supervised speech representation learning has been popular in the speech research community, very few works have comprehensively analyzed audio representation learning for non-speech audio tasks.
Ranked #5 on Audio Classification on Balanced Audio Set
no code implementations • 7 Oct 2021 • Jialu Li, Vimal Manohar, Pooja Chitkara, Andros Tjandra, Michael Picheny, Frank Zhang, Xiaohui Zhang, Yatharth Saraf
Domain-adversarial training (DAT) and multi-task learning (MTL) are two common approaches for building accent-robust ASR models.
Automatic Speech Recognition (ASR) +2
no code implementations • 8 Jul 2021 • Andros Tjandra, Diptanu Gon Choudhury, Frank Zhang, Kritika Singh, Alexis Conneau, Alexei Baevski, Assaf Sela, Yatharth Saraf, Michael Auli
Language identification greatly impacts the success of downstream tasks such as automatic speech recognition.
Automatic Speech Recognition (ASR) +3
no code implementations • 4 Nov 2020 • Sashi Novitasari, Andros Tjandra, Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura
By contrast, humans can listen to what they speak in real time, and if there is a delay in hearing, they are unable to continue speaking.
Automatic Speech Recognition (ASR) +3
no code implementations • 4 Nov 2020 • Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Previous research has proposed a machine speech chain to enable automatic speech recognition (ASR) and text-to-speech synthesis (TTS) to assist each other in semi-supervised learning and to avoid the need for a large amount of paired speech and text data.
Automatic Speech Recognition (ASR) +6
no code implementations • LREC 2020 • Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
We then develop ASR and TTS for ethnic languages by utilizing Indonesian ASR and TTS in a cross-lingual machine speech chain framework with only text or only speech data, removing the need for paired speech-text data for those ethnic languages.
no code implementations • 4 Nov 2020 • Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
One main reason is that the model needs to decide the incremental steps and learn transcriptions that align with the current short speech segment.
Automatic Speech Recognition (ASR) +1
no code implementations • 24 Oct 2020 • Andros Tjandra, Ruoming Pang, Yu Zhang, Shigeki Karita
We present an approach for unsupervised learning of speech representations that disentangle content and style.
no code implementations • 24 May 2020 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In this paper, we report our submitted system for the ZeroSpeech 2020 challenge on Track 2019.
1 code implementation • 23 Oct 2019 • Andros Tjandra, Chunxi Liu, Frank Zhang, Xiaohui Zhang, Yongqiang Wang, Gabriel Synnaeve, Satoshi Nakamura, Geoffrey Zweig
As our motivation is to allow acoustic models to re-examine their input features in light of partial hypotheses, we introduce intermediate model heads and a corresponding loss function.
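The wiring of such intermediate heads can be sketched as follows: an auxiliary head decodes after an early encoder block, and its soft hypothesis is concatenated back into the features so later blocks can re-examine them. This toy forward pass only illustrates the data flow; the dimensions, layer choices, and the way the hypothesis is fed back are assumptions, and the paper's heads are trained with real ASR losses rather than shown here:

```python
import numpy as np

rng = np.random.default_rng(0)

T, D, V = 5, 8, 4                         # frames, feature dim, vocab size
x = rng.normal(size=(T, D))               # input features
w1 = rng.normal(size=(D, D))              # encoder block 1
w2 = rng.normal(size=(D + V, D))          # encoder block 2 (sees hypothesis)
w_mid = rng.normal(size=(D, V))           # intermediate head
w_out = rng.normal(size=(D, V))           # final head

h1 = np.tanh(x @ w1)                      # encoder block 1
mid_logits = h1 @ w_mid                   # intermediate head's partial hypothesis
p_mid = np.exp(mid_logits) / np.exp(mid_logits).sum(-1, keepdims=True)

# Block 2 re-examines the features together with the partial hypothesis.
h2 = np.tanh(np.concatenate([h1, p_mid], axis=-1) @ w2)
out_logits = h2 @ w_out                   # final head

# In training, both mid_logits and out_logits would receive a loss,
# so lower layers are forced to produce decodable representations.
```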
no code implementations • 22 Oct 2019 • Yongqiang Wang, Abdel-rahman Mohamed, Duc Le, Chunxi Liu, Alex Xiao, Jay Mahadeokar, Hongzhao Huang, Andros Tjandra, Xiaohui Zhang, Frank Zhang, Christian Fuegen, Geoffrey Zweig, Michael L. Seltzer
We propose and evaluate transformer-based acoustic models (AMs) for hybrid speech recognition.
Ranked #23 on Speech Recognition on LibriSpeech test-other (using extra training data)
no code implementations • 2 Oct 2019 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Second, we train a sequence-to-sequence model that directly maps the source language speech to the target language's discrete representation.
no code implementations • 3 Jun 2019 • Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Previously, a machine speech chain, which is based on sequence-to-sequence deep learning, was proposed to mimic speech perception and production behavior.
Automatic Speech Recognition (ASR) +6
no code implementations • 27 May 2019 • Andros Tjandra, Berrak Sisman, Mingyang Zhang, Sakriani Sakti, Haizhou Li, Satoshi Nakamura
Our proposed approach significantly improved the intelligibility (in CER), the MOS, and the ABX discrimination scores compared to the official ZeroSpeech 2019 baseline, and even the topline.
no code implementations • 31 Oct 2018 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In our previous work, we applied a speech chain mechanism as a semi-supervised learning method.
Automatic Speech Recognition (ASR) +3
no code implementations • 22 Jul 2018 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In this paper, we propose two ideas to improve sequence-to-sequence model performance by enhancing the attention module.
no code implementations • 28 Mar 2018 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In the speech chain loop mechanism, ASR also benefits from the ability to further learn an arbitrary speaker's characteristics from the generated speech waveform, resulting in a significant improvement in the recognition rate.
Automatic Speech Recognition (ASR) +4
1 code implementation • 28 Feb 2018 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In machine learning, the Recurrent Neural Network (RNN) has become a popular architecture for sequential data modeling.
no code implementations • 30 Oct 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Despite the success of sequence-to-sequence approaches in automatic speech recognition (ASR) systems, the models still suffer from several problems, mainly due to the mismatch between the training and inference conditions.
Automatic Speech Recognition (ASR) +3
no code implementations • 22 Sep 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In this paper, we construct the first end-to-end attention-based encoder-decoder model that maps raw speech waveforms directly to text transcriptions.
Automatic Speech Recognition (ASR) +2
no code implementations • 16 Jul 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In this paper, we take a step further and develop a closed-loop speech chain model based on deep learning.
Automatic Speech Recognition (ASR) +3
no code implementations • 7 Jun 2017 • Andros Tjandra, Sakriani Sakti, Ruli Manurung, Mirna Adriani, Satoshi Nakamura
Our proposed RNNs, called the Long Short-Term Memory Recurrent Neural Tensor Network (LSTMRNTN) and the Gated Recurrent Unit Recurrent Neural Tensor Network (GRURNTN), are built by combining the LSTM and GRU RNN models with the tensor product.
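The tensor-product idea adds a second-order (bilinear) interaction between the input and the hidden state on top of the usual linear terms. A minimal numpy sketch of a GRU-style candidate activation with such a term; which gates receive the tensor term, and all shapes and names here, are illustrative assumptions rather than the paper's exact formulation:

```python
import numpy as np

def tensor_product_term(x, h, W_tensor):
    """Bilinear interaction: output[k] = x @ W_tensor[k] @ h."""
    return np.einsum('i,kij,j->k', x, W_tensor, h)

def grurntn_candidate(x, h, W, U, W_tensor, b):
    """GRU-style candidate state augmented with a tensor term.

    Standard GRU candidate: tanh(Wx + Uh + b).  The RNTN variant adds a
    second-order interaction between input and hidden state.
    """
    return np.tanh(W @ x + U @ h + tensor_product_term(x, h, W_tensor) + b)

rng = np.random.default_rng(0)
in_dim, hid = 3, 4
x = rng.normal(size=in_dim)
h = rng.normal(size=hid)
c = grurntn_candidate(
    x, h,
    W=rng.normal(size=(hid, in_dim)),
    U=rng.normal(size=(hid, hid)),
    W_tensor=rng.normal(size=(hid, in_dim, hid)) * 0.1,  # small init for stability
    b=np.zeros(hid),
)
```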
no code implementations • 23 May 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Recurrent Neural Networks (RNNs) are a popular choice for modeling temporal and sequential tasks and achieve state-of-the-art performance on various complex problems.
no code implementations • IJCNLP 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In this paper, we propose a novel attention mechanism that has local and monotonic properties.
Automatic Speech Recognition (ASR) +4
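An attention mechanism with both properties can be sketched as a soft window over encoder frames whose center only moves forward: locality comes from the window shape, monotonicity from the non-negative shift. The Gaussian window and parameter names below are one common way to realize this, chosen for illustration and not necessarily the paper's exact parameterization:

```python
import numpy as np

def local_monotonic_weights(prev_center, delta, T, width=2.0):
    """Attention weights that are local (Gaussian window over frames)
    and monotonic (the window center can only move forward).

    prev_center: window center at the previous decoding step
    delta:       predicted forward shift (clamped to be non-negative)
    T:           number of encoder frames
    """
    center = prev_center + max(delta, 0.0)                # never moves back
    pos = np.arange(T)
    w = np.exp(-((pos - center) ** 2) / (2 * width ** 2))  # local window
    return w / w.sum(), center

# One decoding step: the window advances from frame 1 to frame 3.
weights, center = local_monotonic_weights(prev_center=1.0, delta=2.0, T=10)
```

Because the shift is clamped at zero, the attended region can never revisit earlier frames, which matches the streaming-friendly behavior such mechanisms target.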