no code implementations • 25 Dec 2023 • Apoorv Vyas, Bowen Shi, Matthew Le, Andros Tjandra, Yi-Chiao Wu, Baishan Guo, Jiemin Zhang, Xinyue Zhang, Robert Adkins, William Ngan, Jeff Wang, Ivan Cruz, Bapi Akula, Akinniyi Akinyemi, Brian Ellis, Rashel Moritz, Yael Yungster, Alice Rakotoarison, Liang Tan, Chris Summers, Carleigh Wood, Joshua Lane, Mary Williamson, Wei-Ning Hsu
Research communities have made great progress over the past year advancing the performance of large-scale audio generative models for a single modality (speech, sound, or music) by adopting more powerful generative models and scaling data.
Ranked #1 on Audio Generation on AudioCaps
no code implementations • 25 Oct 2023 • Alexander H. Liu, Matt Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu
Generative models have gained more and more attention in recent years for their remarkable success in tasks that require estimating and sampling data distributions to generate high-fidelity synthetic data.
no code implementations • 22 Sep 2023 • Jiamin Xie, Ke Li, Jinxi Guo, Andros Tjandra, Yuan Shangguan, Leda Sari, Chunyang Wu, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli
In this work, we propose an adaptive masking approach for efficiently pruning a multilingual ASR model in two scenarios, yielding either sparse monolingual models or a sparse multilingual model (named Dynamic ASR Pathways).
Automatic Speech Recognition (ASR) +2
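The adaptive-masking idea behind this kind of pruning rests on periodically re-computing a magnitude-based mask during training, so weights pruned early can be revived later if their magnitude grows back. A minimal numpy sketch of one such mask update (the function name and details are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def magnitude_mask(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Binary mask keeping the largest-magnitude weights.

    Re-computing this mask as training progresses is what makes the
    masking adaptive: a weight pruned in one round can re-enter the
    network in a later round.
    """
    k = int(weights.size * sparsity)            # number of weights to prune
    if k == 0:
        return np.ones_like(weights, dtype=bool)
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.abs(weights) > threshold

# Toy example: prune 50% of a weight matrix, then apply the mask.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
mask = magnitude_mask(w, sparsity=0.5)
sparse_w = w * mask
```

In the multilingual setting, one such mask per language (or one shared mask) distinguishes the sparse-monolingual and sparse-multilingual scenarios the abstract mentions.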
3 code implementations • arXiv 2023 • Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli
Expanding the language coverage of speech technology has the potential to improve access to information for many more people.
no code implementations • 8 Jan 2023 • Heli Qi, Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
This paper introduces SpeeChain, an open-source PyTorch-based toolkit designed to develop the machine speech chain for large-scale use.
no code implementations • 23 Nov 2022 • Mumin Jin, Prashant Serai, JiLong Wu, Andros Tjandra, Vimal Manohar, Qing He
Most people who have tried to learn a foreign language have experienced difficulty understanding or speaking with a native speaker's accent.
no code implementations • 10 Nov 2022 • Andros Tjandra, Nayan Singhal, David Zhang, Ozlem Kalinli, Abdelrahman Mohamed, Duc Le, Michael L. Seltzer
Later, we use our optimal tokenization strategy to train multiple embedding and output models to further improve our results.
no code implementations • 13 Sep 2022 • Mu Yang, Andros Tjandra, Chunxi Liu, David Zhang, Duc Le, Ozlem Kalinli
Neural network pruning compresses automatic speech recognition (ASR) models effectively.
Automatic Speech Recognition (ASR) +3
1 code implementation • 29 Mar 2022 • Rendi Chevi, Radityo Eko Prasojo, Alham Fikri Aji, Andros Tjandra, Sakriani Sakti
We present Nix-TTS, a lightweight TTS achieved via knowledge distillation to a high-quality yet large-sized, non-autoregressive, and end-to-end (vocoder-free) TTS teacher model.
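Knowledge distillation of the kind Nix-TTS uses trains a small student to match a large teacher's outputs. The paper distills a non-autoregressive TTS teacher; as a generic illustration only, here is the classic temperature-softened distillation loss (this is a standard formulation, not Nix-TTS's actual objective):

```python
import numpy as np

def softmax(x, T=1.0):
    """Temperature-softened softmax; higher T flattens the distribution."""
    z = x / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# A student that matches the teacher incurs zero loss; a mismatched one does not.
t = np.array([2.0, 0.5, -1.0])
loss_same = distillation_loss(t, t)
loss_diff = distillation_loss(np.array([-1.0, 0.5, 2.0]), t)
```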
2 code implementations • 17 Nov 2021 • Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli
On the CoVoST-2 speech translation benchmark, we improve the previous state of the art by an average of 7.4 BLEU over 21 translation directions into English.
Ranked #1 on Language Identification on VoxLingua107 (using extra training data)
no code implementations • 14 Oct 2021 • Sangeeta Srivastava, Yun Wang, Andros Tjandra, Anurag Kumar, Chunxi Liu, Kritika Singh, Yatharth Saraf
While self-supervised speech representation learning has been popular in the speech research community, very few works have comprehensively analyzed audio representation learning for non-speech audio tasks.
Ranked #5 on Audio Classification on Balanced Audio Set
no code implementations • 7 Oct 2021 • Jialu Li, Vimal Manohar, Pooja Chitkara, Andros Tjandra, Michael Picheny, Frank Zhang, Xiaohui Zhang, Yatharth Saraf
Domain-adversarial training (DAT) and multi-task learning (MTL) are two common approaches for building accent-robust ASR models.
Automatic Speech Recognition (ASR) +2
no code implementations • 8 Jul 2021 • Andros Tjandra, Diptanu Gon Choudhury, Frank Zhang, Kritika Singh, Alexis Conneau, Alexei Baevski, Assaf Sela, Yatharth Saraf, Michael Auli
Language identification greatly impacts the success of downstream tasks such as automatic speech recognition.
Automatic Speech Recognition (ASR) +3
no code implementations • 4 Nov 2020 • Sashi Novitasari, Andros Tjandra, Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura
By contrast, humans can listen to what they speak in real time, and if there is a delay in hearing, they are unable to continue speaking.
Automatic Speech Recognition (ASR) +3
no code implementations • 4 Nov 2020 • Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Previous research has proposed a machine speech chain to enable automatic speech recognition (ASR) and text-to-speech synthesis (TTS) to assist each other in semi-supervised learning and to avoid the need for a large amount of paired speech and text data.
Automatic Speech Recognition (ASR) +6
no code implementations • LREC 2020 • Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
We then develop ASR and TTS for ethnic languages by utilizing Indonesian ASR and TTS in a cross-lingual machine speech chain framework with only text or only speech data, removing the need for paired speech-text data for those ethnic languages.
no code implementations • 4 Nov 2020 • Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
One main reason is that the model needs to decide the incremental steps and learn transcriptions that align with the current short speech segment.
Automatic Speech Recognition (ASR) +1
no code implementations • 24 Oct 2020 • Andros Tjandra, Ruoming Pang, Yu Zhang, Shigeki Karita
We present an approach for unsupervised learning of speech representations that disentangle content and style.
no code implementations • 24 May 2020 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In this paper, we report our submitted system for the ZeroSpeech 2020 challenge on Track 2019.
1 code implementation • 23 Oct 2019 • Andros Tjandra, Chunxi Liu, Frank Zhang, Xiaohui Zhang, Yongqiang Wang, Gabriel Synnaeve, Satoshi Nakamura, Geoffrey Zweig
As our motivation is to allow acoustic models to re-examine their input features in light of partial hypotheses, we introduce intermediate model heads and a corresponding loss function.
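The wiring of such intermediate heads can be sketched as follows: an auxiliary head decodes after an early encoder block, and its soft hypothesis is concatenated back into the features so later blocks can re-examine them. This toy forward pass only illustrates the data flow; the dimensions, layer choices, and the way the hypothesis is fed back are assumptions, and the paper's heads are trained with real ASR losses rather than shown here:

```python
import numpy as np

rng = np.random.default_rng(0)

T, D, V = 5, 8, 4                         # frames, feature dim, vocab size
x = rng.normal(size=(T, D))               # input features
w1 = rng.normal(size=(D, D))              # encoder block 1
w2 = rng.normal(size=(D + V, D))          # encoder block 2 (sees hypothesis)
w_mid = rng.normal(size=(D, V))           # intermediate head
w_out = rng.normal(size=(D, V))           # final head

h1 = np.tanh(x @ w1)                      # encoder block 1
mid_logits = h1 @ w_mid                   # intermediate head's partial hypothesis
p_mid = np.exp(mid_logits) / np.exp(mid_logits).sum(-1, keepdims=True)

# Block 2 re-examines the features together with the partial hypothesis.
h2 = np.tanh(np.concatenate([h1, p_mid], axis=-1) @ w2)
out_logits = h2 @ w_out                   # final head

# In training, both mid_logits and out_logits would receive a loss,
# so lower layers are forced to produce decodable representations.
```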
no code implementations • 22 Oct 2019 • Yongqiang Wang, Abdel-rahman Mohamed, Duc Le, Chunxi Liu, Alex Xiao, Jay Mahadeokar, Hongzhao Huang, Andros Tjandra, Xiaohui Zhang, Frank Zhang, Christian Fuegen, Geoffrey Zweig, Michael L. Seltzer
We propose and evaluate transformer-based acoustic models (AMs) for hybrid speech recognition.
Ranked #23 on Speech Recognition on LibriSpeech test-other (using extra training data)
no code implementations • 2 Oct 2019 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Second, we train a sequence-to-sequence model that directly maps the source language speech to the target language's discrete representation.
no code implementations • 3 Jun 2019 • Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Previously, a machine speech chain, which is based on sequence-to-sequence deep learning, was proposed to mimic speech perception and production behavior.
Automatic Speech Recognition (ASR) +6
no code implementations • 27 May 2019 • Andros Tjandra, Berrak Sisman, Mingyang Zhang, Sakriani Sakti, Haizhou Li, Satoshi Nakamura
Our proposed approach significantly improved the intelligibility (in CER), the MOS, and the ABX discrimination scores compared to the official ZeroSpeech 2019 baseline, and even the topline.
no code implementations • 31 Oct 2018 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In our previous work, we applied a speech chain mechanism as a semi-supervised learning method.
Automatic Speech Recognition (ASR) +3
no code implementations • 22 Jul 2018 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In this paper, we propose two ideas to improve sequence-to-sequence model performance by enhancing the attention module.
no code implementations • 28 Mar 2018 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In the speech chain loop mechanism, ASR also benefits from the ability to further learn an arbitrary speaker's characteristics from the generated speech waveform, resulting in a significant improvement in the recognition rate.
Automatic Speech Recognition (ASR) +4
1 code implementation • 28 Feb 2018 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In machine learning, the Recurrent Neural Network (RNN) has become a popular architecture for sequential data modeling.
no code implementations • 30 Oct 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Despite the success of sequence-to-sequence approaches in automatic speech recognition (ASR) systems, the models still suffer from several problems, mainly due to the mismatch between the training and inference conditions.
Automatic Speech Recognition (ASR) +3
no code implementations • 22 Sep 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In this paper, we construct the first end-to-end attention-based encoder-decoder model that maps raw speech waveforms directly to text transcriptions.
Automatic Speech Recognition (ASR) +2
no code implementations • 16 Jul 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In this paper, we take a step further and develop a closed-loop speech chain model based on deep learning.
Automatic Speech Recognition (ASR) +3
no code implementations • 7 Jun 2017 • Andros Tjandra, Sakriani Sakti, Ruli Manurung, Mirna Adriani, Satoshi Nakamura
Our proposed RNNs, called the Long Short-Term Memory Recurrent Neural Tensor Network (LSTMRNTN) and the Gated Recurrent Unit Recurrent Neural Tensor Network (GRURNTN), are built by combining the LSTM and GRU RNN models with the tensor product.
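The tensor-product idea adds a second-order (bilinear) interaction between the input and the hidden state on top of the usual linear terms. A minimal numpy sketch of a GRU-style candidate activation with such a term; which gates receive the tensor term, and all shapes and names here, are illustrative assumptions rather than the paper's exact formulation:

```python
import numpy as np

def tensor_product_term(x, h, W_tensor):
    """Bilinear interaction: output[k] = x @ W_tensor[k] @ h."""
    return np.einsum('i,kij,j->k', x, W_tensor, h)

def grurntn_candidate(x, h, W, U, W_tensor, b):
    """GRU-style candidate state augmented with a tensor term.

    Standard GRU candidate: tanh(Wx + Uh + b).  The RNTN variant adds a
    second-order interaction between input and hidden state.
    """
    return np.tanh(W @ x + U @ h + tensor_product_term(x, h, W_tensor) + b)

rng = np.random.default_rng(0)
in_dim, hid = 3, 4
x = rng.normal(size=in_dim)
h = rng.normal(size=hid)
c = grurntn_candidate(
    x, h,
    W=rng.normal(size=(hid, in_dim)),
    U=rng.normal(size=(hid, hid)),
    W_tensor=rng.normal(size=(hid, in_dim, hid)) * 0.1,  # small init for stability
    b=np.zeros(hid),
)
```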
no code implementations • 23 May 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Recurrent Neural Networks (RNNs) are a popular choice for modeling temporal and sequential tasks and achieve state-of-the-art performance on various complex problems.
no code implementations • IJCNLP 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In this paper, we propose a novel attention mechanism that has local and monotonic properties.
Automatic Speech Recognition (ASR) +4
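An attention mechanism with both properties can be sketched as a soft window over encoder frames whose center only moves forward: locality comes from the window shape, monotonicity from the non-negative shift. The Gaussian window and parameter names below are one common way to realize this, chosen for illustration and not necessarily the paper's exact parameterization:

```python
import numpy as np

def local_monotonic_weights(prev_center, delta, T, width=2.0):
    """Attention weights that are local (Gaussian window over frames)
    and monotonic (the window center can only move forward).

    prev_center: window center at the previous decoding step
    delta:       predicted forward shift (clamped to be non-negative)
    T:           number of encoder frames
    """
    center = prev_center + max(delta, 0.0)                # never moves back
    pos = np.arange(T)
    w = np.exp(-((pos - center) ** 2) / (2 * width ** 2))  # local window
    return w / w.sum(), center

# One decoding step: the window advances from frame 1 to frame 3.
weights, center = local_monotonic_weights(prev_center=1.0, delta=2.0, T=10)
```

Because the shift is clamped at zero, the attended region can never revisit earlier frames, which matches the streaming-friendly behavior such mechanisms target.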