no code implementations • 20 Sep 2023 • Avihu Dekel, Slava Shechtman, Raul Fernandez, David Haws, Zvi Kons, Ron Hoory
Experimental results show that LLM2Speech maintains the teacher's quality while reducing latency enough to enable natural conversation.
no code implementations • 28 Jul 2022 • Zvi Kons, Hagai Aronowitz, Edmilson Morais, Matheus Damasceno, Hong-Kwang Kuo, Samuel Thomas, George Saon
We propose using a recurrent neural network transducer (RNN-T)-based speech-to-text (STT) system as a common component that can be used for emotion recognition and language identification as well as for speech recognition.
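To make the shared-component idea concrete, here is a minimal sketch (not the paper's implementation) in which one acoustic encoder feeds task-specific heads, so ASR, emotion recognition, and language identification reuse the same representation; all module names, layer sizes, and the mean-pooling step are illustrative assumptions.

```python
# Sketch: a shared encoder (standing in for the RNN-T encoder) with a
# per-frame token head plus utterance-level heads for auxiliary tasks.
import torch
import torch.nn as nn

class SharedSTTBackbone(nn.Module):
    def __init__(self, n_mels=80, hidden=256, vocab=512,
                 n_emotions=4, n_languages=8):
        super().__init__()
        self.encoder = nn.LSTM(n_mels, hidden, num_layers=2,
                               batch_first=True)          # shared encoder
        self.asr_head = nn.Linear(hidden, vocab)          # per-frame tokens
        self.emotion_head = nn.Linear(hidden, n_emotions) # utterance-level
        self.language_head = nn.Linear(hidden, n_languages)

    def forward(self, feats):                 # feats: (B, T, n_mels)
        enc, _ = self.encoder(feats)          # (B, T, hidden)
        pooled = enc.mean(dim=1)              # crude utterance embedding
        return (self.asr_head(enc),
                self.emotion_head(pooled),
                self.language_head(pooled))

model = SharedSTTBackbone()
asr, emo, lang = model(torch.randn(2, 100, 80))
print(asr.shape, emo.shape, lang.shape)
```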
no code implementations • 21 Feb 2022 • Zvi Kons, Aharon Satt, Hong-Kwang Kuo, Samuel Thomas, Boaz Carmeli, Ron Hoory, Brian Kingsbury
The NNSI reduces the need for manual labeling by automatically selecting highly ambiguous samples and labeling them with high accuracy.
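One common way to operationalize "highly ambiguous" is to rank unlabeled samples by the entropy of a model's class posterior; the sketch below illustrates that idea only, and the scoring rule and `top_k` selection are assumptions rather than the NNSI's actual criteria.

```python
# Rank samples by posterior entropy; near-uniform posteriors score highest.
import numpy as np

def entropy(probs, eps=1e-12):
    """Shannon entropy of a class-posterior vector."""
    p = np.clip(probs, eps, 1.0)
    return -np.sum(p * np.log(p))

def select_ambiguous(posteriors, top_k):
    """Indices of the top_k most ambiguous (highest-entropy) samples."""
    scores = np.array([entropy(p) for p in posteriors])
    return np.argsort(scores)[::-1][:top_k]

posteriors = np.array([[0.95, 0.03, 0.02],   # confident -> skip
                       [0.40, 0.35, 0.25],   # ambiguous
                       [0.34, 0.33, 0.33]])  # most ambiguous
print(select_ambiguous(posteriors, top_k=2))  # -> [2 1]
```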
1 code implementation • 8 Apr 2021 • Samuel Thomas, Hong-Kwang J. Kuo, George Saon, Zoltán Tüske, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory
We present a comprehensive study on building and adapting RNN transducer (RNN-T) models for spoken language understanding (SLU).
Automatic Speech Recognition (ASR) +2
no code implementations • 8 Oct 2020 • Yinghui Huang, Hong-Kwang Kuo, Samuel Thomas, Zvi Kons, Kartik Audhkhasi, Brian Kingsbury, Ron Hoory, Michael Picheny
Assuming we have additional text-to-intent data (without speech) available, we investigated two techniques to improve the S2I system: (1) transfer learning, in which acoustic embeddings for intent classification are tied to fine-tuned BERT text embeddings; and (2) data augmentation, in which the text-to-intent data is converted into speech-to-intent data using a multi-speaker text-to-speech system.
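Technique (1) can be sketched as an auxiliary embedding-matching loss that pulls the acoustic utterance embedding toward the fine-tuned BERT sentence embedding; the shapes, the plain MSE tie, and the loss weight `alpha` below are illustrative assumptions, not the paper's exact setup.

```python
# Sketch: intent classification loss plus an L2 term tying the projected
# acoustic embedding to a (precomputed) BERT text embedding.
import torch
import torch.nn as nn

acoustic_dim, bert_dim, n_intents = 256, 768, 20

proj = nn.Linear(acoustic_dim, bert_dim)   # map acoustic -> text space
intent_head = nn.Linear(bert_dim, n_intents)
mse, ce = nn.MSELoss(), nn.CrossEntropyLoss()

def s2i_loss(acoustic_emb, bert_emb, intent_labels, alpha=0.5):
    """Classification loss + embedding-matching term (alpha is a guess)."""
    tied = proj(acoustic_emb)              # (B, bert_dim)
    logits = intent_head(tied)             # (B, n_intents)
    return ce(logits, intent_labels) + alpha * mse(tied, bert_emb)

loss = s2i_loss(torch.randn(4, acoustic_dim),
                torch.randn(4, bert_dim),
                torch.randint(0, n_intents, (4,)))
loss.backward()
```

Technique (2) is a data-pipeline step rather than a loss: each text-to-intent example is synthesized into audio with a multi-speaker TTS system, yielding extra speech-to-intent training pairs.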
no code implementations • 30 Sep 2020 • Hong-Kwang J. Kuo, Zoltán Tüske, Samuel Thomas, Yinghui Huang, Kartik Audhkhasi, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, Luis Lastras
For our speech-to-entities experiments on the ATIS corpus, both the CTC and attention models showed impressive ability to skip non-entity words: there was little degradation when trained on just entities versus full transcripts.
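Training on "just entities" amounts to rewriting each target transcript so only entity words remain; the sketch below shows that preprocessing step under an assumed (word, BIO-tag) annotation format, which is not necessarily the ATIS pipeline itself.

```python
# Build an entities-only target sequence from a slot-tagged transcript,
# so the end-to-end model learns to skip non-entity filler words.
def entity_only_target(tagged_words):
    """tagged_words: list of (word, tag) pairs, with 'O' for non-entities."""
    return " ".join(w for w, tag in tagged_words if tag != "O")

utt = [("i", "O"), ("want", "O"), ("to", "O"), ("fly", "O"),
       ("from", "O"), ("boston", "B-fromloc"), ("to", "O"),
       ("denver", "B-toloc"), ("tomorrow", "B-depart_date")]
print(entity_only_target(utt))   # -> "boston denver tomorrow"
```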
no code implementations • 2 May 2019 • Zvi Kons, Slava Shechtman, Alex Sorin, Carmel Rabinovitz, Ron Hoory
We first demonstrate the ability of the system to produce high-quality speech when trained on large, high-quality datasets.
Audio and Speech Processing • Sound