no code implementations • 28 Jul 2022 • Zvi Kons, Hagai Aronowitz, Edmilson Morais, Matheus Damasceno, Hong-Kwang Kuo, Samuel Thomas, George Saon
We propose using a recurrent neural network transducer (RNN-T)-based speech-to-text (STT) system as a common component that can be used for emotion recognition and language identification as well as for speech recognition.
no code implementations • 1 Mar 2022 • Hagai Aronowitz, Itai Gat, Edmilson Morais, Weizhong Zhu, Ron Hoory
Beyond that, a common engine should be capable of supporting distributed training on clients' private in-house data.
no code implementations • ICASSP 2022 • Edmilson Morais, Ron Hoory, Weizhong Zhu, Itai Gat, Matheus Damasceno, Hagai Aronowitz
Self-supervised pre-trained features have consistently delivered state-of-the-art results in the field of natural language processing (NLP); however, their merits in the field of speech emotion recognition (SER) still need further investigation.
no code implementations • 2 Feb 2022 • Itai Gat, Hagai Aronowitz, Weizhong Zhu, Edmilson Morais, Ron Hoory
Large speech emotion recognition datasets are hard to obtain, and small datasets may contain biases.
Ranked #1 on Speech Emotion Recognition on IEMOCAP (AUC metric)
1 code implementation • 7 Apr 2021 • Sujeong Cha, Wangrui Hou, Hyun Jung, My Phung, Michael Picheny, Hong-Kwang Kuo, Samuel Thomas, Edmilson Morais
To address the first challenge, we propose a novel system that can predict intents from flexible types of inputs: speech, ASR transcripts, or both.
no code implementations • 16 Nov 2020 • Edmilson Morais, Hong-Kwang J. Kuo, Samuel Thomas, Zoltan Tuske, Brian Kingsbury
Transformer networks and self-supervised pre-training have consistently delivered state-of-the-art results in the field of natural language processing (NLP); however, their merits in the field of spoken language understanding (SLU) still need further investigation.