Search Results for author: Erik Marchi

Found 17 papers, 0 papers with code

Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models

no code implementations · 6 Dec 2023 · Dominik Wagner, Alexander Churchill, Siddharth Sigtia, Panayiotis Georgiou, Matt Mirsamadi, Aarshee Mishra, Erik Marchi

We compare the proposed system to unimodal baselines and show that the multimodal approach achieves lower equal error rates (EERs) while using only a fraction of the training data.
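The equal error rate is the operating point at which the false acceptance rate (non-directed speech accepted) equals the false rejection rate (device-directed speech rejected). The sketch below approximates it by sweeping every observed score as a threshold; it is a generic illustration, not the evaluation code from the paper, and the function name and toy scores are hypothetical.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping each observed score as a decision threshold.

    Hypothetical helper for illustration only.
    scores: detector scores, higher = more likely device-directed.
    labels: 1 for device-directed (target) utterances, 0 otherwise.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_gap, best_eer = float("inf"), None
    for threshold in sorted(set(scores)):
        # An utterance is accepted as device-directed when score >= threshold.
        false_accepts = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        false_rejects = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
        far = false_accepts / negatives
        frr = false_rejects / positives
        # Keep the threshold where FAR and FRR are closest; report their mean.
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Toy example: one non-directed utterance outscores one directed utterance.
print(equal_error_rate([0.9, 0.8, 0.7, 0.4, 0.35, 0.2], [1, 1, 0, 1, 0, 0]))
```

Averaging FAR and FRR at the closest threshold is a common discrete approximation; with many scores, interpolating along the ROC curve gives a smoother estimate.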

Automatic Speech Recognition, Language Modelling, +3 more

Audio-to-Intent Using Acoustic-Textual Subword Representations from End-to-End ASR

no code implementations · 21 Oct 2022 · Pranay Dighe, Prateeth Nayak, Oggi Rudovic, Erik Marchi, Xiaochuan Niu, Ahmed Tewfik

Accurate prediction of the user's intent to interact with a voice assistant (VA) on a device (e.g., a phone) is critical for achieving naturalistic, engaging, and privacy-centric interactions with the VA. To this end, we present a novel approach that predicts the user's intent (whether or not the user is speaking to the device) directly from acoustic and textual information encoded in subword tokens obtained from an end-to-end ASR model.

Intent Classification

CALM: Contrastive Aligned Audio-Language Multirate and Multimodal Representations

no code implementations · 8 Feb 2022 · Vin Sachidananda, Shao-Yen Tseng, Erik Marchi, Sachin Kajarekar, Panayiotis Georgiou

By aligning audio representations to pretrained language representations and utilizing contrastive information between acoustic inputs, CALM is able to bootstrap audio embeddings competitive with existing audio representation models in only a few hours of training time.
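As a rough illustration of contrastive alignment, the sketch below computes a generic symmetric InfoNCE loss between paired audio and text embeddings, where `audio[i]` and `text[i]` form the positive pair and all other pairings in the batch act as negatives. This is a commonly used contrastive objective, assumed here for illustration; it is not necessarily CALM's exact loss, and the function names, temperature value, and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(audio, text, temperature=0.1):
    """Symmetric InfoNCE loss; audio[i] and text[i] are assumed to be a positive pair."""
    n = len(audio)
    # logits[i][j]: scaled similarity between audio clip i and text j.
    logits = [[cosine(a, t) / temperature for t in text] for a in audio]

    def cross_entropy(rows):
        # Mean of -log softmax(row)[i]; the diagonal entry is the positive.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / n

    # Columns give the text-to-audio direction of the same similarity matrix.
    columns = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))


audio = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(audio, [[1.0, 0.0], [0.0, 1.0]]))  # aligned pairs: small loss
print(info_nce(audio, [[0.0, 1.0], [1.0, 0.0]]))  # mismatched pairs: large loss
```

Minimizing this loss pulls each audio embedding toward its paired text embedding and away from the others in the batch.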

Emotion Recognition, Natural Language Understanding

Whispered and Lombard Neural Speech Synthesis

no code implementations · 13 Jan 2021 · Qiong Hu, Tobias Bleisch, Petko Petkov, Tuomo Raitio, Erik Marchi, Varun Lakshminarasimhan

Although our speaker verification (SV) model is not explicitly trained to discriminate between speaking styles, and no Lombard or whispered speech is used to pre-train it, the SV model can serve as a style encoder that generates style embeddings as input to the Tacotron system.

Speaker Verification, Speech Synthesis

Progressive Voice Trigger Detection: Accuracy vs Latency

no code implementations · 29 Oct 2020 · Siddharth Sigtia, John Bridle, Hywel Richards, Pascal Clark, Erik Marchi, Vineet Garg

We first demonstrate that by including more audio context after a detected trigger phrase, we can indeed get a more accurate decision.

Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data

no code implementations · 10 Apr 2020 · Soumi Maiti, Erik Marchi, Alistair Conkie

We demonstrate that a bilingual speaker embedding space contains a separate distribution for each language and that a simple transform in speaker space generated by the speaker embedding can be used to control the degree of accent of a synthetic voice in a language.

Translation

Multi-task Learning for Speaker Verification and Voice Trigger Detection

no code implementations · 26 Jan 2020 · Siddharth Sigtia, Erik Marchi, Sachin Kajarekar, Devang Naik, John Bridle

We train the network in a supervised multi-task learning setup, where the speech transcription branch of the network is trained to minimise a phonetic connectionist temporal classification (CTC) loss, while the speaker recognition branch is trained to assign the correct speaker label to the input sequence.

Multi-Task Learning, Speaker Recognition, +1 more

Detecting Road Surface Wetness from Audio: A Deep Learning Approach

no code implementations · 22 Nov 2015 · Irman Abdić, Lex Fridman, Erik Marchi, Daniel E. Brown, William Angell, Bryan Reimer, Björn Schuller

We introduce a recurrent neural network architecture for automated road surface wetness detection from audio of tire-surface interaction.

General Classification
