no code implementations • 28 Aug 2023 • Shane Settle
As an alternative, acoustic word embeddings (fixed-dimensional vector representations of variable-length spoken word segments) have begun to be considered for such tasks as well.
1 code implementation • 30 Jun 2023 • Ankita Pasad, Chung-Ming Chien, Shane Settle, Karen Livescu
Many self-supervised speech models (S3Ms) have been introduced over the last few years, improving performance and data efficiency on various speech tasks.
1 code implementation • 24 Nov 2020 • Yushi Hu, Shane Settle, Karen Livescu
In this work, we generalize AWE training to spans of words, producing acoustic span embeddings (ASE), and explore the application of ASE to QbE with arbitrary-length queries in multiple unseen languages.
1 code implementation • 1 Jul 2020 • Bowen Shi, Shane Settle, Karen Livescu
We find that word error rate can be reduced by a large margin by pre-training the acoustic segment representation with AWEs, and additional (smaller) gains can be obtained by pre-training the word prediction layer with AGWEs.
1 code implementation • 24 Jun 2020 • Yushi Hu, Shane Settle, Karen Livescu
The pre-trained models can then be used for unseen zero-resource languages, or fine-tuned on data from low-resource languages.
no code implementations • 29 Mar 2019 • Shane Settle, Kartik Audhkhasi, Karen Livescu, Michael Picheny
Direct acoustics-to-word (A2W) systems for end-to-end automatic speech recognition are simpler to train, and more efficient to decode with, than sub-word systems.
1 code implementation • 12 Jun 2017 • Shane Settle, Keith Levin, Herman Kamper, Karen Livescu
Query-by-example search often uses dynamic time warping (DTW) for comparing queries and proposed matching segments.
1 code implementation • 23 Mar 2017 • Herman Kamper, Shane Settle, Gregory Shakhnarovich, Karen Livescu
In this setting of images paired with untranscribed spoken captions, we consider whether computer vision systems can be used to obtain textual labels for the speech.
no code implementations • 8 Nov 2016 • Shane Settle, Karen Livescu
Acoustic word embeddings (fixed-dimensional vector representations of variable-length spoken word segments) have begun to be considered for tasks such as speech recognition and query-by-example search.