Search Results for author: Karen Livescu

Found 80 papers, 32 papers with code

Self-supervised Representation Learning for Speech Processing

1 code implementation • NAACL (ACL) 2022 • Hung-Yi Lee, Abdelrahman Mohamed, Shinji Watanabe, Tara Sainath, Karen Livescu, Shang-Wen Li, Shu-wen Yang, Katrin Kirchhoff

Due to the growing popularity of SSL, and the shared mission of bringing speech and language technologies to more use cases with better quality and of scaling these technologies for under-represented languages, we propose this tutorial to systematically survey the latest SSL techniques, tools, datasets, and performance achievements in speech processing.

Representation Learning

Structured Tree Alignment for Evaluation of (Speech) Constituency Parsing

1 code implementation • 21 Feb 2024 • Freda Shi, Kevin Gimpel, Karen Livescu

We present the structured average intersection-over-union ratio (STRUCT-IOU), a similarity metric between constituency parse trees motivated by the problem of evaluating speech parsers.

Constituency Parsing
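The span-level intuition behind the metric can be sketched with plain interval IoU. The greedy best-match averaging below is a simplification (function names are illustrative, not from the paper); STRUCT-IOU instead solves a structured alignment between the two trees:

```python
def interval_iou(a, b):
    """Intersection-over-union ratio of two (start, end) intervals on the time axis."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def avg_span_iou(gold_spans, pred_spans):
    # Greedy best-match average over gold constituents -- a rough sketch only;
    # the actual metric enforces structural consistency across the alignment.
    return sum(max(interval_iou(g, p) for p in pred_spans)
               for g in gold_spans) / len(gold_spans)
```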

Generative Context-aware Fine-tuning of Self-supervised Speech Models

no code implementations • 15 Dec 2023 • Suwon Shon, Kwangyoun Kim, Prashant Sridhar, Yi-Te Hsu, Shinji Watanabe, Karen Livescu

Considering the recent advances in generative large language models (LLMs), we hypothesize that an LLM could generate useful context information using the preceding text.

Automatic Speech Recognition Named Entity Recognition +6

Few-Shot Spoken Language Understanding via Joint Speech-Text Models

no code implementations • 9 Oct 2023 • Chung-Ming Chien, Mingjiamei Zhang, Ju-chieh Chou, Karen Livescu

Recent work on speech representation models jointly pre-trained with text has demonstrated the potential of improving speech representations by encoding speech and text in a shared space.

Named Entity Recognition +2

AV2Wav: Diffusion-Based Re-synthesis from Continuous Self-supervised Features for Audio-Visual Speech Enhancement

no code implementations • 14 Sep 2023 • Ju-chieh Chou, Chung-Ming Chien, Karen Livescu

In this work, we introduce AV2Wav, a resynthesis-based audio-visual speech enhancement approach that can generate clean speech despite the challenges of real-world training data.

Resynthesis Speech Enhancement

What Do Self-Supervised Speech Models Know About Words?

1 code implementation • 30 Jun 2023 • Ankita Pasad, Chung-Ming Chien, Shane Settle, Karen Livescu

Many self-supervised speech models (S3Ms) have been introduced over the last few years, improving performance and data efficiency on various speech tasks.

Sentence Sentence Similarity +1

SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks

no code implementations • 20 Dec 2022 • Suwon Shon, Siddhant Arora, Chyi-Jiunn Lin, Ankita Pasad, Felix Wu, Roshan Sharma, Wei-Lun Wu, Hung-Yi Lee, Karen Livescu, Shinji Watanabe

In this work, we introduce several new annotated SLU benchmark tasks based on freely available speech data, which complement existing benchmarks and address gaps in the SLU evaluation landscape.

Dialog Act Classification Question Answering +4

Context-aware Fine-tuning of Self-supervised Speech Models

no code implementations • 16 Dec 2022 • Suwon Shon, Felix Wu, Kwangyoun Kim, Prashant Sridhar, Karen Livescu, Shinji Watanabe

During the fine-tuning stage, we introduce an auxiliary loss that encourages this context embedding vector to be similar to context vectors of surrounding segments.

Automatic Speech Recognition (ASR) +5
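A minimal sketch of such an auxiliary loss, assuming a cosine-similarity form; the function names and the exact shape of the loss are illustrative assumptions, not the paper's definition:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def context_aux_loss(ctx, neighbor_ctxs):
    """One minus the mean cosine similarity between a segment's context
    embedding and the context embeddings of its surrounding segments."""
    sims = [cosine(ctx, n) for n in neighbor_ctxs]
    return 1.0 - sum(sims) / len(sims)
```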

Comparative layer-wise analysis of self-supervised speech models

1 code implementation • 8 Nov 2022 • Ankita Pasad, Bowen Shi, Karen Livescu

We further investigate the utility of our analyses for downstream tasks by comparing the property trends with performance on speech recognition and spoken language understanding tasks.

Speech Recognition +1

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

3 code implementations9 Jun 2022 Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Bryan Orinion, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. 
Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Dylan Schrader, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Janelle Wingfield, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Batchelder, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Guerr, Joseph Jones, Joshua B. Tenenbaum, Joshua S. 
Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Şenel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michał Swędrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mitch Walker, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan A. Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nicole Martinez, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Miłkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramon Risco, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. 
Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima, Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, ZiRui Wang, Ziyi Wu

BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.

Common Sense Reasoning Math +1

Open-Domain Sign Language Translation Learned from Online Video

1 code implementation • 25 May 2022 • Bowen Shi, Diane Brentari, Greg Shakhnarovich, Karen Livescu

Existing work on sign language translation (that is, translation from sign language videos into sentences in a written language) has focused mainly on (1) data collected in a controlled environment or (2) data in a specific domain, which limits the applicability to real-world settings.

Sign Language Translation Translation

Self-Supervised Speech Representation Learning: A Review

no code implementations • 21 May 2022 • Abdelrahman Mohamed, Hung-Yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe

Although self-supervised speech representation is still a nascent research area, it is closely related to acoustic word embedding and learning with zero lexical resources, both of which have seen active research for many years.

Automatic Speech Recognition (ASR) +3

Searching for fingerspelled content in American Sign Language

no code implementations • ACL 2022 • Bowen Shi, Diane Brentari, Greg Shakhnarovich, Karen Livescu

This is an important task since significant content in sign language is often conveyed via fingerspelling, and to our knowledge the task has not been studied before.

Retrieval Translation

On the Use of External Data for Spoken Named Entity Recognition

1 code implementation • NAACL 2022 • Ankita Pasad, Felix Wu, Suwon Shon, Karen Livescu, Kyu J. Han

In this work we focus on low-resource spoken named entity recognition (NER) and address the question: Beyond self-supervised pre-training, how can we use external speech and/or text data that are not annotated for the task?

Knowledge Distillation Named Entity Recognition +6

Substructure Distribution Projection for Zero-Shot Cross-Lingual Dependency Parsing

no code implementations • ACL 2022 • Haoyue Shi, Kevin Gimpel, Karen Livescu

We present substructure distribution projection (SubDP), a technique that projects a distribution over structures in one domain to another, by projecting substructure distributions separately.

Dependency Parsing

On Generalization in Coreference Resolution

2 code implementations • CRAC (ACL) 2021 • Shubham Toshniwal, Patrick Xia, Sam Wiseman, Karen Livescu, Kevin Gimpel

While coreference resolution is defined independently of dataset domain, most models for performing coreference resolution do not transfer well to unseen domains.

Coreference Resolution Data Augmentation

Layer-wise Analysis of a Self-supervised Speech Representation Model

1 code implementation • 10 Jul 2021 • Ankita Pasad, Ju-chieh Chou, Karen Livescu

Recently proposed self-supervised learning approaches have been successful for pre-training speech representation models.

Automatic Speech Recognition (ASR) +2

Fingerspelling Detection in American Sign Language

1 code implementation • CVPR 2021 • Bowen Shi, Diane Brentari, Greg Shakhnarovich, Karen Livescu

We propose a benchmark and a suite of evaluation metrics, some of which reflect the effect of detection on the downstream fingerspelling recognition task.

Pose Estimation

Substructure Substitution: Structured Data Augmentation for NLP

no code implementations • Findings (ACL) 2021 • Haoyue Shi, Karen Livescu, Kevin Gimpel

We study a family of data augmentation methods, substructure substitution (SUB2), for natural language processing (NLP) tasks.

Data Augmentation Part-Of-Speech Tagging +2
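A toy illustration of substructure substitution, under the simplifying assumption that the task is POS tagging and the substituted substructure is a single tagged word (the function name is hypothetical; the paper covers richer substructures such as constituents):

```python
import random

def sub2_swap(sent_a, sent_b, rng=None):
    """SUB2-style augmentation, simplified: replace one word of sent_a with a
    word from sent_b that carries the same POS tag.
    Sentences are lists of (word, tag) pairs."""
    rng = rng or random.Random(0)
    # All (position in a, position in b) pairs whose tags match.
    matches = [(i, j) for i, (_, ta) in enumerate(sent_a)
                      for j, (_, tb) in enumerate(sent_b) if ta == tb]
    if not matches:
        return list(sent_a)
    i, j = rng.choice(matches)
    out = list(sent_a)
    out[i] = sent_b[j]
    return out
```

Because the replacement shares the original word's tag, the augmented sentence keeps a valid tag sequence by construction.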

Learning Chess Blindfolded

no code implementations • 1 Jan 2021 • Shubham Toshniwal, Sam Wiseman, Karen Livescu, Kevin Gimpel

Motivated by this issue, we consider the task of language modeling for the game of chess.

Domain Probing Game of Chess +2

A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embeddings

no code implementations • 3 Dec 2020 • Puyuan Peng, Herman Kamper, Karen Livescu

We propose a new unsupervised model for mapping a variable-duration speech segment to a fixed-dimensional representation.

Word Embeddings

Acoustic span embeddings for multilingual query-by-example search

1 code implementation • 24 Nov 2020 • Yushi Hu, Shane Settle, Karen Livescu

In this work, we generalize AWE training to spans of words, producing acoustic span embeddings (ASE), and explore the application of ASE to QbE with arbitrary-length queries in multiple unseen languages.

Dynamic Time Warping Word Embeddings

On the Role of Supervision in Unsupervised Constituency Parsing

no code implementations • EMNLP 2020 • Haoyue Shi, Karen Livescu, Kevin Gimpel

We analyze several recent unsupervised constituency parsing models, which are tuned with respect to the parsing $F_1$ score on the Wall Street Journal (WSJ) development set (1,700 sentences).

Constituency Parsing Data Augmentation +1
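For reference, the unlabeled bracketing F1 used to score parsers can be computed from the two sets of constituent spans (a standard definition, not code from the paper):

```python
def bracket_f1(gold, pred):
    """Unlabeled bracketing F1 between two sets of constituent spans."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)          # brackets present in both trees
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```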

Whole-Word Segmental Speech Recognition with Acoustic Word Embeddings

1 code implementation • 1 Jul 2020 • Bowen Shi, Shane Settle, Karen Livescu

We find that word error rate can be reduced by a large margin by pre-training the acoustic segment representation with AWEs, and additional (smaller) gains can be obtained by pre-training the word prediction layer with AGWEs.

Speech Recognition +1

Multilingual Jointly Trained Acoustic and Written Word Embeddings

1 code implementation • 24 Jun 2020 • Yushi Hu, Shane Settle, Karen Livescu

The pre-trained models can then be used for unseen zero-resource languages, or fine-tuned on data from low-resource languages.

Dynamic Time Warping Retrieval +1

Discrete Latent Variable Representations for Low-Resource Text Classification

1 code implementation • ACL 2020 • Shuning Jin, Sam Wiseman, Karl Stratos, Karen Livescu

While much work on deep latent variable models of text uses continuous latent variables, discrete latent variables are interesting because they are more interpretable and typically more space efficient.

General Classification Sentence +2

A Cross-Task Analysis of Text Span Representations

1 code implementation • WS 2020 • Shubham Toshniwal, Haoyue Shi, Bowen Shi, Lingyu Gao, Karen Livescu, Kevin Gimpel

Many natural language processing (NLP) tasks involve reasoning with textual spans, including question answering, entity recognition, and coreference resolution.

Coreference Resolution Question Answering

Fingerspelling recognition in the wild with iterative visual attention

2 code implementations • ICCV 2019 • Bowen Shi, Aurora Martinez Del Rio, Jonathan Keane, Diane Brentari, Greg Shakhnarovich, Karen Livescu

In this paper we focus on recognition of fingerspelling sequences in American Sign Language (ASL) videos collected in the wild, mainly from YouTube and Deaf social media.

Hand Detection Segmentation +1

Visually Grounded Neural Syntax Acquisition

no code implementations • ACL 2019 • Haoyue Shi, Jiayuan Mao, Kevin Gimpel, Karen Livescu

We define concreteness of constituents by their matching scores with images, and use it to guide the parsing of text.

Visual Grounding

Variational recurrent models for representation learning

no code implementations • ICLR 2019 • Qingming Tang, Mingda Chen, Weiran Wang, Karen Livescu

Existing variational recurrent models typically use stochastic recurrent connections to model the dependence among neighboring latent variables, while generation assumes independence of generated data per time step given the latent sequence.

Multi-View Learning Representation Learning

On the Contributions of Visual and Textual Supervision in Low-Resource Semantic Speech Retrieval

no code implementations • 24 Apr 2019 • Ankita Pasad, Bowen Shi, Herman Kamper, Karen Livescu

Recent work has shown that speech paired with images can be used to learn semantically meaningful speech representations even without any textual supervision.

Retrieval Visual Grounding

Semantic query-by-example speech search using visual grounding

1 code implementation • 15 Apr 2019 • Herman Kamper, Aristotelis Anastassiou, Karen Livescu

A number of recent studies have started to investigate how speech systems can be trained on untranscribed speech by leveraging accompanying images at training time.

Retrieval Semantic Retrieval +1

Acoustically Grounded Word Embeddings for Improved Acoustics-to-Word Speech Recognition

no code implementations • 29 Mar 2019 • Shane Settle, Kartik Audhkhasi, Karen Livescu, Michael Picheny

Direct acoustics-to-word (A2W) systems for end-to-end automatic speech recognition are simpler to train, and more efficient to decode with, than sub-word systems.

Automatic Speech Recognition (ASR) +2

American Sign Language fingerspelling recognition in the wild

no code implementations • 26 Oct 2018 • Bowen Shi, Aurora Martinez Del Rio, Jonathan Keane, Jonathan Michaux, Diane Brentari, Greg Shakhnarovich, Karen Livescu

As the first attempt at fingerspelling recognition in the wild, this work is intended to serve as a baseline for future work on sign language recognition in realistic conditions.

Sign Language Recognition

Pre-training on high-resource speech recognition improves low-resource speech-to-text translation

1 code implementation • NAACL 2019 • Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez, Sharon Goldwater

Finally, we show that the approach improves performance on a true low-resource task: pre-training on a combination of English ASR and French ASR improves Mboshi-French ST, where only 4 hours of data are available, from 3.5 to 7.1.

Automatic Speech Recognition (ASR) +3

Hierarchical Multitask Learning for CTC-based Speech Recognition

no code implementations • 17 Jul 2018 • Kalpesh Krishna, Shubham Toshniwal, Karen Livescu

Previous work has shown that neural encoder-decoder speech recognition can be improved with hierarchical multitask learning, where auxiliary tasks are added at intermediate layers of a deep encoder.

Speech Recognition

Low-Resource Speech-to-Text Translation

no code implementations • 24 Mar 2018 • Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez, Sharon Goldwater

We explore models trained on between 20 and 160 hours of data, and find that although models trained on less data have considerably lower BLEU scores, they can still predict words with relatively high precision and recall: around 50% for a model trained on 50 hours of data, versus around 60% for the full 160-hour model.

Machine Translation Speech Recognition +3
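The word-level precision and recall quoted above can be computed with a simple bag-of-words comparison between reference and hypothesis translations; this helper is a hypothetical sketch of that evaluation, not code from the paper:

```python
from collections import Counter

def word_precision_recall(reference, hypothesis):
    """Bag-of-words precision and recall of a hypothesis against a reference.
    Counter intersection (&) takes the minimum count of each shared word."""
    ref, hyp = Counter(reference.split()), Counter(hypothesis.split())
    overlap = sum((ref & hyp).values())
    precision = overlap / max(sum(hyp.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    return precision, recall
```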

Acoustic feature learning using cross-domain articulatory measurements

no code implementations • 19 Mar 2018 • Qingming Tang, Weiran Wang, Karen Livescu

Previous work has shown that it is possible to improve speech recognition by learning acoustic features from paired acoustic-articulatory data, for example by using canonical correlation analysis (CCA) or its deep extensions.

Speech Recognition

Multitask training with unlabeled data for end-to-end sign language fingerspelling recognition

no code implementations • 9 Oct 2017 • Bowen Shi, Karen Livescu

We introduce a model for fingerspelling recognition that addresses these issues.

Semantic speech retrieval with a visually grounded model of untranscribed speech

2 code implementations • 5 Oct 2017 • Herman Kamper, Gregory Shakhnarovich, Karen Livescu

We introduce a newly collected data set of human semantic relevance judgements and an associated task, semantic speech retrieval, where the goal is to search for spoken utterances that are semantically relevant to a given text query.

Language Acquisition Retrieval

Acoustic Feature Learning via Deep Variational Canonical Correlation Analysis

no code implementations • 11 Aug 2017 • Qingming Tang, Weiran Wang, Karen Livescu

We study the problem of acoustic feature learning in the setting where we have access to another (non-acoustic) modality for feature learning but not at test time.

Representation Learning

End-to-End Neural Segmental Models for Speech Recognition

no code implementations • 1 Aug 2017 • Hao Tang, Liang Lu, Lingpeng Kong, Kevin Gimpel, Karen Livescu, Chris Dyer, Noah A. Smith, Steve Renals

Segmental models are an alternative to frame-based models for sequence prediction, where hypothesized path weights are based on entire segment scores rather than a single frame at a time.

Speech Recognition

Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings

1 code implementation • 12 Jun 2017 • Shane Settle, Keith Levin, Herman Kamper, Karen Livescu

Query-by-example search often uses dynamic time warping (DTW) for comparing queries and proposed matching segments.

Dynamic Time Warping Word Embeddings
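DTW itself is a standard algorithm; a minimal implementation of the alignment cost between a query sequence and a candidate segment (here over scalars, with an absolute-difference frame distance standing in for a distance between acoustic feature vectors):

```python
def dtw_cost(query, segment, dist=lambda a, b: abs(a - b)):
    """Dynamic time warping: minimum cumulative frame-distance over all
    monotonic alignments of the two sequences."""
    INF = float("inf")
    n, m = len(query), len(segment)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(query[i - 1], segment[j - 1])
            # Extend the best of the three allowed predecessor alignments.
            D[i][j] = d + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Embedding-based approaches like the one above replace this quadratic-time comparison with a single distance between fixed-dimensional vectors.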

Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition

no code implementations • 5 Apr 2017 • Shubham Toshniwal, Hao Tang, Liang Lu, Karen Livescu

We hypothesize that using intermediate representations as auxiliary supervision at lower levels of deep networks may be a good way of combining the advantages of end-to-end training and more traditional pipeline approaches.

Speech Recognition

An embedded segmental K-means model for unsupervised segmentation and clustering of speech

2 code implementations • 23 Mar 2017 • Herman Kamper, Karen Livescu, Sharon Goldwater

Unsupervised segmentation and clustering of unlabelled speech are core problems in zero-resource speech processing.

Bayesian Inference Clustering +2

Visually grounded learning of keyword prediction from untranscribed speech

1 code implementation • 23 Mar 2017 • Herman Kamper, Shane Settle, Gregory Shakhnarovich, Karen Livescu

In this setting of images paired with untranscribed spoken captions, we consider whether computer vision systems can be used to obtain textual labels for the speech.

Language Acquisition TAG

Multi-view Recurrent Neural Acoustic Word Embeddings

no code implementations • 14 Nov 2016 • Wanjia He, Weiran Wang, Karen Livescu

Recent work has begun exploring neural acoustic word embeddings: fixed-dimensional vector representations of arbitrary-length speech segments corresponding to words.

Retrieval Word Embeddings +1

Discriminative Acoustic Word Embeddings: Recurrent Neural Network-Based Approaches

no code implementations • 8 Nov 2016 • Shane Settle, Karen Livescu

Acoustic word embeddings (fixed-dimensional vector representations of variable-length spoken word segments) have begun to be considered for tasks such as speech recognition and query-by-example search.

Dynamic Time Warping General Classification +3

End-to-End Training Approaches for Discriminative Segmental Models

no code implementations • 21 Oct 2016 • Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu

Similarly to hybrid HMM-neural network models, segmental models of this class can be trained in two stages (frame classifier training followed by linear segmental model weight training), end to end (joint training of both frame classifier and linear weights), or with end-to-end fine-tuning after two-stage training.

Speech Recognition

Jointly Learning to Align and Convert Graphemes to Phonemes with Neural Attention Models

1 code implementation • 20 Oct 2016 • Shubham Toshniwal, Karen Livescu

We propose an attention-enabled encoder-decoder model for the problem of grapheme-to-phoneme conversion.

Deep Variational Canonical Correlation Analysis

no code implementations • 11 Oct 2016 • Weiran Wang, Xinchen Yan, Honglak Lee, Karen Livescu

We present deep variational canonical correlation analysis (VCCA), a deep multi-view learning model that extends the latent variable model interpretation of linear CCA to nonlinear observation models parameterized by deep neural networks.

Multi-View Learning
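For background, linear CCA finds projection directions that maximize the correlation between two views; VCCA generalizes the latent-variable interpretation of this objective to deep, nonlinear observation models. In standard notation (not taken from the paper), the classical objective is:

```latex
\max_{w_x,\, w_y}\;
\operatorname{corr}\!\left(w_x^\top x,\; w_y^\top y\right)
= \frac{w_x^\top \Sigma_{xy}\, w_y}
       {\sqrt{w_x^\top \Sigma_{xx}\, w_x}\,\sqrt{w_y^\top \Sigma_{yy}\, w_y}}
```

Here $\Sigma_{xx}$, $\Sigma_{yy}$, and $\Sigma_{xy}$ are the within- and cross-view covariance matrices of the two views $x$ and $y$.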

Lexicon-Free Fingerspelling Recognition from Video: Data, Models, and Signer Adaptation

no code implementations • 26 Sep 2016 • Taehwan Kim, Jonathan Keane, Weiran Wang, Hao Tang, Jason Riggle, Gregory Shakhnarovich, Diane Brentari, Karen Livescu

Recognizing fingerspelling is challenging for a number of reasons: It involves quick, small motions that are often highly coarticulated; it exhibits significant variation between signers; and there has been a dearth of continuous fingerspelling data collected.

Efficient Segmental Cascades for Speech Recognition

no code implementations • 2 Aug 2016 • Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu

Discriminative segmental models offer a way to incorporate flexible feature functions into speech recognition.

Speech Recognition

Charagram: Embedding Words and Sentences via Character n-grams

no code implementations • EMNLP 2016 • John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu

We present Charagram embeddings, a simple approach for learning character-based compositional models to embed textual sequences.

Part-Of-Speech Tagging Sentence +2
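The core operation of a character-n-gram compositional model, embedding a word as the sum of its character n-gram vectors, can be sketched as follows; the boundary-marker convention and the toy vectors are illustrative assumptions, not the paper's exact setup:

```python
def char_ngrams(word, orders=(2, 3)):
    """Character n-grams of a word, with boundary markers '<' and '>' added."""
    w = f"<{word}>"
    return [w[i:i + n] for n in orders for i in range(len(w) - n + 1)]

def charagram_embed(word, ngram_vecs, dim=4):
    """Embed a word as the elementwise sum of its character n-gram vectors;
    n-grams missing from ngram_vecs contribute nothing."""
    vec = [0.0] * dim
    for g in char_ngrams(word):
        if g in ngram_vecs:
            vec = [a + b for a, b in zip(vec, ngram_vecs[g])]
    return vec
```

Because the representation is built from sub-word pieces, morphologically related or misspelled words share n-grams and therefore end up with similar embeddings.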

Signer-independent Fingerspelling Recognition with Deep Neural Network Adaptation

no code implementations • 13 Feb 2016 • Taehwan Kim, Weiran Wang, Hao Tang, Karen Livescu

Previous work has shown that it is possible to achieve accuracy of almost 90% on fingerspelling recognition in a signer-dependent setting.

Automatic Speech Recognition (ASR) +1

On Deep Multi-View Representation Learning: Objectives and Optimization

1 code implementation • 2 Feb 2016 • Weiran Wang, Raman Arora, Karen Livescu, Jeff Bilmes

We consider learning representations (features) in the setting in which we have access to multiple unlabeled views of the data for learning while only one view is available for downstream tasks.

Representation Learning Stochastic Optimization

Towards Universal Paraphrastic Sentence Embeddings

no code implementations • 25 Nov 2015 • John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu

We again find that the word averaging models perform well for sentence similarity and entailment, outperforming LSTMs.

General Classification Sentence +4
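The word-averaging model is simple enough to sketch directly: a sentence embedding is the mean of its word vectors, and similarity is cosine between the two means (the toy vectors below are illustrative, not from the paper):

```python
import math

def avg_embedding(sentence, word_vecs):
    """Sentence embedding as the average of its words' vectors (OOV words skipped)."""
    vecs = [word_vecs[w] for w in sentence.split() if w in word_vecs]
    dim = len(next(iter(word_vecs.values())))
    if not vecs:
        return [0.0] * dim
    return [sum(v[k] for v in vecs) / len(vecs) for k in range(dim)]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
```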

Nonparametric Canonical Correlation Analysis

no code implementations • 16 Nov 2015 • Tomer Michaeli, Weiran Wang, Karen Livescu

Several nonlinear extensions of the original linear CCA have been proposed, including kernel and deep neural network methods.

Representation Learning

Large-Scale Approximate Kernel Canonical Correlation Analysis

no code implementations • 15 Nov 2015 • Weiran Wang, Karen Livescu

Kernel canonical correlation analysis (KCCA) is a nonlinear multi-view representation learning technique with broad applicability in statistics and machine learning.

Representation Learning Stochastic Optimization

Stochastic Optimization for Deep CCA via Nonlinear Orthogonal Iterations

no code implementations • 7 Oct 2015 • Weiran Wang, Raman Arora, Karen Livescu, Nathan Srebro

Deep CCA is a recently proposed deep neural network extension to the traditional canonical correlation analysis (CCA), and has been successful for multi-view representation learning in several domains.

Representation Learning Stochastic Optimization

Deep convolutional acoustic word embeddings using word-pair side information

1 code implementation • 5 Oct 2015 • Herman Kamper, Weiran Wang, Karen Livescu

Recent studies have been revisiting whole words as the basic modelling unit in speech recognition and query applications, instead of phonetic units.

Speech Recognition +1

Discriminative Segmental Cascades for Feature-Rich Phone Recognition

no code implementations • 22 Jul 2015 • Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu

A typical solution is to use approximate decoding, either by beam pruning in a single pass or by beam pruning to generate a lattice followed by a second pass.

Language Modelling Speech Recognition +2

From Paraphrase Database to Compositional Paraphrase Model and Back

1 code implementation • TACL 2015 • John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu, Dan Roth

The Paraphrase Database (PPDB; Ganitkevitch et al., 2013) is an extensive semantic resource, consisting of a list of phrase pairs with (heuristic) confidence estimates.

Word Embeddings
