Search Results for author: Steven Bird

Found 37 papers, 5 papers with code

Learning From Failure: Data Capture in an Australian Aboriginal Community

no code implementations ACL 2022 Eric Le Ferrand, Steven Bird, Laurent Besacier

Most low resource language technology development is premised on the need to collect data for training statistical models.

A Computational Model for Interactive Transcription

no code implementations NAACL (DaSH) 2021 William Lane, Mat Bettinson, Steven Bird

Transcribing low resource languages can be challenging in the absence of a good lexicon and trained transcribers.

Retrieval

Local Word Discovery for Interactive Transcription

no code implementations EMNLP 2021 William Lane, Steven Bird

Human expertise and the participation of speech communities are essential factors in the success of technologies for low-resource languages.

A Finite State Approach to Interactive Transcription

no code implementations FieldMatters (COLING) 2022 William Lane, Steven Bird

We describe a novel approach to transcribing morphologically complex, local, oral languages.

Learning Through Transcription

no code implementations ComputEL (ACL) 2022 Mat Bettinson, Steven Bird

Transcribing speech for primarily oral, local languages is often a joint effort involving speakers and outsiders.

Language Acquisition

Phone Based Keyword Spotting for Transcribing Very Low Resource Languages

no code implementations ALTA 2021 Eric Le Ferrand, Steven Bird, Laurent Besacier

We investigate the efficiency of two very different spoken term detection approaches for transcription when the available data is insufficient to train a robust speech recognition system.

Dynamic Time Warping, Keyword Spotting, +2

Sparse Transcription

no code implementations CL (ACL) 2020 Steven Bird

The transcription bottleneck is often cited as a major obstacle for efforts to document the world’s endangered languages and supply them with language technologies.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +4

Spoken Term Detection Methods for Sparse Transcription in Very Low-resource Settings

no code implementations 11 Jun 2021 Éric Le Ferrand, Steven Bird, Laurent Besacier

We investigate the efficiency of two very different spoken term detection approaches for transcription when the available data is insufficient to train a robust ASR system.

Dynamic Time Warping

Interactive Word Completion for Morphologically Complex Languages

no code implementations COLING 2020 William Lane, Steven Bird

We show that the space of proximal morph completions is many orders of magnitude smaller than the space of full word completions for Kunwinjku.

MORPH

Decolonising Speech and Language Technology

no code implementations COLING 2020 Steven Bird

After generations of exploitation, Indigenous people often respond negatively to the idea that their languages are data ready for the taking.

Enabling Interactive Transcription in an Indigenous Community

no code implementations COLING 2020 Éric Le Ferrand, Steven Bird, Laurent Besacier

We propose a novel transcription workflow which combines spoken term detection and human-in-the-loop, together with a pilot experiment.

Bootstrapping Techniques for Polysynthetic Morphological Analysis

no code implementations ACL 2020 William Lane, Steven Bird

To address this challenge, we offer linguistically-informed approaches for bootstrapping a neural morphological analyzer, and demonstrate its application to Kunwinjku, a polysynthetic Australian language.

MORPH, Morphological Analysis

Towards A Robust Morphological Analyzer for Kunwinjku

no code implementations ALTA 2019 William Lane, Steven Bird

Kunwinjku is an indigenous Australian language spoken in northern Australia which exhibits agglutinative and polysynthetic properties.

Cross-Lingual Word Embeddings for Low-Resource Language Modeling

no code implementations EACL 2017 Oliver Adams, Adam Makarucha, Graham Neubig, Steven Bird, Trevor Cohn

We investigate the use of such lexicons to improve language models when textual training data is limited to as few as a thousand sentences.

Cross-Lingual Word Embeddings, Language Modelling, +3

Multilingual Training of Crosslingual Word Embeddings

no code implementations EACL 2017 Long Duong, Hiroshi Kanayama, Tengfei Ma, Steven Bird, Trevor Cohn

Crosslingual word embeddings represent lexical items from different languages using the same vector space, enabling crosslingual transfer.

Bilingual Lexicon Induction, Dependency Parsing, +6

NLTK: The Natural Language Toolkit

1 code implementation 17 May 2002 Edward Loper, Steven Bird

NLTK, the Natural Language Toolkit, is a suite of open source program modules, tutorials and problem sets, providing ready-to-use computational linguistics courseware.

Multi-Label Text Classification

Annotation graphs as a framework for multidimensional linguistic data analysis

1 code implementation 5 Jul 1999 Steven Bird, Mark Liberman

In recent work we have presented a formal framework for linguistic annotation based on labeled acyclic digraphs.
