no code implementations • IWSLT 2016 • Michaeel Kazi, Elizabeth Salesky, Brian Thompson, Jonathan Taylor, Jeremy Gwinnup, Timothy Anderson, Grant Erdmann, Eric Hansen, Brian Ore, Katherine Young, Michael Hutt
This report summarizes the MITLL-AFRL MT and ASR systems and the experiments run during the 2016 IWSLT evaluation campaign.
1 code implementation • 28 Feb 2024 • Vilém Zouhar, Shuoyang Ding, Anna Currey, Tatyana Badeka, Jenyuan Wang, Brian Thompson
We introduce a new, extensive dataset annotated with Multidimensional Quality Metrics (MQM), covering 11 language pairs in the biomedical domain.
1 code implementation • 11 Jan 2024 • Brian Thompson, Mehak Preet Dhaliwal, Peter Frisch, Tobias Domhan, Marcello Federico
We show that content on the web is often translated into many languages, and the low quality of these multi-way translations indicates they were likely created using Machine Translation (MT).
1 code implementation • 1 Nov 2023 • Juan Zuluaga-Gomez, Zhaocheng Huang, Xing Niu, Rohit Paturi, Sundararajan Srinivasan, Prashant Mathur, Brian Thompson, Marcello Federico
Conventional speech-to-text translation (ST) systems are trained on single-speaker utterances, and they may not generalize to real-life scenarios where the audio contains conversations by multiple speakers.
no code implementations • 4 Aug 2023 • Yogesh Virkar, Brian Thompson, Rohit Paturi, Sundararajan Srinivasan, Marcello Federico
The media localization industry usually requires a verbatim script of the final film or TV production in order to create subtitles or dubbing scripts in a foreign language.
no code implementations • 22 May 2023 • Proyag Pal, Brian Thompson, Yogesh Virkar, Prashant Mathur, Alexandra Chronopoulou, Marcello Federico
To translate speech for automatic dubbing, machine translation needs to be isochronous, i.e., the translated speech needs to be aligned with the source in terms of speech durations.
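The duration-matching idea can be sketched with a toy selection rule: estimate speaking time from character count (a crude proxy; real dubbing systems model phoneme durations and pauses) and prefer the candidate translation whose estimated duration best matches the source speech. The speaking rate and all strings below are invented for illustration, not taken from the paper.

```python
# Assumed average speaking rate (characters per second); purely illustrative.
CHARS_PER_SECOND = 15.0

def est_duration(text: str) -> float:
    """Rough speaking-time estimate in seconds, from character count."""
    return len(text) / CHARS_PER_SECOND

def most_isochronous(src_duration: float, candidates: list) -> str:
    """Pick the candidate whose estimated duration is closest to the source."""
    return min(candidates, key=lambda c: abs(est_duration(c) - src_duration))
```

In a real system this preference would be folded into decoding rather than applied as a post-hoc filter, but the sketch shows the isochrony objective in isolation.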
1 code implementation • 25 Feb 2023 • Alexandra Chronopoulou, Brian Thompson, Prashant Mathur, Yogesh Virkar, Surafel M. Lakew, Marcello Federico
Automatic dubbing (AD) is the task of translating the original speech in a video into target language speech.
1 code implementation • 23 Dec 2022 • William Brannon, Yogesh Virkar, Brian Thompson
We investigate how humans perform the task of dubbing video content from one language into another, leveraging a novel corpus of 319.57 hours of video from 54 professionally produced titles.
no code implementations • 11 Oct 2022 • Cuong Hoang, Devendra Sachan, Prashant Mathur, Brian Thompson, Marcello Federico
Several recent studies have reported dramatic performance improvements in neural machine translation (NMT) by augmenting translation at inference time with fuzzy-matches retrieved from a translation memory (TM).
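The retrieval step can be sketched with a toy translation memory: given a source sentence, find the stored source whose token sequence is most similar, and return its translation for the model to condition on. The TM entries and the similarity measure (difflib's sequence matcher over tokens) are illustrative stand-ins, not the paper's method.

```python
import difflib

# Toy translation memory: made-up source -> target pairs.
TM = {
    "the cat sat on the mat": "le chat était assis sur le tapis",
    "the dog slept on the rug": "le chien dormait sur le tapis",
    "good morning everyone": "bonjour à tous",
}

def fuzzy_match(query: str, tm: dict, threshold: float = 0.6):
    """Return (source, target, score) for the best fuzzy match, or None."""
    def sim(src):
        return difflib.SequenceMatcher(None, query.split(), src.split()).ratio()
    best = max(tm, key=sim)
    score = sim(best)
    return (best, tm[best], score) if score >= threshold else None
```

At inference time, the retrieved target side would be appended to (or otherwise fused with) the NMT input so the model can copy from it.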
no code implementations • 10 Oct 2022 • Cuong Hoang, Devendra Sachan, Prashant Mathur, Brian Thompson, Marcello Federico
We explore zero-shot adaptation, where a general-domain model has access to customer or domain specific parallel data at inference time, but not during training.
1 code implementation • 27 Sep 2022 • Giorgos Vernikos, Brian Thompson, Prashant Mathur, Marcello Federico
Our experimental results support our initial hypothesis and show that a simple extension of the metrics permits them to take advantage of context to resolve ambiguities in the reference.
no code implementations • IWSLT (ACL) 2022 • Brian Thompson, Ali Alshehri
We propose a novel multitask learning method for diacritization which trains a model to both diacritize and translate.
1 code implementation • WMT (EMNLP) 2020 • Brian Thompson, Matt Post
Recent work has shown that a multilingual neural machine translation (NMT) model can be used to judge how well a sentence paraphrases another sentence in the same language (Thompson and Post, 2020); however, attempting to generate paraphrases from such a model using standard beam search produces trivial copies or near copies.
2 code implementations • ACL 2020 • Marta Bañón, Pin-zhen Chen, Barry Haddow, Kenneth Heafield, Hieu Hoang, Miquel Esplà-Gomis, Mikel L. Forcada, Amir Kamran, Faheem Kirefu, Philipp Koehn, Sergio Ortiz Rojas, Leopoldo Pla Sempere, Gema Ramírez-Sánchez, Elsa Sarrías, Marek Strelec, Brian Thompson, William Waites, Dion Wiggins, Jaume Zaragoza
We report on methods to create the largest publicly available parallel corpora by crawling the web, using open source software.
no code implementations • LREC 2020 • Kevin Duh, Paul McNamee, Matt Post, Brian Thompson
In this study, we benchmark state-of-the-art statistical and neural machine translation systems on two African languages that do not have large amounts of resources: Somali and Swahili.
1 code implementation • EMNLP 2020 • Brian Thompson, Matt Post
We frame the task of machine translation evaluation as one of scoring machine translation output with a sequence-to-sequence paraphraser, conditioned on a human reference.
1 code implementation • EMNLP 2020 • Brian Thompson, Philipp Koehn
We present a simple document alignment method that incorporates sentence order information in both candidate generation and candidate re-scoring.
1 code implementation • EMNLP 2020 • Huda Khayrallah, Brian Thompson, Matt Post, Philipp Koehn
Many valid translations exist for a given sentence, yet machine translation (MT) is trained with a single reference translation, exacerbating data sparsity in low-resource settings.
no code implementations • IJCNLP 2019 • Brian Thompson, Rebecca Knowles, Xuan Zhang, Huda Khayrallah, Kevin Duh, Philipp Koehn
Bilingual lexicons are valuable resources used by professional human translators.
no code implementations • IJCNLP 2019 • Brian Thompson, Philipp Koehn
It substantially outperforms the popular Hunalign toolkit at recovering Bible verse alignments in medium- to low-resource language pairs, and it improves downstream MT quality by 1.7 and 1.6 BLEU in Sinhala-English and Nepali-English, respectively, compared to the Hunalign-based ParaCrawl pipeline.
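The core idea of embedding-based sentence alignment can be sketched as a monotonic dynamic program over pairwise similarities of pre-embedded sentences. This toy version handles only 1-1 links and skips (the actual method also scores many-to-one blocks and uses a fast multiscale approximation), and the hand-built vectors stand in for real multilingual sentence embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def align(src_vecs, tgt_vecs, gap_cost=0.5):
    """Monotonic 1-1 alignment of embedded sentences via DP (Needleman-
    Wunsch-style): match a pair for its cosine score, or skip a sentence
    on either side for a fixed penalty."""
    n, m = len(src_vecs), len(tgt_vecs)
    score = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i and score[i - 1][j] - gap_cost > score[i][j]:
                score[i][j] = score[i - 1][j] - gap_cost
                back[i][j] = (i - 1, j)
            if j and score[i][j - 1] - gap_cost > score[i][j]:
                score[i][j] = score[i][j - 1] - gap_cost
                back[i][j] = (i, j - 1)
            if i and j:
                s = score[i - 1][j - 1] + cosine(src_vecs[i - 1], tgt_vecs[j - 1])
                if s > score[i][j]:
                    score[i][j] = s
                    back[i][j] = (i - 1, j - 1)
    # Trace back the best path, keeping only matched (src, tgt) pairs.
    pairs, ij = [], (n, m)
    while ij != (0, 0):
        pi, pj = back[ij[0]][ij[1]]
        if ij[0] - pi == 1 and ij[1] - pj == 1:
            pairs.append((pi, pj))
        ij = (pi, pj)
    return pairs[::-1]
```

With parallel documents, the diagonal of high-similarity pairs dominates any path that skips sentences, so the DP recovers the verse-by-verse correspondence.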
no code implementations • NAACL 2019 • Brian Thompson, Jeremy Gwinnup, Huda Khayrallah, Kevin Duh, Philipp Koehn
Continued training is an effective method for domain adaptation in neural machine translation.
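The recipe itself is simple and can be illustrated on a toy model: train on general-domain data, then resume optimization from that checkpoint on a small in-domain set. Everything below (the one-parameter model, data, learning rates, epoch counts) is invented for illustration; in NMT the same two-stage procedure is applied to the full network's parameters.

```python
def sgd(w, data, lr=0.01, epochs=200):
    """Plain SGD on squared error for the scalar model y = w * x."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x  # gradient of (w*x - y)**2
    return w

general = [(1.0, 2.0), (2.0, 4.0)]   # general domain behaves like y = 2x
in_domain = [(1.0, 3.0)]             # in-domain data behaves like y = 3x

w_general = sgd(0.0, general)                      # "pretraining"
w_adapted = sgd(w_general, in_domain, epochs=50)   # continued training
```

After the short second stage the parameter has moved from the general-domain solution toward the in-domain one without being retrained from scratch, which is the behavior continued training exploits (and which, taken too far, causes catastrophic forgetting of the general domain).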
no code implementations • WS 2018 • Philipp Koehn, Kevin Duh, Brian Thompson
We report on the efforts of the Johns Hopkins University to develop neural machine translation systems for the shared task for news translation organized around the Conference for Machine Translation (WMT) 2018.
1 code implementation • WS 2018 • Brian Thompson, Huda Khayrallah, Antonios Anastasopoulos, Arya D. McCarthy, Kevin Duh, Rebecca Marvin, Paul McNamee, Jeremy Gwinnup, Tim Anderson, Philipp Koehn
To better understand the effectiveness of continued training, we analyze the major components of a neural machine translation system (the encoder, decoder, and each embedding space) and consider each component's contribution to, and capacity for, domain adaptation.
1 code implementation • WS 2018 • Huda Khayrallah, Brian Thompson, Kevin Duh, Philipp Koehn
Supervised domain adaptation, where a large generic corpus and a smaller in-domain corpus are both available for training, is a challenge for neural machine translation (NMT).
no code implementations • ACL 2017 • Michaeel Kazi, Brian Thompson
In this work, we propose a novel, implicitly-defined neural network architecture and describe a method to compute its components.