Search Results for author: Long Doan

Found 3 papers, 3 papers with code

A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation

1 code implementation • 8 Aug 2022 • Linh The Nguyen, Nguyen Luong Tran, Long Doan, Manh Luong, Dat Quoc Nguyen

In this paper, we introduce a high-quality and large-scale benchmark dataset for English-Vietnamese speech translation with 508 audio hours, consisting of 331K triplets of (sentence-length audio, English source transcript sentence, Vietnamese target subtitle sentence).

Sentence Translation

PhoMT: A High-Quality and Large-Scale Benchmark Dataset for Vietnamese-English Machine Translation

1 code implementation • EMNLP 2021 • Long Doan, Linh The Nguyen, Nguyen Luong Tran, Thai Hoang, Dat Quoc Nguyen

We introduce a high-quality and large-scale Vietnamese-English parallel dataset of 3.02M sentence pairs, which is 2.9M pairs larger than the benchmark Vietnamese-English machine translation corpus IWSLT15.

Denoising Machine Translation

WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets

1 code implementation • EMNLP (WNUT) 2020 • Dat Quoc Nguyen, Thanh Vu, Afshin Rahimi, Mai Hoang Dao, Linh The Nguyen, Long Doan

In this paper, we provide an overview of the WNUT-2020 shared task on the identification of informative COVID-19 English Tweets.

Task 2 Text Classification