1 code implementation • 8 Aug 2022 • Linh The Nguyen, Nguyen Luong Tran, Long Doan, Manh Luong, Dat Quoc Nguyen
In this paper, we introduce a high-quality and large-scale benchmark dataset for English-Vietnamese speech translation with 508 audio hours, consisting of 331K triplets of (sentence-length audio, English source transcript sentence, Vietnamese target subtitle sentence).
1 code implementation • EMNLP 2021 • Long Doan, Linh The Nguyen, Nguyen Luong Tran, Thai Hoang, Dat Quoc Nguyen
We introduce a high-quality and large-scale Vietnamese-English parallel dataset of 3.02M sentence pairs, which is 2.9M pairs larger than IWSLT15, the benchmark Vietnamese-English machine translation corpus.
1 code implementation • EMNLP (WNUT) 2020 • Dat Quoc Nguyen, Thanh Vu, Afshin Rahimi, Mai Hoang Dao, Linh The Nguyen, Long Doan
In this paper, we provide an overview of the WNUT-2020 shared task on the identification of informative COVID-19 English Tweets.