no code implementations • 23 Feb 2024 • Jintao Jiang, Yingbo Gao, Mohammad Zeineldeen, Zoltan Tuske
In this paper, alternating weak triphone/BPE alignment supervision is proposed to improve end-to-end model training.
1 code implementation • 17 Jan 2024 • David Thulke, Yingbo Gao, Petrus Pelser, Rein Brune, Rricha Jalota, Floris Fok, Michael Ramos, Ian van Wyk, Abdallah Nasir, Hayden Goldstein, Taylor Tragemann, Katie Nguyen, Ariana Fowler, Andrew Stanco, Jon Gabriel, Jordan Taylor, Dean Moro, Evgenii Tsymbalov, Juliette de Waal, Evgeny Matusov, Mudar Yaghi, Mohammad Shihadah, Hermann Ney, Christian Dugast, Jonathan Dotan, Daniel Erasmus
To increase the accessibility of our model to non-English speakers, we propose to make use of cascaded machine translation and show that this approach can perform comparably to natively multilingual models while being easier to scale to a large number of languages.
no code implementations • 24 Nov 2023 • Jintao Jiang, Yingbo Gao, Zoltan Tuske
In contrast to the standard one-hot cross-entropy loss, here we use a cross-entropy loss with a label smoothing parameter to regularize the supervision.
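The loss described can be sketched as follows — a minimal NumPy version, assuming the smoothing mass is spread uniformly over the non-gold labels (the paper's exact parameterization may differ):

```python
import numpy as np

def label_smoothed_ce(log_probs, target, eps=0.1):
    """Cross-entropy against a smoothed target distribution: (1 - eps)
    on the gold label, eps spread uniformly over the remaining labels."""
    vocab = log_probs.shape[-1]
    q = np.full(vocab, eps / (vocab - 1))
    q[target] = 1.0 - eps
    return float(-(q * log_probs).sum())
```

With `eps=0.0` this reduces to the usual one-hot cross-entropy.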
no code implementations • 8 Jun 2023 • Christian Herold, Yingbo Gao, Mohammad Zeineldeen, Hermann Ney
The integration of language models for neural machine translation has been extensively studied in the past.
1 code implementation • 24 Oct 2022 • Viet Anh Khoa Tran, David Thulke, Yingbo Gao, Christian Herold, Hermann Ney
Currently, in speech translation, the straightforward approach of cascading a recognition system with a translation system delivers state-of-the-art results.
Automatic Speech Recognition (ASR) +2
no code implementations • 21 Oct 2022 • Yingbo Gao, Christian Herold, Zijian Yang, Hermann Ney
Checkpoint averaging is a simple and effective method to boost the performance of converged neural machine translation models.
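Checkpoint averaging amounts to an element-wise mean of the parameters of the last few saved checkpoints; a minimal sketch, using NumPy arrays to stand in for parameter tensors:

```python
import numpy as np

def average_checkpoints(checkpoints):
    """Element-wise average of several parameter dicts (e.g. the last N
    saved checkpoints of a converged model) into one parameter dict."""
    n = len(checkpoints)
    return {name: sum(ckpt[name] for ckpt in checkpoints) / n
            for name in checkpoints[0]}
```

The averaged dict is then loaded as a single model for inference.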
no code implementations • 21 Oct 2022 • Yingbo Gao, Christian Herold, Zijian Yang, Hermann Ney
The encoder-decoder architecture is widely adopted for sequence-to-sequence modeling tasks.
no code implementations • 11 Nov 2021 • Zijian Yang, Yingbo Gao, Alexander Gerstenberger, Jintao Jiang, Ralf Schlüter, Hermann Ney
Compared to our previous work, the criteria considered in this work are self-normalized, and no further correction step is needed.
Automatic Speech Recognition (ASR) +2
no code implementations • ACL 2021 • Weiyue Wang, Zijian Yang, Yingbo Gao, Hermann Ney
The neural hidden Markov model has been proposed as an alternative to the attention mechanism in machine translation with recurrent neural networks.
no code implementations • 21 Apr 2021 • Yingbo Gao, David Thulke, Alexander Gerstenberger, Khoa Viet Tran, Ralf Schlüter, Hermann Ney
As the vocabulary size of modern word-based language models grows ever larger, many sampling-based training criteria have been proposed and investigated.
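As one concrete instance of such a criterion, a sampled softmax normalizes over the gold label plus a handful of sampled negatives instead of the full vocabulary. A sketch follows; the paper compares several criteria, and the uniform negative sampling here is only an assumption:

```python
import random
import numpy as np

def sampled_softmax_loss(logits, target, num_samples=5, seed=0):
    """Approximate full-softmax cross-entropy by normalizing over the
    gold label plus a small uniformly sampled subset of negatives."""
    rng = random.Random(seed)
    vocab = len(logits)
    negatives = [i for i in rng.sample(range(vocab), num_samples) if i != target]
    idx = [target] + negatives
    sub = np.asarray([logits[i] for i in idx])
    log_z = np.log(np.exp(sub).sum())  # partition over the subset only
    return float(log_z - logits[target])
```

Because the gold label is always in the normalizer, the sampled loss is non-negative and lower-bounded by zero, while the full-vocabulary loss upper-bounds it.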
Automatic Speech Recognition (ASR) +2
no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Zijian Yang, Yingbo Gao, Weiyue Wang, Hermann Ney
Attention-based encoder-decoder models have achieved great success in neural machine translation tasks.
no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Yingbo Gao, Weiyue Wang, Christian Herold, Zijian Yang, Hermann Ney
In order to combat overfitting and in pursuit of better generalization, label smoothing is widely applied in modern neural machine translation systems.
no code implementations • COLING 2020 • Yingbo Gao, Baohao Liao, Hermann Ney
Soft contextualized data augmentation is a recent method that replaces one-hot representation of words with soft posterior distributions of an external language model, smoothing the input of neural machine translation systems.
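The replacement can be written as an expected embedding: instead of selecting one row of the embedding matrix with a one-hot vector, the input becomes the posterior-weighted mixture of rows. A minimal sketch of that idea (names are illustrative):

```python
import numpy as np

def soft_embedding(posterior, embedding_matrix):
    """Expected word embedding under an external LM's posterior over the
    vocabulary; with a one-hot posterior this is an ordinary lookup."""
    return posterior @ embedding_matrix
```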
no code implementations • Findings of the Association for Computational Linguistics 2020 • Baohao Liao, Yingbo Gao, Hermann Ney
Mutual learning, where multiple agents learn collaboratively and teach one another, has been shown to be an effective way to distill knowledge for image classification tasks.
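A common formulation of mutual learning adds, to each agent's cross-entropy, a KL term pulling its predictions toward a peer's. A sketch of the per-example loss for one agent; the weighting `alpha` and this exact form are assumptions, not necessarily the paper's formulation:

```python
import numpy as np

def mutual_loss(log_p_self, log_p_peer, target, alpha=0.5):
    """Cross-entropy on the gold label plus KL(peer || self), so each
    agent learns both from labels and from its peer's soft predictions."""
    ce = -float(log_p_self[target])
    p_peer = np.exp(log_p_peer)
    kl = float((p_peer * (log_p_peer - log_p_self)).sum())
    return ce + alpha * kl
```

When the two agents agree exactly, the KL term vanishes and the loss reduces to plain cross-entropy.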
no code implementations • WMT (EMNLP) 2020 • Jingjing Huo, Christian Herold, Yingbo Gao, Leonard Dahlmann, Shahram Khadivi, Hermann Ney
Context-aware neural machine translation (NMT) is a promising direction for improving translation quality by making use of additional context, e.g., document-level context or meta-information.
no code implementations • 20 May 2020 • Jingjing Huo, Yingbo Gao, Weiyue Wang, Ralf Schlüter, Hermann Ney
After that, we apply the best norm-scaling setup in combination with various margins and conduct neural language model rescoring experiments in automatic speech recognition.
Automatic Speech Recognition (ASR) +3
no code implementations • EMNLP (IWSLT) 2019 • Yingbo Gao, Christian Herold, Weiyue Wang, Hermann Ney
Prominently used in support vector machines and logistic regression, kernel functions (kernels) can implicitly map data points into high-dimensional spaces and make it easier to learn complex decision boundaries.
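For example, the Gaussian (RBF) kernel computes an inner product in an implicit infinite-dimensional feature space without ever constructing the mapping:

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """k(x, y) = exp(-gamma * ||x - y||^2): similarity in an implicit
    high-dimensional space, computed directly from the inputs."""
    d = np.asarray(x) - np.asarray(y)
    return float(np.exp(-gamma * d.dot(d)))
```

The kernel value is 1 exactly when the points coincide and decays toward 0 as they move apart.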
1 code implementation • IJCNLP 2019 • Yingbo Gao, Weiyue Wang, Hermann Ney
The preprocessing pipelines in Natural Language Processing usually involve a step of removing sentences consisting of illegal characters.
no code implementations • WS 2019 • Jan Rosendahl, Christian Herold, Yunsu Kim, Miguel Graça, Weiyue Wang, Parnia Bahar, Yingbo Gao, Hermann Ney
For the De-En task, none of the tested methods gave a significant improvement over last year's winning system, and we end up matching its performance of 39.6% BLEU on newstest2019.
1 code implementation • ACL 2019 • Yunsu Kim, Yingbo Gao, Hermann Ney
Transfer learning and multilingual models are essential for low-resource neural machine translation (NMT), but their applicability is limited to cognate languages that share vocabularies.
Cross-Lingual Transfer • Low-Resource Neural Machine Translation +3
no code implementations • WS 2018 • Christian Herold, Yingbo Gao, Hermann Ney
Embedding and projection matrices are commonly used in neural language models (NLM) as well as in other sequence processing networks that operate on large vocabularies.
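One standard arrangement with such matrices is weight tying: the output projection reuses the transposed input embedding, removing one of the two large vocabulary-by-dimension matrices. A sketch of the shape bookkeeping — offered as background on the setup, not as a claim about this paper's specific method:

```python
import numpy as np

def tied_logits(hidden, embedding):
    """Output logits via the transposed input embedding (weight tying):
    embedding is (vocab, dim), hidden is (dim,), logits are (vocab,)."""
    return hidden @ embedding.T
```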