no code implementations • IWSLT (ACL) 2022 • Brian Yan, Patrick Fernandes, Siddharth Dalmia, Jiatong Shi, Yifan Peng, Dan Berrebbi, Xinyi Wang, Graham Neubig, Shinji Watanabe
We use additional paired Modern Standard Arabic data (MSA) to directly improve the speech recognition (ASR) and machine translation (MT) components of our cascaded systems.
no code implementations • 13 Mar 2024 • Sweta Agrawal, Amin Farajian, Patrick Fernandes, Ricardo Rei, André F. T. Martins
Our findings show that augmenting neural learned metrics with contextual information helps improve correlation with human judgments in the reference-free scenario and when evaluating translations in out-of-English settings.
1 code implementation • 27 Feb 2024 • Duarte M. Alves, José Pombal, Nuno M. Guerreiro, Pedro H. Martins, João Alves, Amin Farajian, Ben Peters, Ricardo Rei, Patrick Fernandes, Sweta Agrawal, Pierre Colombo, José G. C. de Souza, André F. T. Martins
While general-purpose large language models (LLMs) demonstrate proficiency on multiple tasks within the domain of translation, approaches based on open LLMs are competitive only when specializing on a single task.
1 code implementation • 1 Feb 2024 • Manuel Faysse, Patrick Fernandes, Nuno M. Guerreiro, António Loison, Duarte M. Alves, Caio Corro, Nicolas Boizard, João Alves, Ricardo Rei, Pedro H. Martins, Antoni Bigata Casademunt, François Yvon, André F. T. Martins, Gautier Viaud, Céline Hudelot, Pierre Colombo
We introduce CroissantLLM, a 1. 3B language model pretrained on a set of 3T English and French tokens, to bring to the research and industrial community a high-performance, fully open-sourced bilingual model that runs swiftly on consumer-grade local hardware.
1 code implementation • 20 Nov 2023 • Sumire Honda, Patrick Fernandes, Chrysoula Zerva
We make use of Conditional Cross-Mutual Information (CXMI) to explore how much of the context the model uses and generalise CXMI to study the impact of the extra-sentential context.
no code implementations • 15 Nov 2023 • Miguel Moura Ramos, Patrick Fernandes, António Farinhas, André F. T. Martins
A core ingredient in RLHF's success in aligning and improving large language models (LLMs) is its reward model, trained using human feedback on model outputs.
no code implementations • 14 Aug 2023 • Patrick Fernandes, Daniel Deutsch, Mara Finkelstein, Parker Riley, André F. T. Martins, Graham Neubig, Ankush Garg, Jonathan H. Clark, Markus Freitag, Orhan Firat
Automatic evaluation of machine translation (MT) is a critical tool driving the rapid iterative development of MT systems.
1 code implementation • 1 Jun 2023 • Sameer Jain, Vaishakh Keshava, Swarnashree Mysore Sathyendra, Patrick Fernandes, PengFei Liu, Graham Neubig, Chunting Zhou
Most frameworks that perform such multi-dimensional evaluation require training on large manually or synthetically generated datasets.
1 code implementation • 17 May 2023 • Markus Freitag, Behrooz Ghorbani, Patrick Fernandes
Recent advances in machine translation (MT) have shown that Minimum Bayes Risk (MBR) decoding can be a powerful alternative to beam search decoding, especially when combined with neural-based utility functions.
no code implementations • 1 May 2023 • Patrick Fernandes, Aman Madaan, Emmy Liu, António Farinhas, Pedro Henrique Martins, Amanda Bertsch, José G. C. de Souza, Shuyan Zhou, Tongshuang Wu, Graham Neubig, André F. T. Martins
Many recent advances in natural language generation have been fueled by training large language models on internet-scale data.
1 code implementation • 10 Apr 2023 • Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polák, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, Shinji Watanabe
ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit necessitated by the broadening interests of the spoken language translation community.
no code implementations • 19 Feb 2023 • Patrick Fernandes, Behrooz Ghorbani, Xavier Garcia, Markus Freitag, Orhan Firat
Through a novel joint scaling law formulation, we compute the effective number of parameters allocated to each language pair and examine the role of language similarity in the scaling behavior of our models.
no code implementations • 13 Oct 2022 • Jimin Sun, Patrick Fernandes, Xinyi Wang, Graham Neubig
Recent work on tokenizer-free multilingual pretrained models show promising results in improving cross-lingual transfer and reducing engineering overhead (Clark et al., 2022; Xue et al., 2022).
1 code implementation • NAACL 2022 • Patrick Fernandes, António Farinhas, Ricardo Rei, José G. C. de Souza, Perez Ogayo, Graham Neubig, André F. T. Martins
Despite the progress in machine translation quality estimation and evaluation in the last years, decoding in neural machine translation (NMT) is mostly oblivious to this and centers around finding the most probable translation according to the model (MAP decoding), approximated with beam search.
1 code implementation • 22 Apr 2022 • Patrick Fernandes, Marcos Treviso, Danish Pruthi, André F. T. Martins, Graham Neubig
In this work, leveraging meta-learning techniques, we extend this idea to improve the quality of the explanations themselves, specifically by optimizing explanations such that student models more effectively learn to simulate the original model.
no code implementations • spnlp (ACL) 2022 • Marcos Treviso, António Góis, Patrick Fernandes, Erick Fonseca, André F. T. Martins
Transformers' quadratic complexity with respect to the input sequence length has motivated a body of work on efficient sparse approximations to softmax.
no code implementations • 15 Sep 2021 • Patrick Fernandes, Kayo Yin, Emmy Liu, André F. T. Martins, Graham Neubig
Although proper handling of discourse significantly contributes to the quality of machine translation (MT), these improvements are not adequately measured in common translation quality metrics.
1 code implementation • ACL 2021 • Kayo Yin, Patrick Fernandes, Danish Pruthi, Aditi Chaudhary, André F. T. Martins, Graham Neubig
Are models paying large amounts of attention to the same context?
1 code implementation • ACL 2021 • Patrick Fernandes, Kayo Yin, Graham Neubig, André F. T. Martins
Recent work in neural machine translation has demonstrated both the necessity and feasibility of using inter-sentential context -- context from sentences other than those currently being translated.
3 code implementations • ICLR 2019 • Patrick Fernandes, Miltiadis Allamanis, Marc Brockschmidt
Summarization of long sequences into a concise statement is a core problem in natural language processing, requiring non-trivial understanding of the input.