Machine Translation
2154 papers with code • 80 benchmarks • 77 datasets
Machine translation is the task of translating a sentence in a source language to a different target language.
Approaches to machine translation range from rule-based to statistical to neural. More recently, attention-based encoder-decoder architectures such as the Transformer have brought major improvements in machine translation.
The WMT family of datasets is among the most popular benchmarks for machine translation systems. Commonly used evaluation metrics include BLEU, METEOR, and NIST.
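To make the BLEU metric concrete, here is a minimal sketch of a sentence-level score in Python. It makes several simplifying assumptions that standard BLEU implementations do not: a single reference, whitespace tokenization, no smoothing, and a score computed per sentence rather than over a corpus. The function names `ngrams` and `sentence_bleu` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Simplified BLEU: geometric mean of clipped n-gram precisions
    (n = 1..max_n) multiplied by a brevity penalty.

    Assumptions: one reference, whitespace tokenization, no smoothing.
    """
    hyp = hypothesis.split()
    ref = reference.split()
    if not hyp:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or clipped == 0:
            # Without smoothing, any zero precision makes the score zero.
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hyp) >= len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(sentence_bleu("the cat sat on the mat", reference))
    print(sentence_bleu("the cat sat on the", reference))
```

In this sketch, a hypothesis identical to the reference scores 1.0, while a hypothesis that drops the final word keeps perfect n-gram precision but is reduced by the brevity penalty. For reported results, an established tool with standardized tokenization is the safer choice, since scores from ad hoc implementations like this one are not comparable across papers.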
(Image credit: Google seq2seq)
Latest papers
Control-DAG: Constrained Decoding for Non-Autoregressive Directed Acyclic T5 using Weighted Finite State Automata
The Directed Acyclic Transformer is a fast non-autoregressive (NAR) model that performs well in Neural Machine Translation.
SLPL SHROOM at SemEval2024 Task 06: A comprehensive study on models ability to detect hallucination
Language models, particularly generative models, are susceptible to hallucinations, generating outputs that contradict factual knowledge or the source text.
F-MALLOC: Feed-forward Memory Allocation for Continual Learning in Neural Machine Translation
In the evolving landscape of Neural Machine Translation (NMT), the pretrain-then-finetune paradigm has yielded impressive results.
Low-Resource Machine Translation through Retrieval-Augmented LLM Prompting: A Study on the Mambai Language
Leveraging a novel corpus derived from a Mambai language manual and additional sentences translated by a native speaker, we examine the efficacy of few-shot LLM prompting for machine translation (MT) in this low-resource context.
KazQAD: Kazakh Open-Domain Question Answering Dataset
We introduce KazQAD -- a Kazakh open-domain question answering (ODQA) dataset -- that can be used in both reading comprehension and full ODQA settings, as well as for information retrieval experiments.
Large Language Models for Expansion of Spoken Language Understanding Systems to New Languages
In the on-device scenario (tiny and not pretrained SLU), our method improved the Overall Accuracy from 5.31% to 22.06% over the baseline Global-Local Contrastive Learning Framework (GL-CLeF) method.
Low-resource neural machine translation with morphological modeling
An attention augmentation scheme to the transformer model is proposed in a generic form to allow integration of pre-trained language models and also facilitate modeling of word order relationships between the source and target languages.
An image speaks a thousand words, but can everyone listen? On translating images for cultural relevance
First, we build three pipelines comprising state-of-the-art generative models to do the task.
AAdaM at SemEval-2024 Task 1: Augmentation and Adaptation for Multilingual Semantic Textual Relatedness
This paper presents our system developed for the SemEval-2024 Task 1: Semantic Textual Relatedness for African and Asian Languages.
KazParC: Kazakh Parallel Corpus for Machine Translation
We introduce KazParC, a parallel corpus designed for machine translation across Kazakh, English, Russian, and Turkish.