Machine Translation

2152 papers with code • 80 benchmarks • 77 datasets

Machine translation is the task of translating a sentence in a source language to a different target language.

Approaches to machine translation range from rule-based to statistical to neural. More recently, attention-based encoder-decoder architectures, most notably the Transformer, have attained major improvements in machine translation.
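The core operation behind these attention-based architectures is scaled dot-product attention: each decoder step computes a similarity score between a query and every encoder key, turns the scores into weights with a softmax, and outputs a weighted average of the values. A minimal plain-Python sketch of that computation (illustrative only; real systems run this over batched matrices with learned projections):

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector:
    weights = softmax(q . k_i / sqrt(d)); output = sum_i weights_i * v_i."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # Weighted average of the value vectors.
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]
```

When the query closely matches one key, nearly all the weight lands on that key's value, which is how the decoder "attends" to the most relevant source positions; a zero query gives uniform weights and returns the mean of the values.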

One of the most popular families of datasets used to benchmark machine translation systems is the WMT family. Commonly used evaluation metrics for machine translation include BLEU, METEOR, and NIST.
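To make the most common of these metrics concrete, a minimal sketch of sentence-level BLEU in pure Python (the geometric mean of modified n-gram precisions times a brevity penalty, with uniform weights and no smoothing; production use typically relies on a standard implementation such as sacreBLEU rather than code like this):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(reference, hypothesis, max_n=4):
    """BLEU for one hypothesis against one reference (unsmoothed)."""
    ref, hyp = reference.split(), hypothesis.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each hypothesis n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(len(hyp) - n + 1, 0)
        if total == 0 or overlap == 0:
            return 0.0  # any zero precision zeroes the geometric mean
        precisions.append(overlap / total)
    # Brevity penalty: discourage hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

An exact match scores 1.0; without smoothing, any hypothesis that shares no 4-gram with the reference scores 0.0, which is why smoothed variants are preferred for short sentences.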

(Image credit: Google seq2seq)

Libraries

Use these libraries to find Machine Translation models and implementations

Latest papers with no code

Sentence-Level or Token-Level? A Comprehensive Study on Knowledge Distillation

no code yet • 23 Apr 2024

To substantiate our hypothesis, we systematically analyze the performance of distillation methods by varying the model size of student models, the complexity of text, and the difficulty of decoding procedure.

From LLM to NMT: Advancing Low-Resource Machine Translation with Claude

no code yet • 22 Apr 2024

We show that Claude 3 Opus, a large language model (LLM) released by Anthropic in March 2024, exhibits stronger machine translation competence than other LLMs.

Fine-Tuning Large Language Models to Translate: Will a Touch of Noisy Data in Misaligned Languages Suffice?

no code yet • 22 Apr 2024

Traditionally, success in multilingual machine translation can be attributed to three key factors in training data: large volume, diverse translation directions, and high quality.

Evaluation of Machine Translation Based on Semantic Dependencies and Keywords

no code yet • 20 Apr 2024

To achieve a comprehensive and in-depth evaluation of the semantic correctness of sentences, the paper proposes an evaluation algorithm based on semantic dependencies and keywords; experimental results show that it is more accurate than similar methods and measures the semantic correctness of machine translation more precisely.

Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair

no code yet • 18 Apr 2024

In Simultaneous Machine Translation (SiMT) systems, training with a simultaneous interpretation (SI) corpus is an effective method for achieving high-quality yet low-latency systems.

Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory

no code yet • 18 Apr 2024

We propose Pointer-Augmented Neural Memory (PANM) to help neural networks understand and apply symbol processing to new, longer sequences of data.

Neuron Specialization: Leveraging intrinsic task modularity for multilingual machine translation

no code yet • 17 Apr 2024

Training a unified multilingual model promotes knowledge transfer but inevitably introduces negative interference.

GeMQuAD: Generating Multilingual Question Answering Datasets from Large Language Models using Few Shot Learning

no code yet • 14 Apr 2024

The emergence of Large Language Models (LLMs) with capabilities like In-Context Learning (ICL) has ushered in new possibilities for data generation across various domains while minimizing the need for extensive data collection and modeling techniques.

Multilingual Evaluation of Semantic Textual Relatedness

no code yet • 13 Apr 2024

The explosive growth of online content demands robust Natural Language Processing (NLP) techniques that can capture nuanced meanings and cultural context across diverse languages.

Extending Translate-Train for ColBERT-X to African Language CLIR

no code yet • 11 Apr 2024

This paper describes the submission runs from the HLTCOE team at the CIRAL CLIR tasks for African languages at FIRE 2023.