Word Alignment
84 papers with code • 7 benchmarks • 4 datasets
Word Alignment is the task of finding the correspondence between source and target words in a pair of sentences that are translations of each other.
Source: Neural Network-based Word Alignment through Score Aggregation
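To make the definition concrete, here is a minimal sketch of how such a correspondence is often stored: as a set of (source index, target index) pairs written in the zero-based `i-j` text format that many alignment tools emit. The function names and the example sentence pair are illustrative assumptions, not taken from any paper listed on this page.

```python
def parse_alignment(line):
    """Parse a string of 'i-j' tokens into a set of (source, target) index pairs.

    Indices are assumed to be zero-based positions in the tokenized sentences.
    """
    pairs = set()
    for token in line.split():
        src, tgt = token.split("-")
        pairs.add((int(src), int(tgt)))
    return pairs


def aligned_words(source, target, pairs):
    """Return the aligned (source word, target word) tuples in index order."""
    return [(source[i], target[j]) for i, j in sorted(pairs)]


# Hypothetical English-French pair: "do" is left unaligned,
# and "not" links to both "ne" and "pas" (a one-to-many alignment).
source = "I do not go".split()
target = "je ne vais pas".split()
pairs = parse_alignment("0-0 2-1 2-3 3-2")

for src_word, tgt_word in aligned_words(source, target, pairs):
    print(src_word, "->", tgt_word)
```

A set of index pairs is used rather than a one-to-one mapping because a word may align to several words on the other side, or to none at all.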
Latest papers
Cross-lingual Contextualized Phrase Retrieval
In our work, we propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval, which aims to augment cross-lingual applications by addressing polysemy using context information.
Multilingual Coreference Resolution in Low-resource South Asian Languages
We introduce a Translated dataset for Multilingual Coreference Resolution (TransMuCoRes) in 31 South Asian languages using off-the-shelf tools for translation and word-alignment.
Constrained Decoding for Cross-lingual Label Projection
Therefore, it is common to exploit translation and label projection to further improve the performance by (1) translating training data that is available in a high-resource language (e.g., English) together with the gold labels into low-resource languages, and/or (2) translating test data in low-resource languages to a high-resource language to run inference on, then projecting the predicted span-level labels back onto the original test data.
Salute the Classic: Revisiting Challenges of Machine Translation in the Age of Large Language Models
This study revisits these challenges, offering insights into their ongoing relevance in the context of advanced Large Language Models (LLMs): domain mismatch, amount of parallel data, rare word prediction, translation of long sentences, attention model as word alignment, and sub-optimal beam search.
Aligning and Prompting Everything All at Once for Universal Visual Perception
However, predominant paradigms, driven by casting instance-level tasks as an object-word alignment, bring heavy cross-modality interaction, which is not effective in prompting object detection and visual grounding.
CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection
CoDet then leverages visual similarities to discover the co-occurring objects and align them with the shared concept.
On Bilingual Lexicon Induction with Large Language Models
Bilingual Lexicon Induction (BLI) is a core task in multilingual NLP that still, to a large extent, relies on calculating cross-lingual word representations.
Improving Translation Faithfulness of Large Language Models via Augmenting Instructions
The experimental results demonstrate significant improvements in translation performance with SWIE based on BLOOMZ-3b, particularly in zero-shot and long text translations due to reduced instruction forgetting risk.
Towards Arabic Multimodal Dataset for Sentiment Analysis
In contrast, Arabic DL-based multimodal sentiment analysis (MSA) is still in its infancy, mainly due to the lack of standard datasets.
WSPAlign: Word Alignment Pre-training via Large-Scale Weakly Supervised Span Prediction
Most existing word alignment methods rely on manual alignment datasets or parallel corpora, which limits their usefulness.