Paraphrase Identification

72 papers with code • 10 benchmarks • 17 datasets

The goal of Paraphrase Identification is to determine whether two sentences have the same meaning.

Source: Adversarial Examples with Difficult Common Words for Paraphrase Identification
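To make the task definition concrete, here is a minimal sketch of a paraphrase identifier framed as binary classification over a sentence pair. It is a toy word-overlap baseline, not a method from any paper listed on this page. The function names (`tokenize`, `jaccard`, `is_paraphrase`) and the 0.5 threshold are illustrative assumptions.

```python
import re


def tokenize(sentence):
    """Lowercase the sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def jaccard(a, b):
    """Jaccard similarity between the token sets of two sentences."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        return 1.0  # two empty sentences are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_paraphrase(a, b, threshold=0.5):
    """Predict 'paraphrase' when token overlap reaches the threshold.

    The threshold is an assumed value; in practice it would be tuned
    on labeled sentence pairs.
    """
    return jaccard(a, b) >= threshold


print(is_paraphrase("How do I learn Python?", "How can I learn Python?"))
print(is_paraphrase("The weather is nice", "I like pizza"))
```

Word overlap is only a weak proxy for meaning: two sentences can share most of their words and still differ in meaning, which is why learned models are typically used for this task.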

Most implemented papers

Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language

deepmipt/bert 17 May 2019

This work shows that transfer learning from a multilingual model to a monolingual model yields significant performance gains on tasks such as reading comprehension, paraphrase detection, and sentiment analysis.

ERNIE: Enhanced Language Representation with Informative Entities

thunlp/ERNIE ACL 2019

Neural language representation models such as BERT, pre-trained on large-scale corpora, capture rich semantic patterns from plain text and can be fine-tuned to consistently improve performance on various NLP tasks.

Dice Loss for Data-imbalanced NLP Tasks

ShannonAI/dice_loss_for_NLP ACL 2020

Many NLP tasks, such as tagging and machine reading comprehension, face a severe data imbalance issue: negative examples significantly outnumber positive examples, and the huge number of background examples (or easy-negative examples) overwhelms the training.
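The imbalance problem described above can be illustrated with a generic soft Dice loss. This is a minimal sketch of the standard Dice formulation, not the specific loss variant proposed in the paper. The function name `soft_dice_loss` and the `smooth` parameter are illustrative assumptions.

```python
def soft_dice_loss(probs, labels, smooth=1.0):
    """Generic soft Dice loss for binary labels.

    probs  : predicted probabilities of the positive class
    labels : gold labels (0 or 1)
    smooth : assumed smoothing constant that avoids division by zero
    """
    intersection = sum(p * y for p, y in zip(probs, labels))
    denominator = sum(probs) + sum(labels)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def accuracy(probs, labels):
    """Fraction of examples classified correctly at a 0.5 cutoff."""
    hits = sum((p >= 0.5) == (y == 1) for p, y in zip(probs, labels))
    return hits / len(labels)


# Imbalanced toy data: one positive example among ten.
labels = [1] + [0] * 9
all_negative = [0.0] * 10          # a model that always predicts "negative"
perfect = [1.0] + [0.0] * 9        # a model that finds the positive

print(accuracy(all_negative, labels), soft_dice_loss(all_negative, labels))
print(accuracy(perfect, labels), soft_dice_loss(perfect, labels))
```

The always-negative model scores high accuracy yet incurs a substantial Dice loss, because the Dice score depends on overlap with the positive class rather than on the many easy negatives.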

Pay Attention when Required

NVIDIA/DeepLearningExamples 9 Sep 2020

Transformer-based models consist of interleaved feed-forward blocks, which capture content meaning, and relatively more expensive self-attention blocks, which capture context meaning.

Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning

rabeehk/compacter ACL 2021

Although pretrained language models can be fine-tuned to produce state-of-the-art results for a very wide range of language understanding tasks, the dynamics of this process are not well understood, especially in the low data regime.

Charformer: Fast Character Transformers via Gradient-based Subword Tokenization

google-research/google-research ICLR 2022

In this paper, we propose a new model inductive bias that learns a subword tokenization end-to-end as part of the model.

Sentence Similarity Learning by Lexical Decomposition and Composition

Leputa/CIKM-AnalytiCup-2018 COLING 2016

Most conventional sentence similarity methods only focus on similar parts of two input sentences, and simply ignore the dissimilar parts, which usually give us some clues and semantic meanings about the sentences.

A Study of MatchPyramid Models on Ad-hoc Retrieval

albpurpura/PE4IR 15 Jun 2016

Although ad-hoc retrieval can also be formalized as a text matching task, few deep models have been tested on it.