Semantic Textual Similarity

560 papers with code • 13 benchmarks • 17 datasets

Semantic textual similarity is the task of determining how similar two pieces of text are in meaning. This often takes the form of assigning a graded similarity score, for example on a scale from 1 to 5. Related tasks are paraphrase identification and duplicate detection.
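
As a minimal illustration, one common approach is to embed both sentences and compare the embeddings with cosine similarity. The sketch below assumes the sentence-transformers package and the all-MiniLM-L6-v2 checkpoint; both are illustrative choices, not the method of any particular paper listed here.

```python
# Minimal sketch: score sentence similarity with embeddings + cosine similarity.
# Assumes sentence-transformers is installed; all-MiniLM-L6-v2 is an example checkpoint.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sent_a = "A man is playing a guitar."
sent_b = "Someone is strumming a guitar."

# Encode both sentences into dense vectors and compare them.
emb_a, emb_b = model.encode([sent_a, sent_b], convert_to_tensor=True)
score = util.cos_sim(emb_a, emb_b).item()  # value in [-1, 1]
print(f"cosine similarity: {score:.3f}")
```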

Image source: Learning Semantic Textual Similarity from Conversations

Most implemented papers

Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention

mlpen/Nystromformer 7 Feb 2021

The scalability of Nyströmformer enables application to longer sequences with thousands of tokens.
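
As a rough sketch of the underlying idea, the Nyström method approximates the full softmax attention matrix through a small set of landmark queries and keys. The code below is a simplified single-head illustration (no masking, no iterative pseudo-inverse), not the paper's exact implementation.

```python
import torch

def nystrom_attention(q, k, v, num_landmarks=16):
    """Simplified Nyström approximation of softmax self-attention.

    q, k, v: (seq_len, dim) tensors; seq_len is assumed divisible by
    num_landmarks. Landmarks are segment means of the queries/keys.
    """
    n, d = q.shape
    scale = d ** -0.5
    q_land = q.reshape(num_landmarks, n // num_landmarks, d).mean(dim=1)
    k_land = k.reshape(num_landmarks, n // num_landmarks, d).mean(dim=1)

    kernel_1 = torch.softmax(q @ k_land.T * scale, dim=-1)       # (n, m)
    kernel_2 = torch.softmax(q_land @ k_land.T * scale, dim=-1)  # (m, m)
    kernel_3 = torch.softmax(q_land @ k.T * scale, dim=-1)       # (m, n)

    # softmax(QK^T / sqrt(d)) ≈ kernel_1 · pinv(kernel_2) · kernel_3;
    # grouping the products avoids ever forming the full n x n matrix.
    return kernel_1 @ (torch.linalg.pinv(kernel_2) @ (kernel_3 @ v))

q = k = v = torch.randn(512, 64)
out = nystrom_attention(q, k, v)  # (512, 64)
```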

MedSTS: A Resource for Clinical Semantic Textual Similarity

ncbi-nlp/BioSentVec 28 Aug 2018

A subset of MedSTS (MedSTS_ann) containing 1,068 sentence pairs was annotated by two medical experts with semantic similarity scores of 0-5 (low to high similarity).

Q8BERT: Quantized 8Bit BERT

NervanaSystems/nlp-architect 14 Oct 2019

Recently, pre-trained Transformer-based language models such as BERT and GPT have shown great improvement in many Natural Language Processing (NLP) tasks.
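
To give a flavour of what 8-bit quantization means in practice, the sketch below shows symmetric linear quantization of a weight matrix to int8. This is a generic illustration under simplifying assumptions; Q8BERT itself uses quantization-aware training with further details not shown here.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric linear quantization of a float weight tensor to int8."""
    scale = np.abs(w).max() / 127.0                    # largest magnitude maps to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate float weights."""
    return q.astype(np.float32) * scale

w = np.random.randn(768, 768).astype(np.float32)      # e.g. one BERT weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs quantization error:", np.abs(w - w_hat).max())
```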

MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices

tensorflow/models ACL 2020

Then, we conduct knowledge transfer from this teacher to MobileBERT.
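
As a generic illustration of teacher-to-student knowledge transfer, the sketch below computes a standard soft-target distillation loss. MobileBERT's actual objective additionally matches feature maps and attention distributions layer by layer, which is not shown here.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student outputs."""
    t = temperature
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    # Scale by T^2 as is conventional in soft-target distillation.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * t * t

student_logits = torch.randn(8, 2)  # e.g. a batch of sentence-pair predictions
teacher_logits = torch.randn(8, 2)
loss = distillation_loss(student_logits, teacher_logits)
```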

Calculating the similarity between words and sentences using a lexical database and corpus statistics

nihitsaxena95/sentence-similarity-wordnet-sementic 15 Feb 2018

To calculate the semantic similarity between words and sentences, the proposed method follows an edge-based approach using a lexical database.
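
For a concrete sense of what an edge-based lexical-database measure looks like, the sketch below scores word pairs with WordNet path similarity via NLTK (the WordNet corpus must be downloaded first). The paper combines such edge-based measures with corpus statistics and sentence-level aggregation, which is not reproduced here.

```python
# Requires: nltk installed and the WordNet corpus downloaded,
# e.g. nltk.download("wordnet").
from nltk.corpus import wordnet as wn

def word_similarity(word_a, word_b):
    """Best path-based (edge-counting) similarity over all synset pairs."""
    scores = []
    for syn_a in wn.synsets(word_a):
        for syn_b in wn.synsets(word_b):
            sim = syn_a.path_similarity(syn_b)
            if sim is not None:
                scores.append(sim)
    return max(scores) if scores else 0.0

print(word_similarity("car", "automobile"))  # shared synset -> 1.0
print(word_similarity("car", "banana"))      # unrelated concepts -> low score
```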

Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

facebookresearch/SentEval ICLR 2018

In this work, we present a simple, effective multi-task learning framework for sentence representations that combines the inductive biases of diverse training objectives in a single model.

How to Train BERT with an Academic Budget

peteriz/academic-budget-bert EMNLP 2021

While large language models à la BERT are used ubiquitously in NLP, pretraining them is considered a luxury that only a few well-funded industry labs can afford.

Label Noise Reduction in Entity Typing by Heterogeneous Partial-Label Embedding

shanzhenren/PLE 17 Feb 2016

Current systems of fine-grained entity typing use distant supervision in conjunction with existing knowledge bases to assign categories (type labels) to entity mentions.