Semantic Textual Similarity
560 papers with code • 13 benchmarks • 17 datasets
Semantic textual similarity deals with determining how similar two pieces of text are. This can take the form of assigning a graded similarity score, for example from 1 to 5. Related tasks include paraphrase and duplicate identification.
Image source: Learning Semantic Textual Similarity from Conversations
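As an illustration of the task, here is a minimal sketch that scores a sentence pair by cosine similarity of sentence embeddings; the model name and the rescaling to a 0-5 range are illustrative assumptions, not the protocol of any particular benchmark.

```python
# Minimal sketch: score a sentence pair with embedding cosine similarity.
# The model name and the 0-5 rescaling are assumptions for illustration only.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed example model

s1 = "A man is playing a guitar."
s2 = "Someone is strumming a guitar."

emb1 = model.encode(s1, convert_to_tensor=True)
emb2 = model.encode(s2, convert_to_tensor=True)

cosine = util.cos_sim(emb1, emb2).item()  # cosine similarity in [-1, 1]
score = 5.0 * max(cosine, 0.0)            # crude rescaling to a 0-5 range
print(f"cosine={cosine:.3f}  similarity score~{score:.2f}")
```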
Libraries
Use these libraries to find Semantic Textual Similarity models and implementations
Most implemented papers
Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention
The scalability of Nyströmformer enables application to longer sequences with thousands of tokens.
MedSTS: A Resource for Clinical Semantic Textual Similarity
A subset of MedSTS (MedSTS_ann) containing 1,068 sentence pairs was annotated by two medical experts with semantic similarity scores of 0-5 (low to high similarity).
Q8BERT: Quantized 8Bit BERT
Recently, pre-trained Transformer based language models such as BERT and GPT, have shown great improvement in many Natural Language Processing (NLP) tasks.
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
Then, we conduct knowledge transfer from this teacher to MobileBERT.
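A minimal sketch of the generic teacher-to-student knowledge transfer (distillation) objective is below; the temperature value and logit shapes are assumptions, and MobileBERT's specific layer-wise feature transfer is not shown.

```python
# Minimal sketch of a distillation objective: KL divergence between
# temperature-softened teacher and student output distributions.
# Shapes and temperature are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

# Random logits standing in for teacher/student model outputs (illustrative).
teacher_logits = torch.randn(8, 30522)
student_logits = torch.randn(8, 30522)
print(distillation_loss(student_logits, teacher_logits))
```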
RealFormer: Transformer Likes Residual Attention
Transformer is the backbone of modern NLP models.
Calculating the similarity between words and sentences using a lexical database and corpus statistics
To calculate the semantic similarity between words and sentences, the proposed method follows an edge-based approach using a lexical database.
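A minimal sketch of the edge-based idea, using NLTK's WordNet interface as the lexical database, is shown below; it is not the paper's exact formulation (corpus statistics and sentence-level aggregation are omitted).

```python
# Minimal sketch of edge-based word similarity over a lexical database (WordNet).
# Requires: nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def word_similarity(w1: str, w2: str) -> float:
    """Best path-based similarity over all synset pairs: 1 / (1 + shortest path length)."""
    best = 0.0
    for s1 in wn.synsets(w1):
        for s2 in wn.synsets(w2):
            sim = s1.path_similarity(s2)  # None if no connecting path
            if sim is not None and sim > best:
                best = sim
    return best

print(word_similarity("car", "automobile"))  # ~1.0, shared synset
print(word_similarity("car", "banana"))      # much lower
```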
Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning
In this work, we present a simple, effective multi-task learning framework for sentence representations that combines the inductive biases of diverse training objectives in a single model.
SqueezeBERT: What can computer vision teach NLP about efficient neural networks?
Humans read and write hundreds of billions of messages every day.
How to Train BERT with an Academic Budget
While large language models a la BERT are used ubiquitously in NLP, pretraining them is considered a luxury that only a few well-funded industry labs can afford.
Label Noise Reduction in Entity Typing by Heterogeneous Partial-Label Embedding
Current systems of fine-grained entity typing use distant supervision in conjunction with existing knowledge bases to assign categories (type labels) to entity mentions.