Semantic Similarity

417 papers with code • 8 benchmarks • 12 datasets

The main objective of Semantic Similarity is to measure the distance between the semantic meanings of a pair of words, phrases, sentences, or documents. For example, the word “car” is more similar to “bus” than it is to “cat”. The two main approaches to measuring Semantic Similarity are knowledge-based approaches and corpus-based, distributional methods.

Source: Visual and Semantic Knowledge Transfer for Large Scale Semi-supervised Object Detection
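
As a rough illustration of the corpus-based, distributional view, words can be represented as vectors and compared with cosine similarity. The sketch below assumes tiny hand-made 3-dimensional vectors (hypothetical values, not learned from any corpus) purely to show the computation; real systems would use learned embeddings.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors; the dimensions loosely stand for
# (vehicle-ness, animal-ness, size). These are illustrative assumptions.
toy_vectors = {
    "car": [0.9, 0.0, 0.5],
    "bus": [0.8, 0.1, 0.9],
    "cat": [0.0, 0.9, 0.2],
}

car_bus = cosine_similarity(toy_vectors["car"], toy_vectors["bus"])
car_cat = cosine_similarity(toy_vectors["car"], toy_vectors["cat"])
print(f"car vs bus: {car_bus:.3f}")
print(f"car vs cat: {car_cat:.3f}")
```

With these made-up vectors, “car” scores higher against “bus” than against “cat”, mirroring the example above.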

Benchmarks

These leaderboards are used to track progress in Semantic Similarity.

Dataset	Best Model
SICK	Dependency Tree-LSTM (Tai et al., 2015)
Annotated corpus for semantic similarity of clinical trial outcomes (original corpus)	BioBERT (pre-trained on PubMed abstracts + PMC, fine-tuned on "Annotated corpus for semantic similarity of clinical trial outcomes, original corpus")
Annotated corpus for semantic similarity of clinical trial outcomes (expanded corpus)	BioBERT (pre-trained on PubMed abstracts + PMC, fine-tuned on "Annotated corpus for semantic similarity of clinical trial outcomes, expanded corpus")
BIOSSES	NCBI_BERT(base) (P+M)
MedSTS	NCBI_BERT(base) (P+M)
ClinicalSTS	CharacterBERT (base, medical, ensemble)
CHIP-STS	MacBERT-large
STS Benchmark	Def2Vec

Libraries

Use these libraries to find Semantic Similarity models and implementations.

faceonlive/ai-research

2 papers

156

juliendenize/eztorch

2 papers

Datasets

Subtasks

Similarity Explanation

Most implemented papers

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

UKPLab/sentence-transformers • IJCNLP 2019

However, it requires that both sentences are fed into the network, which causes a massive computational overhead: Finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT.
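
The "about 50 million" figure follows from counting unordered sentence pairs, n(n-1)/2. The sketch below reproduces that arithmetic and, for contrast, counts the forward passes needed if each sentence were instead encoded once into an embedding and the pairs compared afterwards. The function names are illustrative, and the per-pair time is only what the quoted ~65 hours would imply, not a measured value.

```python
from math import comb


def pairwise_passes(num_sentences):
    """Forward passes when every unordered sentence pair goes through the network."""
    return comb(num_sentences, 2)  # n * (n - 1) / 2


def single_sentence_passes(num_sentences):
    """Forward passes when each sentence is encoded once on its own."""
    return num_sentences


n = 10_000
pair_count = pairwise_passes(n)
print(pair_count)                 # 49995000
print(single_sentence_passes(n))  # 10000

# Per-pair time implied by the quoted ~65 hours (an estimate, not a benchmark).
implied_seconds_per_pair = 65 * 3600 / pair_count
print(f"{implied_seconds_per_pair * 1000:.2f} ms per pair")
```

The pairwise count grows quadratically with the collection size, while the encode-once count grows linearly, which is the overhead the excerpt is pointing at.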

Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks

stanfordnlp/treelstm • IJCNLP 2015

Because of their superior ability to preserve sequence information over time, Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with a more complex computational unit, have obtained strong results on a variety of sequence modeling tasks.

ERNIE: Enhanced Representation through Knowledge Integration

PaddlePaddle/PaddleNLP • 19 Apr 2019

We present a novel language representation model enhanced by knowledge called ERNIE (Enhanced Representation through kNowledge IntEgration).

Improving Language Understanding by Generative Pre-Training

huggingface/transformers • Preprint 2018

We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task.

Language-agnostic BERT Sentence Embedding

FreddeFrallan/Multilingual-CLIP • ACL 2022

While BERT is an effective method for learning monolingual sentence embeddings for semantic similarity and embedding based transfer learning (Reimers and Gurevych, 2019), BERT based cross-lingual sentence embeddings have yet to be explored.

Calculating the similarity between words and sentences using a lexical database and corpus statistics

nihitsaxena95/sentence-similarity-wordnet-sementic • 15 Feb 2018

To calculate the semantic similarity between words and sentences, the proposed method follows an edge-based approach using a lexical database.
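
To make "edge-based" concrete, the sketch below counts edges on the shortest path between two concepts in a small is-a hierarchy and turns that count into a score. Everything here is an assumption for illustration: the taxonomy is a hand-made toy (not WordNet or any real lexical database), and the score 1 / (1 + edges) is one common convention, not necessarily the formula used in the paper.

```python
from collections import deque

# Hypothetical toy is-a taxonomy (child -> parent); not a real lexical database.
TOY_HYPERNYMS = {
    "car": "motor_vehicle",
    "bus": "motor_vehicle",
    "motor_vehicle": "vehicle",
    "vehicle": "entity",
    "cat": "feline",
    "feline": "animal",
    "animal": "entity",
}


def build_undirected_graph(hypernyms):
    """Turn child -> parent links into an undirected adjacency map."""
    graph = {}
    for child, parent in hypernyms.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def shortest_path_edges(graph, start, goal):
    """Breadth-first search; returns the edge count, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def path_similarity(graph, a, b):
    """Score in (0, 1]; shorter paths give higher similarity."""
    edges = shortest_path_edges(graph, a, b)
    return 0.0 if edges is None else 1.0 / (1.0 + edges)


GRAPH = build_undirected_graph(TOY_HYPERNYMS)
print(path_similarity(GRAPH, "car", "bus"))
print(path_similarity(GRAPH, "car", "cat"))
```

In this toy hierarchy "car" and "bus" share a direct parent, so they sit two edges apart, while "car" and "cat" only meet at the root and are six edges apart.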

MedSTS: A Resource for Clinical Semantic Textual Similarity

ncbi-nlp/BioSentVec • 28 Aug 2018

A subset of MedSTS (MedSTS_ann) containing 1,068 sentence pairs was annotated by two medical experts with semantic similarity scores of 0-5 (low to high similarity).

Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets

ncbi-nlp/NCBI_BERT • WS 2019

Label Noise Reduction in Entity Typing by Heterogeneous Partial-Label Embedding

shanzhenren/PLE • 17 Feb 2016

Current systems of fine-grained entity typing use distant supervision in conjunction with existing knowledge bases to assign categories (type labels) to entity mentions.

Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks

nathanshartmann/portuguese_word_embeddings • WS 2017

Word embeddings have been found to provide meaningful representations for words in an efficient way; therefore, they have become common in Natural Language Processing systems.
