TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Linear-Probe Classification	SentEval	Sentence-BERT	Accuracy	87.7	# 5
Semantic Textual Similarity	SICK	SRoBERTa-NLI-base	Spearman Correlation	0.7446	# 6
Semantic Textual Similarity	SICK	SBERT-NLI-base	Spearman Correlation	0.7291	# 10
Semantic Textual Similarity	SICK	SBERT-NLI-large	Spearman Correlation	0.7375	# 9
Semantic Textual Similarity	SICK	SRoBERTa-NLI-large	Spearman Correlation	0.7429	# 7
Semantic Textual Similarity	STS12	SRoBERTa-NLI-large	Spearman Correlation	0.7453	# 13
Semantic Textual Similarity	STS13	SBERT-NLI-large	Spearman Correlation	0.7846	# 21
Semantic Textual Similarity	STS14	SBERT-NLI-large	Spearman Correlation	0.7490	# 16
Semantic Textual Similarity	STS15	SRoBERTa-NLI-large	Spearman Correlation	0.8185	# 16
Semantic Textual Similarity	STS16	SRoBERTa-NLI-large	Spearman Correlation	0.7682	# 18
Semantic Textual Similarity	STS Benchmark	SBERT-NLI-large	Spearman Correlation	0.79	# 31
Semantic Textual Similarity	STS Benchmark	SBERT-STSb-base	Spearman Correlation	0.8479	# 25
Semantic Textual Similarity	STS Benchmark	SRoBERTa-NLI-STSb-large	Spearman Correlation	0.8615	# 23
Semantic Textual Similarity	STS Benchmark	SRoBERTa-NLI-base	Spearman Correlation	0.7777	# 34
Semantic Textual Similarity	STS Benchmark	SBERT-NLI-base	Spearman Correlation	0.7703	# 35
Semantic Textual Similarity	STS Benchmark	SBERT-STSb-large	Spearman Correlation	0.8445	# 27
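
The Spearman Correlation column reports a rank correlation between model similarity scores and human similarity labels. As a rough illustration of how such a value is computed, the following plain-Python sketch ranks both lists (averaging tied ranks) and takes the Pearson correlation of the ranks. The function names and the toy scores are assumptions made up for this example; they are not taken from the paper or from any benchmark.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical gold labels (0-5 scale) and model cosine similarities.
gold_scores = [0.0, 1.5, 3.0, 4.2, 5.0]
model_scores = [0.10, 0.35, 0.30, 0.80, 0.95]

print(round(spearman(gold_scores, model_scores), 4))  # 0.9
```

Only the ordering of the scores matters here: the model swaps the second and third pairs relative to the gold ranking, which lowers the correlation from 1.0 to 0.9.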

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/sentence-bert-sentence-embeddings-using/linear-probe-classification-on-senteval)](https://paperswithcode.com/sota/linear-probe-classification-on-senteval?p=sentence-bert-sentence-embeddings-using)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/sentence-bert-sentence-embeddings-using/semantic-textual-similarity-on-sick)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sick?p=sentence-bert-sentence-embeddings-using)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/sentence-bert-sentence-embeddings-using/semantic-textual-similarity-on-sts12)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts12?p=sentence-bert-sentence-embeddings-using)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/sentence-bert-sentence-embeddings-using/semantic-textual-similarity-on-sts14)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts14?p=sentence-bert-sentence-embeddings-using)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/sentence-bert-sentence-embeddings-using/semantic-textual-similarity-on-sts15)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts15?p=sentence-bert-sentence-embeddings-using)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/sentence-bert-sentence-embeddings-using/semantic-textual-similarity-on-sts16)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts16?p=sentence-bert-sentence-embeddings-using)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/sentence-bert-sentence-embeddings-using/semantic-textual-similarity-on-sts13)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts13?p=sentence-bert-sentence-embeddings-using)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/sentence-bert-sentence-embeddings-using/semantic-textual-similarity-on-sts-benchmark)](https://paperswithcode.com/sota/semantic-textual-similarity-on-sts-benchmark?p=sentence-bert-sentence-embeddings-using)`

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

IJCNLP 2019 · Nils Reimers, Iryna Gurevych

BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) have set a new state of the art on sentence-pair regression tasks like semantic textual similarity (STS). However, both models require that the two sentences be fed into the network together, which causes massive computational overhead: finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT. The construction of BERT makes it unsuitable for semantic similarity search as well as for unsupervised tasks like clustering. In this publication, we present Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity. This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy of BERT. We evaluate SBERT and SRoBERTa on common STS tasks and transfer learning tasks, where they outperform other state-of-the-art sentence embedding methods.
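
The "about 50 million" figure follows from counting unordered pairs: n sentences give n(n-1)/2 pairs, and a cross-encoder needs one forward pass per pair. With sentence embeddings, each sentence is encoded once and the pairs are then compared with cosine similarity. The sketch below illustrates both points in plain Python, assuming the embeddings have already been computed. The function names and the toy 3-dimensional vectors are hypothetical and are not part of the sentence-transformers library.

```python
import math
from itertools import combinations


def pair_count(n):
    """Number of unordered sentence pairs among n sentences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar_pair(embeddings):
    """Return (i, j, score) for the pair with the highest cosine similarity."""
    best = None
    for i, j in combinations(range(len(embeddings)), 2):
        score = cosine_similarity(embeddings[i], embeddings[j])
        if best is None or score > best[2]:
            best = (i, j, score)
    return best


# Hypothetical embeddings, made up for illustration only.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

print(pair_count(10_000))                     # 49995000, i.e. about 50 million
print(most_similar_pair(toy_embeddings)[:2])  # (0, 1)
```

The pairwise loop still visits every pair, but each comparison is a cheap vector operation rather than a transformer forward pass, which is where the reported speedup comes from.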

Code

Repository	Stars
UKPLab/sentence-transformers (official)	13,835
PaddlePaddle/PaddleNLP	11,448
princeton-nlp/SimCSE	3,249
dmmiller612/bert-extractive-summari…	1,348
InsaneLife/dssm	652

See all 60 implementations

Tasks

Clustering

Linear-Probe Classification

Semantic Similarity

Semantic Textual Similarity

Sentence

Sentence Embeddings

STS

Transfer Learning

Datasets

SST

MultiNLI

SNLI

SICK

MPQA Opinion Corpus

SentEval

STS Benchmark

Results from the Paper

Ranked #5 on Linear-Probe Classification on SentEval

Methods

Adam • Attention Dropout • BERT • Dense Connections • Dropout • GELU • Layer Normalization • Linear Layer • Linear Warmup With Linear Decay • Multi-Head Attention • Residual Connection • RoBERTa • SBERT • Scaled Dot-Product Attention • Softmax • Weight Decay • WordPiece
