TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning

14 Apr 2021 · Kexin Wang, Nils Reimers, Iryna Gurevych

Learning sentence embeddings often requires a large amount of labeled data. However, for most tasks and domains, labeled data is seldom available, and creating it is expensive. In this work, we present a new state-of-the-art unsupervised method based on pre-trained Transformers and a Sequential Denoising Auto-Encoder (TSDAE), which outperforms previous approaches by up to 6.4 points. It can achieve up to 93.1% of the performance of in-domain supervised approaches. Further, we show that TSDAE is a strong domain adaptation and pre-training method for sentence embeddings, significantly outperforming other approaches such as Masked Language Modeling (MLM). A crucial shortcoming of previous studies is their narrow evaluation: most work evaluates mainly on a single task, Semantic Textual Similarity (STS), which does not require any domain knowledge. It is unclear whether the proposed methods generalize to other domains and tasks. We fill this gap by evaluating TSDAE and other recent approaches on four different datasets from heterogeneous domains.
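The abstract names the denoising auto-encoder objective without describing the corruption step. Below is a minimal sketch of how one training pair could be prepared, assuming word deletion as the noise and an assumed deletion ratio of 0.6 (neither detail is stated on this page). The encoder would receive the corrupted sentence, and a decoder would have to reconstruct the original from the fixed-size sentence embedding alone. The helper names `delete_noise` and `make_training_pair` are hypothetical, and the encoder and decoder themselves are not shown.

```python
import random
from typing import Optional, Tuple


def delete_noise(sentence: str, del_ratio: float = 0.6,
                 rng: Optional[random.Random] = None) -> str:
    """Hypothetical corruption step: drop each whitespace-separated word
    independently with probability del_ratio (0.6 is an assumed default)."""
    rng = rng or random.Random()
    words = sentence.split()
    if not words:
        return sentence
    # Surviving words keep their original order.
    kept = [w for w in words if rng.random() >= del_ratio]
    if not kept:
        # Keep one random word so the encoder never receives an empty input.
        kept = [rng.choice(words)]
    return " ".join(kept)


def make_training_pair(sentence: str, del_ratio: float = 0.6,
                       rng: Optional[random.Random] = None) -> Tuple[str, str]:
    """Return (noisy encoder input, clean decoder target)."""
    return delete_noise(sentence, del_ratio, rng), sentence


if __name__ == "__main__":
    noisy, target = make_training_pair(
        "TSDAE learns sentence embeddings without labeled data",
        rng=random.Random(0),
    )
    print("encoder input: ", noisy)
    print("decoder target:", target)
```

Seeding the `random.Random` instance makes the corruption reproducible. Because only the input is corrupted, the reconstruction target is always the untouched sentence, which is what forces the embedding to retain the sentence's content.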


Code


UKPLab/sentence-transformers (official)


kwang2049/pytorch-bertflow (official)

ukplab/pytorch-bertflow (official)

kwang2049/useb (official)

UKPLab/useb


Tasks


Denoising

Domain Adaptation

Information Retrieval

Language Modelling

Paraphrase Identification

Re-Ranking

Semantic Textual Similarity

Sentence

Sentence Embedding

Sentence-Embedding

Sentence Embeddings

STS

Datasets

SciDocs

PIT

TURL

CQADupStack

AskUbuntu

Results from the Paper


Ranked #1 on Re-Ranking on AskUbuntu


Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Re-Ranking	AskUbuntu	TSDAE	MAP	59.4	#1
Information Retrieval	CQADupStack	TSDAE	mAP@100	0.145	#2
Paraphrase Identification	PIT	TSDAE	AP	69.2	#1
Re-Ranking	SciDocs	TSDAE	Cite	71.4	#1
Re-Ranking	SciDocs	TSDAE	CC	73.9	#1
Re-Ranking	SciDocs	TSDAE	CR	75.0	#1
Re-Ranking	SciDocs	TSDAE	CV	75.6	#1
Paraphrase Identification	TURL	TSDAE	AP	76.8	#1
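Several rows in the results table report ranking metrics (MAP, AP, mAP@100). For readers comparing these numbers, here is a sketch of the textbook average-precision definition: for each relevant item in a ranked list, take the precision at its rank, then divide the sum by the number of relevant items; MAP averages this over queries. This is not the evaluation code behind the table, the function names are hypothetical, and the cutoff handling (dividing by min(num_relevant, k) when a cutoff k such as 100 is given) is one common variant that the benchmark scripts may not share.

```python
from typing import Optional, Sequence


def average_precision(relevance: Sequence[bool],
                      num_relevant: Optional[int] = None,
                      k: Optional[int] = None) -> float:
    """AP for one query. relevance[i] is True if the item at rank i + 1 is relevant.

    num_relevant: total relevant items for the query; defaults to the count in
    `relevance`. k: optional cutoff (e.g. 100); with a cutoff the denominator
    becomes min(num_relevant, k), which is an assumed convention."""
    total = sum(relevance) if num_relevant is None else num_relevant
    ranked = relevance if k is None else relevance[:k]
    if k is not None:
        total = min(total, k)
    if total == 0:
        return 0.0
    hits = 0
    score = 0.0
    for rank, is_relevant in enumerate(ranked, start=1):
        if is_relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / total


def mean_average_precision(rankings: Sequence[Sequence[bool]],
                           k: Optional[int] = None) -> float:
    """MAP: the mean of per-query average precision."""
    if not rankings:
        return 0.0
    return sum(average_precision(r, k=k) for r in rankings) / len(rankings)


if __name__ == "__main__":
    # Query 1: relevant at ranks 1 and 3 -> (1/1 + 2/3) / 2 = 5/6
    # Query 2: relevant at rank 2       -> (1/2) / 1      = 1/2
    # MAP = (5/6 + 1/2) / 2 = 2/3, about 0.667
    print(mean_average_precision([[True, False, True], [False, True]]))
```

Multiplying the result by 100 puts it on the same scale as most scores in the table.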

Methods


Scaled Dot-Product Attention • Softmax • TSDAE
