TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Representation Learning	SciDocs	CiteBERT	Avg.	58.8	# 6
Representation Learning	SciDocs	Sci-DeCLUTR	Avg.	66.6	# 4
Representation Learning	SciDocs	SciNCL	Avg.	81.8	# 1
Citation Prediction	SciDocs (Citation Prediction)	SciNCL	MAP	93.6	# 1
Document Classification	SciDocs (MAG)	SciNCL	F1 (micro)	81.4	# 2
Document Classification	SciDocs (MeSH)	SciNCL	F1 (micro)	88.7	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/neighborhood-contrastive-learning-for-1/representation-learning-on-scidocs)](https://paperswithcode.com/sota/representation-learning-on-scidocs?p=neighborhood-contrastive-learning-for-1)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/neighborhood-contrastive-learning-for-1/citation-prediction-on-scidocs-citation)](https://paperswithcode.com/sota/citation-prediction-on-scidocs-citation?p=neighborhood-contrastive-learning-for-1)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/neighborhood-contrastive-learning-for-1/document-classification-on-scidocs-mesh)](https://paperswithcode.com/sota/document-classification-on-scidocs-mesh?p=neighborhood-contrastive-learning-for-1)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/neighborhood-contrastive-learning-for-1/document-classification-on-scidocs-mag)](https://paperswithcode.com/sota/document-classification-on-scidocs-mag?p=neighborhood-contrastive-learning-for-1)`

Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings

14 Feb 2022 · Malte Ostendorff, Nils Rethmeier, Isabelle Augenstein, Bela Gipp, Georg Rehm

Learning scientific document representations can be substantially improved through contrastive learning objectives, where the challenge lies in creating positive and negative training samples that encode the desired similarity semantics. Prior work relies on discrete citation relations to generate contrast samples. However, discrete citations enforce a hard cut-off to similarity. This is counter-intuitive to similarity-based learning, and ignores that scientific papers can be very similar despite lacking a direct citation - a core problem of finding related research. Instead, we use controlled nearest neighbor sampling over citation graph embeddings for contrastive learning. This control allows us to learn continuous similarity, to sample hard-to-learn negatives and positives, and also to avoid collisions between negative and positive samples by controlling the sampling margin between them. The resulting method SciNCL outperforms the state-of-the-art on the SciDocs benchmark. Furthermore, we demonstrate that it can train (or tune) models sample-efficiently, and that it can be combined with recent training-efficient methods. Perhaps surprisingly, even training a general-domain language model this way outperforms baselines pretrained in-domain.
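The sampling idea in the abstract can be illustrated with a minimal sketch. It assumes that citation graph embeddings are already available as plain vectors, takes positives from the nearest neighbors of a query paper, and takes hard negatives from a band of more distant neighbors, so that the gap between the two rank ranges acts as the sampling margin. The function names, the rank ranges (`k_pos`, `neg_band`), the Euclidean distance, and the triplet margin loss are assumptions made for illustration; the abstract does not specify the actual sampling parameters or the loss.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sample_triplets(graph_emb, k_pos=2, neg_band=(4, 6), seed=0):
    """Build one (query, positive, negative) index triplet per document.

    Hypothetical scheme: the positive comes from the k_pos nearest neighbors
    in citation-embedding space, the negative from neighbor ranks
    [neg_band[0], neg_band[1]). Requiring k_pos <= neg_band[0] keeps the two
    pools disjoint, so a paper is never both a positive and a negative.
    """
    n = len(graph_emb)
    lo, hi = neg_band
    if not (0 < k_pos <= lo < hi <= n - 1):
        raise ValueError("need 0 < k_pos <= lo < hi <= n - 1")
    rng = random.Random(seed)
    triplets = []
    for q in range(n):
        # Rank all other documents by distance to the query, nearest first.
        others = [i for i in range(n) if i != q]
        ranked = sorted(others, key=lambda i: euclidean(graph_emb[q], graph_emb[i]))
        pos = rng.choice(ranked[:k_pos])   # close neighbor: positive
        neg = rng.choice(ranked[lo:hi])    # farther neighbor: hard negative
        triplets.append((q, pos, neg))
    return triplets


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss on text-encoder outputs (assumed objective)."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


# Toy citation embeddings: eight papers placed along a line.
toy_graph_emb = [(float(i), 0.0) for i in range(8)]
toy_triplets = sample_triplets(toy_graph_emb)
```

In a full pipeline the triplets would be sampled once from the graph embeddings and the loss would then be applied to the outputs of the language model being trained. Moving `neg_band` closer to `k_pos` makes the negatives harder, while moving it away reduces the risk of treating a genuinely similar paper as a negative.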

Code

malteos/scincl official

Tasks

Citation Prediction

Contrastive Learning

Document Classification

Document Embedding

Language Modelling

Representation Learning

Datasets

S2ORC

SciDocs

Results from the Paper

Ranked #1 on Document Classification on SciDocs (MeSH)

Methods

Contrastive Learning
