TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Blocking	Abt-Buy	Auto	Recall	87.2	# 6
Blocking	Abt-Buy	Auto	Candidate Set Size	21600	# 5
Blocking	Amazon-Google	Auto	Recall	97.1	# 5
Blocking	Amazon-Google	Auto	Candidate Set Size	68200	# 5

Deep learning for blocking in entity matching: a design space exploration

Proceedings of the VLDB Endowment 2021 · Saravanan Thirumuruganathan, Han Li, Nan Tang, Mourad Ouzzani, Yash Govind, Derek Paulsen, Glenn Fung, AnHai Doan

Entity matching (EM) finds data instances that refer to the same real-world entity. Most EM solutions perform blocking, then matching. Many works have applied deep learning (DL) to matching, but far fewer have applied DL to blocking. These blocking works are also limited in that they consider only a simple form of DL, and some of them require labeled training data. In this paper, we develop the DeepBlocker framework, which significantly advances the state of the art in applying DL to blocking for EM. We first define a large space of DL solutions for blocking, which contains solutions of varying complexity and subsumes most previous works. Next, we develop eight representative solutions in this space. These solutions do not require labeled training data and exploit recent advances in DL (e.g., sequence modeling, transformers, self-supervision). We empirically determine which solutions perform best on which kinds of datasets (structured, textual, or dirty). We show that the best solutions (among the above eight) outperform the best existing DL solution and the best existing non-DL solutions (including a state-of-the-art industrial non-DL solution) on dirty and textual data, and are comparable on structured data. Finally, we show that the combination of the best DL and non-DL solutions can perform even better, suggesting a new avenue for research.
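To make the two reported metrics concrete, here is a minimal sketch of a blocker and its evaluation. It is not the DeepBlocker implementation: it assumes that blocking keeps, for each record in one table, the k most similar records in the other, and it swaps in bag-of-character-trigram vectors for learned tuple embeddings. The function names (`trigram_vector`, `top_k_blocking`, `blocking_recall`), the toy records, and the value of `k` are all hypothetical.

```python
import math
from collections import Counter


def trigram_vector(text):
    """Bag of character trigrams; a crude stand-in for a learned tuple embedding."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[gram] for gram, count in u.items() if gram in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k_blocking(left, right, k):
    """For each left record, keep the k most similar right records as candidate pairs."""
    right_vecs = {rid: trigram_vector(text) for rid, text in right.items()}
    candidates = set()
    for lid, text in left.items():
        vec = trigram_vector(text)
        ranked = sorted(
            right_vecs,
            key=lambda rid: cosine(vec, right_vecs[rid]),
            reverse=True,
        )
        candidates.update((lid, rid) for rid in ranked[:k])
    return candidates


def blocking_recall(candidates, true_matches):
    """Fraction of true matching pairs that survive blocking."""
    return len(candidates & true_matches) / len(true_matches)


# Toy product records (hypothetical, not taken from Abt-Buy or Amazon-Google).
left = {
    "a1": "Sony Cyber-shot DSC-W120 digital camera",
    "a2": "Apple iPod nano 8GB silver",
    "a3": "Canon PIXMA iP100 mobile printer",
}
right = {
    "b1": "sony cybershot dsc w120 camera black",
    "b2": "ipod nano 8 gb apple silver",
    "b3": "canon pixma ip100 printer",
    "b4": "Garmin nuvi 260 GPS navigator",
}
true_matches = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}

candidates = top_k_blocking(left, right, k=2)
recall = blocking_recall(candidates, true_matches)
print("candidate set size:", len(candidates))
print("recall:", recall)
```

The two numbers pull against each other: raising `k` can only keep or raise recall, but it enlarges the candidate set that the downstream matcher has to score. Setting `k` to the size of the right table degenerates into the full cross product, which has perfect recall and no savings.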

Code

qcri/deepblocker

Tasks

Blocking

Datasets

Amazon-Google Abt-Buy

Results from the Paper

Ranked #5 on Blocking on Abt-Buy
