The Amazon-Google dataset for entity resolution derives from the online retailer Amazon.com and Google's product search service, accessible through the Google Base Data API. The dataset contains 1363 entities from Amazon.com and 3226 Google products, as well as a gold standard (perfect mapping) with 1300 matching record pairs between the two data sources. The common attributes between the two data sources are product name, product description, manufacturer, and price.
19 PAPERS • 2 BENCHMARKS
The Abt-Buy dataset for entity resolution derives from the online retailers Abt.com and Buy.com. The dataset contains 1081 entities from Abt.com and 1092 entities from Buy.com, as well as a gold standard (perfect mapping) with 1097 matching record pairs between the two data sources. The common attributes between the two data sources are product name, product description, and product price.
18 PAPERS • 2 BENCHMARKS
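Both product datasets above ship a gold standard of matching record pairs, which is what a matcher's output is usually scored against. The following is a minimal sketch of that scoring, assuming matches are represented as (source A id, source B id) tuples. The function name `evaluate_matches` and the toy identifiers are hypothetical and do not come from either dataset.

```python
def evaluate_matches(predicted, gold):
    """Return (precision, recall, f1) for predicted pairs against gold pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical toy identifiers, not taken from Amazon-Google or Abt-Buy.
gold = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
predicted = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}

precision, recall, f1 = evaluate_matches(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Treating pairs as a set keeps the comparison order-independent and ignores duplicate predictions. Published results on these datasets may use different evaluation protocols, so this is only an illustration of the general idea.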
WDC Products is an entity matching benchmark that supports the systematic evaluation of matching systems along combinations of three dimensions while relying on real-world data. The three dimensions are
3 PAPERS • 3 BENCHMARKS
WDC Block is a benchmark for comparing the performance of blocking methods that are used as part of entity resolution pipelines.
1 PAPER • 3 BENCHMARKS
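To illustrate what a blocking method does inside an entity resolution pipeline, here is a minimal sketch of token blocking, a simple technique chosen for illustration and not necessarily one evaluated in WDC Block. It keeps only record pairs whose names share at least one token, then reports two common blocking measures: pair completeness (share of gold matches retained) and reduction ratio (share of the full cross product pruned). All names and records below are hypothetical.

```python
from collections import defaultdict


def token_blocking(left, right):
    """Return candidate (left_id, right_id) pairs whose names share a token."""
    # Inverted index from lowercase token to the right-side records containing it.
    index = defaultdict(set)
    for right_id, name in right.items():
        for token in name.lower().split():
            index[token].add(right_id)

    candidates = set()
    for left_id, name in left.items():
        for token in name.lower().split():
            for right_id in index.get(token, ()):
                candidates.add((left_id, right_id))
    return candidates


def blocking_quality(candidates, gold, n_left, n_right):
    """Return (pair completeness, reduction ratio) for a candidate set."""
    pair_completeness = len(candidates & gold) / len(gold)
    reduction_ratio = 1 - len(candidates) / (n_left * n_right)
    return pair_completeness, reduction_ratio


# Hypothetical toy records, not taken from WDC Block.
left = {"a1": "Acme Photo Editor 5", "a2": "Zeta Tax Suite"}
right = {
    "b1": "acme photo editor v5",
    "b2": "zeta tax suite deluxe",
    "b3": "Orion Antivirus",
}
gold = {("a1", "b1"), ("a2", "b2")}

candidates = token_blocking(left, right)
pair_completeness, reduction_ratio = blocking_quality(
    candidates, gold, len(left), len(right)
)
print(sorted(candidates), pair_completeness, round(reduction_ratio, 3))
```

The trade-off a blocking benchmark measures is visible even here: a looser blocking key retains more true matches but prunes fewer pairs, and a stricter key does the opposite.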