1 code implementation • Proceedings of the VLDB Endowment 2023 • Derek Paulsen, Yash Govind, AnHai Doan
We develop Sparkly, which uses Lucene to perform top-k TF/IDF blocking in a distributed, shared-nothing fashion on a Spark cluster.
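The core idea of top-k TF/IDF blocking can be illustrated with a minimal single-machine sketch. It assumes lowercase alphanumeric tokenization and cosine-normalized TF/IDF weights. It does not use Lucene or Spark and is not Sparkly's implementation. The names `build_index`, `top_k_candidates`, and `block`, and the sample tables, are hypothetical. Each record in table A queries an inverted index built over table B and keeps only its k highest-scoring B records as candidate pairs.

```python
import math
import re
from collections import Counter
from heapq import nlargest


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase alphanumeric word tokens stand in for a real analyzer.
    return re.findall(r"[a-z0-9]+", text.lower())


def _normalize(vec: dict[str, float]) -> dict[str, float]:
    # L2-normalize so that a dot product equals cosine similarity.
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(records: dict[str, str]):
    """Build smoothed IDF weights and an inverted index over table B (hypothetical helper)."""
    docs = {rid: Counter(tokenize(text)) for rid, text in records.items()}
    df = Counter()
    for tf in docs.values():
        df.update(tf.keys())  # document frequency: one count per record
    n = len(docs)
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    postings: dict[str, list[tuple[str, float]]] = {}
    for rid, tf in docs.items():
        vec = _normalize({t: c * idf[t] for t, c in tf.items()})
        for tok, w in vec.items():
            postings.setdefault(tok, []).append((rid, w))
    return idf, postings


def top_k_candidates(query: str, idf, postings, k: int) -> list[tuple[str, float]]:
    """Score B records sharing a token with the query; return the k best."""
    tf = Counter(tokenize(query))
    # Query tokens that never occur in B have no postings and are dropped.
    qvec = _normalize({t: c * idf[t] for t, c in tf.items() if t in idf})
    scores: Counter = Counter()
    for tok, qw in qvec.items():
        for rid, w in postings[tok]:
            scores[rid] += qw * w
    return nlargest(k, scores.items(), key=lambda item: item[1])


def block(table_a: dict[str, str], table_b: dict[str, str], k: int = 2) -> dict[str, list[str]]:
    """Map each A record id to the ids of its top-k candidate B records."""
    idf, postings = build_index(table_b)
    return {
        aid: [rid for rid, _ in top_k_candidates(text, idf, postings, k)]
        for aid, text in table_a.items()
    }


table_b = {"b1": "apple iphone 13 128gb", "b2": "samsung galaxy s21", "b3": "apple macbook air m1"}
table_a = {"a1": "iPhone 13 128GB Apple", "a2": "Galaxy S21 Samsung phone"}
print(block(table_a, table_b, k=1))  # {'a1': ['b1'], 'a2': ['b2']}
```

Because each A record is scored independently against a read-only index, the per-record loop in `block` is the natural unit of parallel work in this sketch.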
Ranked #2 on Blocking on Amazon-Google
1 code implementation • Proceedings of the VLDB Endowment 2021 • Saravanan Thirumuruganathan, Han Li, Nan Tang, Mourad Ouzzani, Yash Govind, Derek Paulsen, Glenn Fung, AnHai Doan
In this paper, we develop the DeepBlocker framework, which significantly advances the state of the art in applying deep learning (DL) to blocking for entity matching (EM).
Ranked #5 on Blocking on Abt-Buy
no code implementations • 29 Sep 2017 • AnHai Doan, Adel Ardalan, Jeffrey R. Ballard, Sanjib Das, Yash Govind, Pradap Konda, Han Li, Erik Paulson, Paul Suganthan G. C., Haojun Zhang
They provide tools to address the "pain points" of these steps, and the tools are built on top of the Python data science and Big Data ecosystem (PyData).