1 code implementation • Proceedings of the VLDB Endowment 2023 • Derek Paulsen, Yash Govind, AnHai Doan
We develop Sparkly, which uses Lucene to perform top-k TF/IDF blocking in a distributed shared-nothing fashion on a Spark cluster.
Ranked #2 on Blocking on Amazon-Google
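To make the idea of top-k TF/IDF blocking concrete, here is a minimal single-process sketch in plain Python. It is not Sparkly's implementation and uses neither Lucene nor Spark; the function names (`tokenize`, `topk_tfidf_blocking`), the whitespace tokenizer, the smoothed IDF formula, and the toy product strings are all assumptions chosen for illustration. For each record in one table, it keeps only the k most similar records from the other table as candidate matches.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def topk_tfidf_blocking(table_a, table_b, k=2):
    """For each record in table_b, return indices of the k best-scoring records in table_a."""
    docs = [Counter(tokenize(t)) for t in table_a]
    n = len(docs)

    # Document frequency of each token over table_a.
    df = Counter()
    for d in docs:
        df.update(d.keys())

    # Smoothed IDF (an assumed variant; other formulas are equally valid).
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}

    def vectorize(counts):
        # L2-normalized TF-IDF vector; tokens unseen in table_a are ignored.
        v = {t: c * idf[t] for t, c in counts.items() if t in idf}
        norm = math.sqrt(sum(w * w for w in v.values()))
        return {t: w / norm for t, w in v.items()} if norm else {}

    # Inverted index: token -> list of (record index, weight).
    inverted = {}
    for i, d in enumerate(docs):
        for t, w in vectorize(d).items():
            inverted.setdefault(t, []).append((i, w))

    candidates = {}
    for j, text in enumerate(table_b):
        query = vectorize(Counter(tokenize(text)))
        scores = Counter()
        for t, qw in query.items():
            for i, w in inverted.get(t, []):
                scores[i] += qw * w  # accumulate cosine similarity
        # Highest score first; ties broken by record index.
        ranked = sorted(scores.items(), key=lambda p: (-p[1], p[0]))[:k]
        candidates[j] = [i for i, _ in ranked]
    return candidates


# Toy data, invented for this example.
table_a = ["apple iphone 12 black", "samsung galaxy s21", "apple macbook pro 13"]
table_b = ["iphone 12 apple", "galaxy s21 phone"]
pairs = topk_tfidf_blocking(table_a, table_b, k=1)
```

In a distributed setting, the index over one table would typically be built once and shared with the workers, with the query records partitioned across them; the sketch above shows only the per-record scoring step.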
1 code implementation • Proceedings of the VLDB Endowment 2021 • Saravanan Thirumuruganathan, Han Li, Nan Tang, Mourad Ouzzani, Yash Govind, Derek Paulsen, Glenn Fung, AnHai Doan
In this paper, we develop the DeepBlocker framework, which significantly advances the state of the art in applying deep learning (DL) to blocking for entity matching (EM).
Ranked #5 on Blocking on Abt-Buy