15 May 2019 • Jiacheng Wu, Yong Zhang, Jin Wang, Chunbin Lin, Yingjia Fu, Chunxiao Xing
To address the limitation, we propose SP-Join, an end-to-end framework that supports distributed similarity joins in metric space on the MapReduce paradigm. SP-Join (i) employs an estimation-based stratified sampling method to produce pivots with quality guarantees for any sample size, and (ii) devises an effective cost model that guides how the whole dataset is split into partitions in the map and reduce phases according to the sampled pivots.
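The abstract does not spell out SP-Join's estimation-based sampler or its cost model, so the following is only a minimal single-process sketch of the generic pivot-based map/reduce scheme that such frameworks build on, not SP-Join itself. Assumptions, all hypothetical: the metric is Euclidean distance; `stratified_pivots` is a plain stratified sample (strata formed by sorting on the first coordinate), not the paper's estimation-based method; partitions are simply nearest-pivot cells with no cost-based balancing. The replication rule uses the standard generalized-hyperplane bound: if `x` is closer to pivot `p_j` than to `p_i`, then `d(o, x) >= (d(o, p_j) - d(o, p_i)) / 2`, so an object only needs to be copied to partitions `j` with `d(o, p_j) - d(o, p_home) <= 2 * eps`.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def dist(a, b):
    """Euclidean distance (assumption); any metric obeying the triangle inequality works."""
    return math.dist(a, b)


def stratified_pivots(data, k, strata, rng):
    """Hypothetical pivot picker: sort by first coordinate, cut into equal-size
    strata, and draw about k/strata pivots uniformly from each stratum."""
    ordered = sorted(data, key=lambda p: p[0])
    size = math.ceil(len(ordered) / strata)
    per_stratum = max(1, k // strata)
    pivots = []
    for start in range(0, len(ordered), size):
        chunk = ordered[start:start + size]
        pivots.extend(rng.sample(chunk, min(per_stratum, len(chunk))))
    return pivots


def map_phase(data, pivots, eps):
    """Send each object to its nearest pivot (home partition) and replicate it to
    every other partition j where d(o, p_j) - d(o, p_home) <= 2 * eps."""
    partitions = defaultdict(list)  # partition id -> [(object id, object, home id)]
    for oid, o in enumerate(data):
        d = [dist(o, p) for p in pivots]
        home = min(range(len(pivots)), key=d.__getitem__)
        for j, dj in enumerate(d):
            if j == home or dj - d[home] <= 2 * eps:
                partitions[j].append((oid, o, home))
    return partitions


def reduce_phase(pid, records, eps):
    """Join one partition: compare home-home pairs, plus home-replica pairs only when
    the replica's home id is larger, so each result pair is emitted exactly once."""
    homes = [r for r in records if r[2] == pid]
    replicas = [r for r in records if r[2] > pid]
    out = []
    for (ia, a, _), (ib, b, _) in combinations(homes, 2):
        if dist(a, b) <= eps:
            out.append((min(ia, ib), max(ia, ib)))
    for ia, a, _ in homes:
        for ib, b, _ in replicas:
            if dist(a, b) <= eps:
                out.append((min(ia, ib), max(ia, ib)))
    return out


def similarity_join(data, eps, k=8, strata=4, seed=0):
    """Return sorted index pairs (i, j), i < j, with dist(data[i], data[j]) <= eps."""
    rng = random.Random(seed)
    pivots = stratified_pivots(data, k, strata, rng)
    parts = map_phase(data, pivots, eps)
    result = []
    for pid, recs in parts.items():  # each iteration stands in for one reducer
        result.extend(reduce_phase(pid, recs, eps))
    return sorted(result)
```

For example, `similarity_join(points, eps=0.1)` returns every pair of indices whose points lie within distance 0.1. Emitting a cross-partition pair only in the partition of the smaller home id avoids a separate deduplication pass: by the bound above, the object with the larger home id is always replicated into that partition when the pair qualifies.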
Databases