POI-Transformers: POI Entity Matching through POI Embeddings by Incorporating Semantic and Geographic Information

29 Sep 2021  ·  Jinbao Zhang, Changwang Zhang, Xiaojuan Liu, Xia Li, Weilin Liao, Penghua Liu, Yao Yao, Jihong Zhang ·

Point of Interest (POI) data is crucial to location-based applications and various user-oriented services. However, POI entity matching faces three problems. First, traditional approaches to general entity matching are designed without geographic location information and therefore ignore geographic features when matching POI entities. Second, the feature design of existing POI matching methods depends heavily on expert knowledge. Third, current deep learning-based entity matching approaches have high computational complexity, because every potential POI entity pair must be fed through the network. To address these problems, this study proposes a general and robust POI embedding framework, the POI-Transformers. The POI-Transformers generate semantically meaningful POI embeddings by aggregating text attributes and geographic location, and minimize the inconsistency of a POI entity by measuring the distance between the generated POI embeddings. Moreover, POI entities are matched by the similarity of their embeddings instead of by directly comparing entity pairs, which greatly reduces the computational cost. The POI-Transformers achieve a high F1 score of 95.8% on natural-scene data sets (from the Gaode Map and the Tencent Map) for POI entity matching and are comparable to the state-of-the-art (SOTA) entity matching methods DeepER, DeepMatcher, and Ditto on an entity matching benchmark data set. Compared with existing deep learning methods, our method reduces the time needed to identify one million pairs from about 20 hours to 228 seconds. These results demonstrate that the proposed POI-Transformers framework significantly outperforms traditional methods in both accuracy and efficiency.
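The efficiency gain described above comes from matching in embedding space: each POI is encoded once, and candidate pairs are found by similarity search rather than by running a network on every pair. A minimal sketch of this matching step, assuming hypothetical precomputed embeddings and a cosine-similarity threshold (the actual POI-Transformers embeddings and threshold are not specified here):

```python
import numpy as np

def match_by_embedding(emb_a, emb_b, threshold=0.9):
    """Match POIs from two sources by cosine similarity of
    precomputed embeddings, avoiding a network pass per pair."""
    # Normalize rows so a plain dot product equals cosine similarity.
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sims = a @ b.T                # (n_a, n_b) similarity matrix
    best = sims.argmax(axis=1)    # nearest candidate in source B
    return [(i, int(j), float(sims[i, j]))
            for i, j in enumerate(best) if sims[i, j] >= threshold]

# Toy embeddings: row 0 of A should match row 1 of B, and vice versa.
emb_a = np.array([[1.0, 0.0], [0.0, 1.0]])
emb_b = np.array([[0.1, 1.0], [1.0, 0.1]])
matches = match_by_embedding(emb_a, emb_b)
print(matches)  # each tuple is (index in A, index in B, similarity)
```

In practice an approximate nearest-neighbor index would replace the dense similarity matrix for million-scale catalogs, which is what makes embedding-based matching scale far better than exhaustive pairwise comparison.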
