Multi-Relational Embedding for Knowledge Graph Representation and Analysis

Multi-relational data, such as knowledge graphs, bibliographic data, and information networks, are prevalent among real-world datasets. Effectively managing, exploring, and utilizing such large and complex datasets is challenging. In recent years, multi-relational embedding methods have emerged as an effective new approach to modeling multi-relational data by representing both entities and relations as embedding vectors in a semantic space. On knowledge graphs, multi-relational embedding methods model the interactions between these embedding vectors to predict relational links between entities. These knowledge graph embedding methods not only solve the important inherent task of link prediction for knowledge graph completion but also provide embedding representations with various potential applications. The goal of this thesis is first to study multi-relational embedding on knowledge graphs and propose a new embedding model that explains and improves previous methods, and then to study the applications of multi-relational embedding to the representation and analysis of knowledge graphs. For the first part of the thesis, we study the theoretical framework of knowledge graph embedding methods to explain and improve them. We review and analyze the popular class of semantic matching knowledge graph embedding methods, with a focus on state-of-the-art trilinear-product-based models such as ComplEx. Based on our analysis, we identify two fundamental, complementary aspects that a knowledge graph embedding model needs to address: computational efficiency and model expressiveness. Previous trilinear-product-based models use specially designed interaction mechanisms to trade off these two aspects manually. However, because these interaction mechanisms are hand-designed and fixed, they may be suboptimal or difficult to extend.
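To make the notion of a trilinear-product-based score concrete, here is a minimal NumPy sketch, offered as an illustration rather than code from the thesis. It assumes the standard DistMult-style trilinear product and the standard ComplEx score Re(<h, r, conj(t)>), and shows that the ComplEx score expands into four real trilinear products with fixed signs, which is one example of a fixed, hand-designed interaction mechanism. The function names are hypothetical.

```python
import numpy as np


def trilinear(h, r, t):
    """Trilinear product <h, r, t> = sum_i h_i * r_i * t_i (the DistMult-style score)."""
    return float(np.sum(h * r * t))


def complex_score(h, r, t):
    """ComplEx-style score Re(<h, r, conj(t)>) for complex-valued embeddings."""
    return float(np.real(np.sum(h * r * np.conj(t))))


def complex_score_real(h_re, h_im, r_re, r_im, t_re, t_im):
    """The same score written as four real trilinear products with fixed signs."""
    return (trilinear(h_re, r_re, t_re)
            + trilinear(h_im, r_re, t_im)
            + trilinear(h_re, r_im, t_im)
            - trilinear(h_im, r_im, t_re))


# Random toy embeddings of dimension 4, not trained values.
rng = np.random.default_rng(0)
h, r, t = (rng.normal(size=4) + 1j * rng.normal(size=4) for _ in range(3))

print(np.isclose(complex_score(h, r, t),
                 complex_score_real(h.real, h.imag, r.real, r.imag, t.real, t.imag)))  # True
```

The sign pattern in `complex_score_real` is fixed by the algebra of complex numbers, so it cannot adapt to the data; this is the kind of rigidity the paragraph above points to.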
In this thesis, we propose the multi-partition embedding interaction (MEI) model with the block term format to address this problem systematically. MEI divides each embedding into a multi-partition vector to efficiently restrict the interactions. Each local interaction is modeled with the Tucker tensor format and the full interaction is modeled with the block term tensor format, enabling MEI to control the trade-off between expressiveness and computational cost and to learn the interaction mechanisms automatically from data. The model combines advanced tensor representation formats with modern deep learning techniques to achieve state-of-the-art performance on the link prediction task. The theoretical framework of the MEI model is then used as a general mechanism of knowledge graph embedding to analyze, explain, and generalize previous models. We also draw connections to word embeddings and language modeling to provide new insights and generalizations. For the second part of the thesis, we study how to apply multi-relational embedding to the representation and analysis of knowledge graphs. Unlike in word embedding, semantic structures such as similarity and analogy structures in the knowledge graph embedding space are not well studied and thus are not usually utilized for data representation and analysis. To demonstrate the application of multi-relational embedding, we formalize a framework for data representation and analysis by semantic queries on the multi-relational embedding space. We build a knowledge graph from scholarly data and show how various tasks on the original datasets can be approximated by appropriate semantic queries, which are multi-linear algebraic operations on the multi-relational embedding spaces. We also theoretically study the entity analogy reasoning task in the multi-relational embedding space, which can be formulated as an open-relational query-by-example task, that is, a relational query on unseen relations.
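The block term score described above can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the thesis implementation: it assumes K partitions with entity partition size Ce and relation partition size Cr, a single core tensor W shared across all partitions, and it omits training and the deep learning techniques mentioned above. The names `mei_score` and `to_complex` are hypothetical. Fixing W by hand recovers DistMult and ComplEx as special cases, which illustrates how this view can generalize earlier models; in MEI, W is learned from data instead.

```python
import numpy as np


def mei_score(h, r, t, W):
    """Block-term score: sum over partitions k of the Tucker product W x1 h_k x2 r_k x3 t_k.

    h, t: entity embeddings of shape (K, Ce); r: relation embedding of shape (K, Cr);
    W: core tensor of shape (Ce, Cr, Ce), assumed here to be shared by all K partitions.
    """
    return float(np.einsum("abc,ka,kb,kc->", W, h, r, t))


rng = np.random.default_rng(0)
K = 3  # number of partitions (blocks)

# Special case 1: partitions of size 1 with W = 1 give the DistMult trilinear product.
W_distmult = np.ones((1, 1, 1))
h1, r1, t1 = (rng.normal(size=(K, 1)) for _ in range(3))
print(np.isclose(mei_score(h1, r1, t1, W_distmult), np.sum(h1 * r1 * t1)))  # True

# Special case 2: partitions of size 2 (real, imaginary) with this fixed sign pattern give ComplEx.
W_complex = np.zeros((2, 2, 2))
W_complex[0, 0, 0] = 1.0   # h_re * r_re * t_re
W_complex[1, 0, 1] = 1.0   # h_im * r_re * t_im
W_complex[0, 1, 1] = 1.0   # h_re * r_im * t_im
W_complex[1, 1, 0] = -1.0  # -h_im * r_im * t_re
h2, r2, t2 = (rng.normal(size=(K, 2)) for _ in range(3))


def to_complex(x):
    """Read each size-2 partition as one complex number (real part, imaginary part)."""
    return x[:, 0] + 1j * x[:, 1]


complex_ref = np.real(np.sum(to_complex(h2) * to_complex(r2) * np.conj(to_complex(t2))))
print(np.isclose(mei_score(h2, r2, t2, W_complex), complex_ref))  # True
```

With a dense core tensor, each block costs on the order of Ce * Cr * Ce multiplications, so larger partitions buy richer local interactions at a higher computational cost; this is the trade-off that the partition sizes and K control.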
Using the above mathematical connections between knowledge graph embeddings and word embeddings, we analyze the semantic structures in the knowledge graph embedding space and propose a potential solution to the entity analogy reasoning task. The goal of this endeavor is to explore potential applications of recent advances in multi-relational embedding to data representation and analysis, especially to improve their effectiveness on scholarly data.
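As a rough illustration of similarity and analogy queries on an embedding space, the sketch below applies the classic word-embedding vector-offset rule (often called 3CosAdd) to hand-made toy vectors. This is an assumption-laden example and not the solution proposed in the thesis: the entity names and vectors are hypothetical and were chosen so that "paper = author + offset" holds exactly, which trained knowledge graph embeddings need not satisfy.

```python
import numpy as np

# Hypothetical toy entity embeddings, hand-made so that "paper = author + offset" holds.
entities = {
    "author_1": np.array([1.0, 0.0, 0.0]),
    "paper_1": np.array([1.0, 0.0, 1.0]),
    "author_2": np.array([0.0, 1.0, 0.0]),
    "paper_2": np.array([0.0, 1.0, 1.0]),
    "venue_1": np.array([0.0, 0.0, 1.0]),
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def most_similar(query, exclude=(), k=1):
    """Similarity query: rank entities by cosine similarity to a query vector."""
    scored = [(name, cosine(query, vec)) for name, vec in entities.items() if name not in exclude]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def analogy(a, b, c, k=1):
    """Answer 'a is to b as c is to ?' with the vector offset b - a + c (3CosAdd)."""
    query = entities[b] - entities[a] + entities[c]
    return most_similar(query, exclude={a, b, c}, k=k)


print(analogy("author_1", "paper_1", "author_2")[0][0])  # paper_2
```

The query entities are excluded from the candidates, as is customary for word-analogy benchmarks; otherwise `paper_1` or `author_2` could trivially rank near the top.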

Results from the Paper

Task: Link Prediction; Dataset: KG20C

Model                  MRR     Hits@1   Hits@3   Hits@10   Global rank
CP-N3 (small)          0.215   0.148    0.234    0.348     #2
Word2vec-N3 (small)    0.068   0.011    0.070    0.177     #3
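MRR and Hits@k in the table are the standard link-prediction metrics computed from the rank of the correct entity among all candidate entities. The sketch below shows only the arithmetic, on hypothetical ranks; it is not the evaluation code behind these results, and details such as whether ranks are filtered against other known true triples are not stated on this page. The function name `ranking_metrics` is hypothetical.

```python
import numpy as np


def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Mean reciprocal rank and Hits@k from 1-based ranks of the correct entity."""
    ranks = np.asarray(ranks, dtype=float)
    metrics = {"MRR": float(np.mean(1.0 / ranks))}
    for k in ks:
        # Fraction of test queries whose correct entity is ranked in the top k.
        metrics[f"Hits@{k}"] = float(np.mean(ranks <= k))
    return metrics


# Hypothetical ranks of the correct entity for five test queries.
ranks = [1, 2, 4, 12, 1]
metrics = ranking_metrics(ranks)
print(metrics)  # MRR ≈ 0.567, Hits@1 = 0.4, Hits@3 = 0.6, Hits@10 = 0.8
```

Because every query counted in Hits@1 is also counted in Hits@3 and Hits@10, these values can only grow with k, as they do for both models in the table.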

Methods