Encoding Category Trees Into Word-Embeddings Using Geometric Approach

We present a novel method to implicitly encode tree-structured category information into word embeddings, resulting in super-dimensional ball representations ($n$-ball embeddings for short). Inclusion relations among $n$-balls precisely encode subordinate relations among categories. The cosine similarity function is enriched by category information. A large $n$-ball dataset is constructed using a geometrical method, which achieves zero energy cost in embedding tree structures into word embeddings. A new benchmark dataset is created for predicting the category of unknown words. Experiments show that $n$-ball embeddings, which carry category information, significantly outperform word embeddings in the neighbourhood test while only slightly changing the original word embeddings. Experimental results also show that $n$-ball embeddings perform surprisingly well in validating the category of unknown words. Source code and datasets are publicly available at \url{https://github.com/gnodisnait/nball4tree.git} and \url{https://github.com/gnodisnait/bp94nball.git}.
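
To make the inclusion idea concrete, the following is a minimal sketch of how containment between $n$-balls can stand in for a subordinate relation between categories. It is not taken from the released code: the `NBall` class, the helper names, and the toy 2-dimensional centers and radii are all hypothetical, and the only fact relied on is the standard geometric condition that a ball with center $c_1$ and radius $r_1$ lies inside a ball with center $c_2$ and radius $r_2$ when $\|c_1 - c_2\| + r_1 \le r_2$.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NBall:
    """Hypothetical n-ball: a center vector plus a non-negative radius."""
    center: Tuple[float, ...]
    radius: float


def center_distance(a: NBall, b: NBall) -> float:
    # Euclidean distance between the two centers.
    return math.dist(a.center, b.center)


def contains(outer: NBall, inner: NBall) -> bool:
    # inner lies inside outer when its farthest point is still within outer.
    return center_distance(outer, inner) + inner.radius <= outer.radius


def disjoint(a: NBall, b: NBall) -> bool:
    # The balls share no point when the centers are farther apart
    # than the sum of the radii.
    return center_distance(a, b) > a.radius + b.radius


# Toy values chosen for illustration only (assumed, not from the dataset).
animal = NBall(center=(0.0, 0.0), radius=5.0)
dog = NBall(center=(1.0, 1.0), radius=1.5)
cat = NBall(center=(-2.0, 0.0), radius=1.0)

print(contains(animal, dog))   # dog is a kind of animal
print(contains(animal, cat))   # cat is a kind of animal
print(contains(dog, animal))   # animal is not a kind of dog
print(disjoint(dog, cat))      # sibling categories do not overlap
```

In this toy setting, a child category is a ball inside its parent's ball and sibling categories are disjoint balls, so walking up the category tree corresponds to moving to ever larger enclosing balls.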
