Tabular Learning: Encoding for Entity and Context Embeddings

28 Mar 2024  ·  Fredy Reusser ·

Examining the effect of different encoding techniques on entity and context embeddings, the goal of this work is to challenge commonly used Ordinal encoding for tabular learning. Applying different preprocessing methods and network architectures over several datasets resulted in a benchmark on how the encoders influence the learning outcome of the networks. By keeping the test, validation and training data consistent, results have shown that ordinal encoding is not the most suited encoder for categorical data in terms of preprocessing the data and thereafter, classifying the target variable correctly. A better outcome was achieved, encoding the features based on string similarities by computing a similarity matrix as input for the network. This is the case for both, entity and context embeddings, where the transformer architecture showed improved performance for Ordinal and Similarity encoding with regard to multi-label classification tasks.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here