Evaluation of Representation Models for Text Classification with AutoML Tools

24 Jun 2021 · Sebastian Brändle, Marc Hanussek, Matthias Blohm, Maximilien Kintz

Automated Machine Learning (AutoML) has become increasingly successful on tabular data in recent years. However, processing unstructured data such as text remains a challenge and is not widely supported by open-source AutoML tools. This work compares three manually created text representations with text embeddings created automatically by AutoML tools. Our benchmark includes four popular open-source AutoML tools and eight text classification datasets. The results show that straightforward text representations perform better than AutoML tools with automatically created text embeddings.
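
As a rough illustration of what a manually created text representation can look like, the sketch below turns raw texts into a fixed-width numeric table that a tabular AutoML tool could consume. The abstract does not name the three representations that were benchmarked, so the choice of TF-IDF here is an assumption, and the function name `tfidf_table` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tfidf_table(docs):
    """Turn raw texts into a fixed-width numeric table (one row per text).

    Assumption: TF-IDF stands in for a "straightforward" representation;
    the abstract does not say which representations were actually used.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of texts that contain each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency (one common variant).
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty texts
        rows.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, rows


# Hypothetical toy corpus, not taken from the benchmark datasets.
texts = ["good movie", "bad movie", "good good plot"]
vocab, table = tfidf_table(texts)
```

Each row of `table` has one column per vocabulary entry, so together with the class labels it forms an ordinary tabular dataset. That is the kind of input for which, per the abstract, AutoML tools are already well established.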
