Text Categorization
41 papers with code • 0 benchmarks • 6 datasets
Text Categorization is the task of automatically assigning pre-defined categories to documents written in natural languages. Several types of Text Categorization have been studied, each of which deals with different types of documents and categories, such as topic categorization to detect discussed topics (e.g., sports, politics), spam detection, and sentiment classification to determine the sentiment typically in product or movie reviews.
Source: Effective Use of Word Order for Text Categorization with Convolutional Neural Networks
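As a rough illustration of the task defined above, the sketch below assigns one of two pre-defined categories to a short document using a bag-of-words multinomial Naive Bayes classifier with add-one smoothing. This is a generic baseline and not the convolutional method of the cited paper. The toy training documents, the category names, and the helper names `tokenize`, `train`, and `categorize` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: (document text, pre-defined category).
TRAIN = [
    ("the team won the match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("parliament passed the budget vote", "politics"),
    ("the senator won the election", "politics"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train(labeled_docs):
    """Collect per-category document counts, word counts, and the vocabulary."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, category in labeled_docs:
        doc_counts[category] += 1
        for token in tokenize(text):
            word_counts[category][token] += 1
            vocab.add(token)
    return doc_counts, word_counts, vocab


def categorize(text, model):
    """Return the category with the highest smoothed log-probability."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        # Log prior: fraction of training documents in this category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothed log-likelihood of the token.
            count = word_counts[category][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


model = train(TRAIN)
print(categorize("a late goal in the match", model))
print(categorize("the senator passed the budget", model))
```

The same structure applies to the other variants mentioned above, such as spam detection or sentiment classification: only the category set and the training documents change.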
Most implemented papers
Discriminating between Similar Languages using Weighted Subword Features
The present contribution revolves around a contrastive subword n-gram model which has been tested in the Discriminating between Similar Languages shared task.
An Automated Text Categorization Framework based on Hyperparameter Optimization
The compared datasets include several problems like topic and polarity classification, spam detection, user profiling and authorship attribution.
Convex Formulation of Multiple Instance Learning from Positive and Unlabeled Bags
Multiple instance learning (MIL) is a variation of traditional supervised learning problems where data (referred to as bags) are composed of sub-elements (referred to as instances) and only bag labels are available.
Authorship Attribution Using the Chaos Game Representation
Validation results for the trained classifiers are competitive with the best methods in prior literature.
Fusing Document, Collection and Label Graph-based Representations with Word Embeddings for Text Classification
Contrary to the traditional Bag-of-Words approach, we consider the Graph-of-Words (GoW) model in which each document is represented by a graph that encodes relationships between the different terms.
Topic or Style? Exploring the Most Useful Features for Authorship Attribution
Approaches to authorship attribution, the task of identifying the author of a document, are based on analysis of individuals' writing style and/or preferred topics.
Document Informed Neural Autoregressive Topic Models
Context information around words helps in determining their actual meaning, for example "networks" used in contexts of artificial neural networks or biological neuron networks.
SeVeN: Augmenting Word Embeddings with Unsupervised Relation Vectors
For example, by examining clusters of relation vectors, we observe that relational similarities can be identified at a more abstract level than with traditional word vector differences.
Using the Tsetlin Machine to Learn Human-Interpretable Rules for High-Accuracy Text Categorization with Medical Applications
The Tsetlin Machine either performs on par with or outperforms all of the evaluated methods on both the 20 Newsgroups and IMDb datasets, as well as on a non-public clinical dataset.