Text Classification

1090 papers with code • 150 benchmarks • 147 datasets

Text Classification is the task of assigning an appropriate category to a sentence or document. The categories depend on the chosen dataset and can range from topics to, for example, emotions or citation intents.

Text Classification problems include emotion classification, news classification, and citation intent classification, among others. Benchmark datasets for evaluating text classification include GLUE and AGNews.
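As an illustration of the task definition, here is a minimal sketch of a classical baseline: multinomial Naive Bayes with add-one (Laplace) smoothing, written from scratch. The whitespace tokenizer, the class name `NaiveBayesClassifier`, and the four-sentence toy news dataset are assumptions for illustration only and are not drawn from GLUE, AGNews, or any paper listed here.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokenization, kept deliberately simple.
    return text.lower().split()


class NaiveBayesClassifier:
    """Hypothetical multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayesClassifier":
        self.class_counts = Counter(labels)      # documents per label
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(docs)
        return self

    def predict(self, doc: str) -> str:
        tokens = tokenize(doc)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of each token.
            score = math.log(n_docs / self.total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data with two topic categories.
docs = [
    "stocks rally as markets rise",
    "team wins the championship game",
    "central bank raises interest rates",
    "striker scores twice in final",
]
labels = ["business", "sports", "business", "sports"]
clf = NaiveBayesClassifier().fit(docs, labels)
print(clf.predict("markets fall on interest rates"))  # business
```

Working in log space avoids numerical underflow when many small per-token probabilities are multiplied, and smoothing keeps unseen tokens such as "fall" from zeroing out a class.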

In recent years, deep learning models such as XLNet and RoBERTa have produced some of the largest performance gains on text classification benchmarks.

(Image credit: Text Classification Algorithms: A Survey)

Latest papers with no code

Language Models for Text Classification: Is In-Context Learning Enough?

no code yet • 26 Mar 2024

This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.

LARA: Linguistic-Adaptive Retrieval-Augmented LLMs for Multi-Turn Intent Classification

no code yet • 25 Mar 2024

Following the significant achievements of large language models (LLMs), researchers have employed in-context learning for text classification tasks.
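In-context learning frames classification as text completion: a few labelled demonstrations and the unlabelled input are packed into one prompt, and the model's continuation is mapped back to a label. The sketch below only builds the prompt and parses a completion; it does not call any model. The instruction wording, the `Text:`/`Label:` field names, both function names, and the sentiment examples are hypothetical and are not taken from the LARA paper.

```python
from typing import Optional


def build_few_shot_prompt(demos: list[tuple[str, str]], labels: list[str], query: str) -> str:
    """Format labelled demonstrations plus one unlabelled query (hypothetical template)."""
    lines = [f"Classify each text into one of: {', '.join(labels)}.", ""]
    for text, label in demos:
        lines.append(f"Text: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The query is left with an empty label for the model to complete.
    lines.append(f"Text: {query}")
    lines.append("Label:")
    return "\n".join(lines)


def parse_label(completion: str, labels: list[str]) -> Optional[str]:
    """Map a raw completion to the first label it starts with, case-insensitively."""
    answer = completion.strip().lower()
    for label in labels:
        if answer.startswith(label.lower()):
            return label
    return None  # completion did not match any allowed label


labels = ["positive", "negative"]
demos = [
    ("A delightful, well-paced film.", "positive"),
    ("Dull and far too long.", "negative"),
]
prompt = build_few_shot_prompt(demos, labels, "I would happily watch it again.")
print(prompt)
print(parse_label(" Positive\n", labels))  # positive
```

Returning `None` for unmatched completions makes it explicit when the model answered outside the label set, which is worth counting separately from misclassifications.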

On the Fragility of Active Learners

no code yet • 23 Mar 2024

The impact of this study is in its insights for a practitioner: (a) the choice of text representation and classifier is as important as that of an AL technique, (b) choice of the right metric is critical in assessment of the latter, and, finally, (c) reported AL results must be holistically interpreted, accounting for variables other than just the query strategy.
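The query strategy mentioned above decides which unlabelled examples an active learner sends for annotation next. As a hedged illustration, not the setup evaluated in that paper, the sketch below implements two standard uncertainty-based strategies, least-confidence and margin sampling, over per-example class probabilities. The function names and toy probabilities are hypothetical; in practice the probabilities would come from whichever text representation and classifier are in use, which the excerpt notes matter as much as the strategy itself.

```python
def least_confidence_query(probs: list[list[float]], k: int) -> list[int]:
    """Indices of the k examples whose top class probability is lowest."""
    scores = [1.0 - max(p) for p in probs]  # higher score = more uncertain
    ranked = sorted(range(len(probs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def margin_query(probs: list[list[float]], k: int) -> list[int]:
    """Indices of the k examples with the smallest gap between the top two classes."""
    def margin(p: list[float]) -> float:
        top = sorted(p, reverse=True)  # assumes at least two classes
        return top[0] - top[1]

    ranked = sorted(range(len(probs)), key=lambda i: margin(probs[i]))
    return ranked[:k]


# Hypothetical predicted probabilities for four unlabelled documents.
probs = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.99, 0.01]]
print(least_confidence_query(probs, k=2))  # [1, 2]
print(margin_query(probs, k=2))            # [1, 2]
```

For binary problems the two strategies produce the same ranking; they can diverge once there are three or more classes.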

VLUE: A New Benchmark and Multi-task Knowledge Transfer Learning for Vietnamese Natural Language Understanding

no code yet • 23 Mar 2024

The success of Natural Language Understanding (NLU) benchmarks in various languages, such as GLUE for English, CLUE for Chinese, KLUE for Korean, and IndoNLU for Indonesian, has facilitated the evaluation of new NLU models across a wide range of tasks.

MasonTigers at SemEval-2024 Task 8: Performance Analysis of Transformer-based Models on Machine-Generated Text Detection

no code yet • 22 Mar 2024

This paper presents the MasonTigers entry to the SemEval-2024 Task 8 - Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection.

Multi-Level Explanations for Generative Language Models

no code yet • 21 Mar 2024

To address the challenges of text as output and long text inputs, we propose a general framework called MExGen that can be instantiated with different attribution algorithms.

Visual Analytics for Fine-grained Text Classification Models and Datasets

no code yet • 21 Mar 2024

As a consequence, the semantic structures of datasets have become more complex, and model decisions more difficult to explain.

Vi-Mistral-X: Building a Vietnamese Language Model with Advanced Continual Pre-training

no code yet • 20 Mar 2024

To address this issue, this paper presents vi-mistral-x, an innovative Large Language Model designed expressly for the Vietnamese language.

Simple Hack for Transformers against Heavy Long-Text Classification on a Time- and Memory-Limited GPU Service

no code yet • 19 Mar 2024

Using the best hack found, we then compare token lengths of 512, 256, and 128.
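The excerpt does not say what the "hack" is, so the following is only a hypothetical illustration of what fitting a long document into a 512, 256, or 128 token budget involves. It sketches two common truncation strategies on an already-tokenized sequence: keeping only the head, or keeping a head segment plus a tail segment. The function names and the one-quarter head split are assumptions, not the paper's method.

```python
def truncate_head(tokens: list, max_len: int) -> list:
    """Keep only the first max_len tokens."""
    return tokens[:max_len]


def truncate_head_tail(tokens: list, max_len: int, head_len: int) -> list:
    """Keep the first head_len tokens and the last (max_len - head_len) tokens."""
    if not 0 <= head_len <= max_len:
        raise ValueError("head_len must be between 0 and max_len")
    if len(tokens) <= max_len:
        return list(tokens)  # short inputs are returned unchanged
    tail_len = max_len - head_len
    # Guard tail_len == 0: tokens[-0:] would return the whole list.
    tail = tokens[-tail_len:] if tail_len > 0 else []
    return tokens[:head_len] + tail


# A stand-in for a 1000-token document, truncated to each budget.
doc = list(range(1000))
for max_len in (512, 256, 128):
    kept = truncate_head_tail(doc, max_len, head_len=max_len // 4)
    print(max_len, len(kept), kept[0], kept[-1])
```

Keeping the tail as well as the head preserves closing passages, such as conclusions or final verdicts, that pure head truncation discards.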

CrossTune: Black-Box Few-Shot Classification with Label Enhancement

no code yet • 19 Mar 2024

Training or finetuning large-scale language models (LLMs) requires substantial computation resources, motivating recent efforts to explore parameter-efficient adaptation to downstream tasks.