Text Clustering

31 papers with code • 3 benchmarks • 5 datasets

Text clustering is the task of grouping a set of texts so that texts in the same group (called a cluster) are more similar (in some sense) to each other than to texts in other groups (clusters). (Source: Adapted from Wikipedia)
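As a rough illustration of this definition, the sketch below clusters four toy sentences with bag-of-words cosine similarity and a greedy single-pass rule. Everything in it is an assumption made for the example: the helper names (`bag_of_words`, `cosine`, `cluster`), the `threshold` value, and the sample texts are hypothetical, and the word-count vectors stand in for the learned sentence embeddings used by the papers listed on this page.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    """Lowercased word counts; a deliberately simple stand-in for an embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(texts, threshold=0.15):
    """Greedy single-pass clustering (illustrative only).

    Each text joins the existing cluster with the highest average similarity
    to its members, provided that similarity exceeds `threshold`; otherwise
    it starts a new cluster. Returns a list of lists of text indices.
    """
    vectors = [bag_of_words(t) for t in texts]
    clusters = []
    for i, v in enumerate(vectors):
        best, best_sim = None, threshold
        for c in clusters:
            sim = sum(cosine(v, vectors[j]) for j in c) / len(c)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters


texts = [
    "the cat sat on the mat",
    "a cat and a kitten sat together",
    "stock markets fell sharply today",
    "markets and stock prices fell again",
]
clusters = cluster(texts)
print(clusters)  # [[0, 1], [2, 3]]

# Check the defining property: pairs inside a cluster are, on average,
# more similar than pairs that span two clusters.
vectors = [bag_of_words(t) for t in texts]
label = {i: k for k, c in enumerate(clusters) for i in c}
intra, inter = [], []
for i, j in combinations(range(len(texts)), 2):
    (intra if label[i] == label[j] else inter).append(cosine(vectors[i], vectors[j]))
print(sum(intra) / len(intra) > sum(inter) / len(inter))  # True
```

The greedy rule depends on input order and on the threshold, so it is only a teaching device; swapping in a stronger representation or a standard algorithm changes the grouping but not the property being checked.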

Benchmarks

These leaderboards are used to track progress in Text Clustering.

Dataset	Best Model
MTEB	ST5-XXL
20 Newsgroups	G-BAT
Urdu News Headlines Dataset	Vector Space Model

Most implemented papers

Training Effective Neural Sentence Encoders from Automatically Mined Paraphrases

sdadas/polish-sentence-evaluation • 26 Jul 2022

Our sentence encoder can be trained in less than a day on a single graphics card, achieving high performance on a diverse set of sentence-level tasks.

Very Large Language Model as a Unified Methodology of Text Mining

jonjoncardoso/jonjoncardoso • 19 Dec 2022

Text data mining is the process of deriving essential information from language text.

DeepLens: Interactive Out-of-distribution Data Detection in NLP Models

momentum-lab-workspace/deeplens • 2 Mar 2023

In this work, we propose DeepLens, an interactive system that helps users detect and explore OOD issues in massive text corpora.

Influence of various text embeddings on clustering performance in NLP

simpleparadox/cmput_697_project • 4 May 2023

For example, a three star rating (out of five) may be incongruous with the review text, which may be more suitable for a five star review.

Robust Representation Learning with Reliable Pseudo-labels Generation via Self-Adaptive Optimal Transport for Short Text Clustering

hmllmh/rstc • 23 May 2023

To tackle the above issues, we propose a Robust Short Text Clustering (RSTC) model to improve robustness against imbalanced and noisy data.

ClusterLLM: Large Language Models as a Guide for Text Clustering

zhang-yu-wei/clusterllm • 24 May 2023

First, we prompt ChatGPT for insights on clustering perspective by constructing hard triplet questions <does A better correspond to B than C>, where A, B and C are similar data points that belong to different clusters according to a small embedder.

Large Language Models Enable Few-Shot Clustering

viswavi/few-shot-clustering • 2 Jul 2023

In this paper, we ask whether a large language model can amplify an expert's guidance to enable query-efficient, few-shot semi-supervised text clustering.

More Discriminative Sentence Embeddings via Semantic Graph Smoothing

chakib401/smoothing_sentence_embeddings • 20 Feb 2024

This paper explores an empirical approach to learn more discriminative sentence representations in an unsupervised fashion.
