Text Clustering

31 papers with code • 3 benchmarks • 5 datasets

Grouping a set of texts in such a way that texts in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). (Source: Adapted from Wikipedia)
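The definition can be made concrete with a minimal sketch of one classic way to cluster texts: spherical k-means over bag-of-words vectors, where "similar" means high cosine similarity. This is a generic baseline, not the method of any paper listed below. Whitespace tokenisation, the function names (`vectorize`, `kmeans_texts`) and seeding the centroids with the first k texts are illustrative assumptions.

```python
import math
from collections import Counter

def vectorize(texts):
    """Turn each text into an L2-normalised bag-of-words vector (dict: word -> weight)."""
    vectors = []
    for text in texts:
        counts = Counter(text.lower().split())  # naive whitespace tokenisation (assumption)
        norm = math.sqrt(sum(c * c for c in counts.values()))
        vectors.append({w: c / norm for w, c in counts.items()})
    return vectors

def cosine(u, v):
    # Both vectors are unit-length, so the dot product equals cosine similarity.
    return sum(weight * v.get(word, 0.0) for word, weight in u.items())

def normalise(vec):
    norm = math.sqrt(sum(x * x for x in vec.values()))
    return {w: x / norm for w, x in vec.items()} if norm else vec

def kmeans_texts(texts, k, iterations=10):
    """Spherical k-means: returns one cluster index per text."""
    vectors = vectorize(texts)
    # Hypothetical deterministic initialisation: the first k texts act as seed centroids.
    centroids = vectors[:k]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each text joins its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        # Update step: each centroid becomes the normalised sum of its members.
        new_centroids = []
        for c in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == c:
                    total.update(v)
            new_centroids.append(normalise(dict(total)) if total else centroids[c])
        centroids = new_centroids
    return labels
```

For example, clustering `["cat sat on the mat", "stock prices fell today", "the cat chased a mouse", "stock markets and prices rose"]` with `k=2` returns `[0, 1, 0, 1]`: the two cat sentences share a cluster, as do the two stock-market sentences. Practical systems usually replace bag-of-words vectors with learned sentence embeddings, which is the focus of several papers below.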

Most implemented papers

Training Effective Neural Sentence Encoders from Automatically Mined Paraphrases

sdadas/polish-sentence-evaluation 26 Jul 2022

Our sentence encoder can be trained in less than a day on a single graphics card, achieving high performance on a diverse set of sentence-level tasks.

Very Large Language Model as a Unified Methodology of Text Mining

jonjoncardoso/jonjoncardoso 19 Dec 2022

Text data mining is the process of deriving essential information from language text.

DeepLens: Interactive Out-of-distribution Data Detection in NLP Models

momentum-lab-workspace/deeplens 2 Mar 2023

In this work, we propose DeepLens, an interactive system that helps users detect and explore OOD issues in massive text corpora.

Influence of various text embeddings on clustering performance in NLP

simpleparadox/cmput_697_project 4 May 2023

For example, a three-star rating (out of five) may be incongruous with the review text, which may be more suitable for a five-star review.

Robust Representation Learning with Reliable Pseudo-labels Generation via Self-Adaptive Optimal Transport for Short Text Clustering

hmllmh/rstc 23 May 2023

To tackle the above issues, we propose a Robust Short Text Clustering (RSTC) model to improve robustness against imbalanced and noisy data.
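Pseudo-labels from optimal transport are commonly computed with the Sinkhorn-Knopp algorithm. The sketch below shows only the plain balanced version, assuming a uniform prior over clusters and a matrix of sample-to-cluster similarity scores. RSTC's self-adaptive formulation targets imbalanced data, so this should not be read as the RSTC implementation; the function name and the default `epsilon` are hypothetical.

```python
import math

def sinkhorn_pseudo_labels(scores, epsilon=0.1, iterations=50):
    """Balanced Sinkhorn-Knopp over an (n_samples x n_clusters) score matrix.

    Returns hard pseudo-labels: the argmax of the transport plan for each sample.
    """
    n, k = len(scores), len(scores[0])
    # Gibbs kernel exp(score / epsilon); subtracting the row max avoids overflow
    # and is harmless because rows are rescaled below anyway.
    q = []
    for row in scores:
        top = max(row)
        q.append([math.exp((s - top) / epsilon) for s in row])
    for _ in range(iterations):
        # Column step: every cluster receives total mass 1/k (uniform cluster prior).
        for j in range(k):
            col = sum(q[i][j] for i in range(n))
            for i in range(n):
                q[i][j] *= (1.0 / k) / col
        # Row step: every sample carries total mass 1/n.
        for i in range(n):
            row_total = sum(q[i])
            q[i] = [x * (1.0 / n) / row_total for x in q[i]]
    return [max(range(k), key=lambda j: q[i][j]) for i in range(n)]
```

With scores `[[0.9, 0.1], [0.8, 0.3], [0.6, 0.5], [0.55, 0.5]]`, a plain argmax puts every sample in cluster 0, whereas the balanced transport plan returns `[0, 0, 1, 1]`, moving the two samples with the weakest preference into the otherwise empty cluster. That uniform prior is precisely the assumption that fails when the true clusters are imbalanced.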

ClusterLLM: Large Language Models as a Guide for Text Clustering

zhang-yu-wei/clusterllm 24 May 2023

First, we prompt ChatGPT for insights on the clustering perspective by constructing hard triplet questions ("Does A better correspond to B than C?"), where A, B and C are similar data points that belong to different clusters according to a small embedder.
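The triplet construction can be sketched as follows, assuming embeddings and cluster labels from some small embedder are already available. Choosing B and C as the anchor's most similar points from the two closest clusters is an illustrative heuristic, not ClusterLLM's exact sampling strategy, and `build_triplet`, `triplet_prompt` and the prompt wording are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def build_triplet(anchor, embeddings, labels):
    """Return (A, B, C) indices: B and C are the anchor's most similar points
    from two different clusters, or None if fewer than two clusters exist."""
    best_per_cluster = {}  # cluster label -> (similarity, index) of closest member
    for idx, (emb, label) in enumerate(zip(embeddings, labels)):
        if idx == anchor:
            continue
        sim = cosine(embeddings[anchor], emb)
        if label not in best_per_cluster or sim > best_per_cluster[label][0]:
            best_per_cluster[label] = (sim, idx)
    # Rank clusters by their closest member and keep the top two.
    ranked = sorted(best_per_cluster.values(), reverse=True)
    if len(ranked) < 2:
        return None
    (_, b), (_, c) = ranked[0], ranked[1]
    return anchor, b, c

def triplet_prompt(texts, triplet):
    """Format a triplet as a question for an LLM (wording is an assumption)."""
    a, b, c = triplet
    return (f"Query: {texts[a]}\nChoice 1: {texts[b]}\nChoice 2: {texts[c]}\n"
            "Does the query better correspond to Choice 1 or Choice 2?")
```

Drawing B and C from the two clusters nearest to the anchor keeps the question hard: these are the cases where the small embedder's own similarity scores are closest, so an external judgement is most informative.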

Large Language Models Enable Few-Shot Clustering

viswavi/few-shot-clustering 2 Jul 2023

In this paper, we ask whether a large language model can amplify an expert's guidance to enable query-efficient, few-shot semi-supervised text clustering.

More Discriminative Sentence Embeddings via Semantic Graph Smoothing

chakib401/smoothing_sentence_embeddings 20 Feb 2024

This paper explores an empirical approach to learning more discriminative sentence representations in an unsupervised fashion.
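One simple reading of "semantic graph smoothing" is to build a k-nearest-neighbour graph over sentence embeddings and average each vector with its neighbours, which pulls semantically close sentences together. The sketch below shows that generic operation under those assumptions; the paper's actual graph construction and smoothing filter may differ, and `knn_graph` and `smooth` are hypothetical names.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def knn_graph(embeddings, k):
    """Adjacency sets: each node links to its k most similar other nodes (symmetrised)."""
    n = len(embeddings)
    neighbours = [set() for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: cosine(embeddings[i], embeddings[j]),
                        reverse=True)
        for j in others[:k]:
            neighbours[i].add(j)
            neighbours[j].add(i)
    return neighbours

def smooth(embeddings, neighbours, steps=1):
    """Each step replaces a vector by the mean of itself and its graph neighbours."""
    current = [list(e) for e in embeddings]
    for _ in range(steps):
        updated = []
        for i, vec in enumerate(current):
            group = [vec] + [current[j] for j in neighbours[i]]
            updated.append([sum(vals) / len(group) for vals in zip(*group)])
        current = updated
    return current
```

With embeddings `[[1, 0], [0.9, 0.2], [0, 1], [0.1, 0.9]]` and `k=1`, the graph links the first two points and the last two, and one smoothing step maps each pair onto a single point (about `[0.95, 0.1]` and `[0.05, 0.95]`), so the two groups become easier to separate.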