Text Clustering
31 papers with code • 3 benchmarks • 5 datasets
Grouping a set of texts so that texts in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). (Source: Adapted from Wikipedia)
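As a rough illustration of the definition above, the following sketch groups a few toy sentences by bag-of-words cosine similarity. The similarity measure, the greedy assignment rule, the `threshold` value, and the sample sentences are all assumptions chosen for brevity; none of them is taken from the papers listed on this page, which generally rely on learned sentence embeddings.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words vector as a token -> count mapping."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def threshold_clusters(texts, threshold=0.3):
    """Greedy single-pass clustering (illustrative only).

    Each text joins the first cluster whose seed text is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    vectors = [bow(t) for t in texts]
    clusters = []  # each cluster is a list of indices; the first is the seed
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


texts = [
    "the cat sat on the mat",
    "a cat sat on a mat",
    "stock prices fell sharply today",
    "stock prices rose sharply today",
]
clusters = threshold_clusters(texts)
print(clusters)  # [[0, 1], [2, 3]]
```

The two sentences about cats end up in one cluster and the two about stock prices in another, because texts within each pair share most of their tokens while texts across pairs share none.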
Most implemented papers
Training Effective Neural Sentence Encoders from Automatically Mined Paraphrases
Our sentence encoder can be trained in less than a day on a single graphics card, achieving high performance on a diverse set of sentence-level tasks.
Very Large Language Model as a Unified Methodology of Text Mining
Text data mining is the process of deriving essential information from language text.
DeepLens: Interactive Out-of-distribution Data Detection in NLP Models
In this work, we propose DeepLens, an interactive system that helps users detect and explore OOD issues in massive text corpora.
Influence of various text embeddings on clustering performance in NLP
For example, a three star rating (out of five) may be incongruous with the review text, which may be more suitable for a five star review.
Robust Representation Learning with Reliable Pseudo-labels Generation via Self-Adaptive Optimal Transport for Short Text Clustering
To tackle the above issues, we propose a Robust Short Text Clustering (RSTC) model to improve robustness against imbalanced and noisy data.
ClusterLLM: Large Language Models as a Guide for Text Clustering
First, we prompt ChatGPT for insights on clustering perspective by constructing hard triplet questions <does A better correspond to B than C>, where A, B and C are similar data points that belong to different clusters according to a small embedder.
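A triplet question of the form described above could be built and interpreted roughly as follows. This is only a sketch: the prompt wording, the function names `triplet_question` and `parse_answer`, and the sample texts are hypothetical and are not taken from the ClusterLLM paper, and no call to an actual language model is made.

```python
def triplet_question(anchor, choice_1, choice_2):
    """Format a 'does A better correspond to B than C' query as a prompt.

    The wording is an assumption made for illustration.
    """
    return (
        "Which choice better corresponds to the query?\n"
        f"Query: {anchor}\n"
        f"Choice 1: {choice_1}\n"
        f"Choice 2: {choice_2}\n"
        "Answer with 'Choice 1' or 'Choice 2'."
    )


def parse_answer(reply):
    """Map a free-text reply to 1 or 2, or None when it is ambiguous."""
    text = reply.lower()
    has_1 = "choice 1" in text
    has_2 = "choice 2" in text
    if has_1 == has_2:  # neither or both mentioned
        return None
    return 1 if has_1 else 2


prompt = triplet_question(
    "How do I reset my card PIN?",        # A: anchor text
    "I forgot the PIN for my card",       # B: candidate from one cluster
    "My card was declined at the store",  # C: candidate from another cluster
)
print(prompt)
```

The parsed answer (1 or 2) indicates which candidate the model judges closer to the anchor; ambiguous replies return `None` so that they can be discarded or asked again.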
Large Language Models Enable Few-Shot Clustering
In this paper, we ask whether a large language model can amplify an expert's guidance to enable query-efficient, few-shot semi-supervised text clustering.
More Discriminative Sentence Embeddings via Semantic Graph Smoothing
This paper explores an empirical approach to learning more discriminative sentence representations in an unsupervised fashion.