STS (Semantic Textual Similarity)
103 papers with code • 0 benchmarks • 4 datasets
Benchmarks
These leaderboards are used to track progress in STS.
Libraries
Use these libraries to find STS models and implementations.

Most implemented papers
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
BERT, however, requires that both sentences be fed into the network together, which causes a massive computational overhead: finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours).
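Sentence-BERT's fix is a bi-encoder: each sentence is embedded once, independently, and similarity reduces to a cheap vector comparison over cached embeddings. A minimal sketch using the sentence-transformers library (the checkpoint name is a common choice, not necessarily the paper's original model):

```python
# Bi-encoder sketch via the sentence-transformers library; the model
# name below is an assumption (one popular checkpoint), not the paper's.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = ["A man is playing a guitar.", "Someone plays an instrument."]
# Each sentence is encoded once, on its own (the siamese setup).
embeddings = model.encode(sentences, convert_to_tensor=True)

# Pairwise similarity is then a cheap cosine between cached vectors.
score = util.cos_sim(embeddings[0], embeddings[1])
print(float(score))
```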
SimCSE: Simple Contrastive Learning of Sentence Embeddings
This paper presents SimCSE, a simple contrastive learning framework that greatly advances state-of-the-art sentence embeddings.
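The unsupervised variant of SimCSE uses the encoder's own dropout as data augmentation: the same batch is encoded twice, the two dropout-perturbed views of each sentence form a positive pair, and the rest of the batch serves as negatives. A sketch of that objective in PyTorch (the `encoder` callable and the 0.05 temperature are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def simcse_unsup_loss(encoder, sentences_batch, temperature=0.05):
    """Sketch of the unsupervised SimCSE objective. `encoder` is assumed
    to map a list of sentences to a (batch, dim) tensor with dropout
    active, so two forward passes give two different 'views'."""
    z1 = encoder(sentences_batch)  # first pass, dropout mask A
    z2 = encoder(sentences_batch)  # second pass, dropout mask B
    # Cosine similarity matrix between views, scaled by temperature.
    sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1) / temperature
    # Each sentence's positive is its own second view (the diagonal);
    # all other sentences in the batch act as in-batch negatives.
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels)
```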
TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning
Learning sentence embeddings often requires a large amount of labeled data.
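TSDAE instead trains a denoising autoencoder on unlabeled text: tokens are deleted from the input, the encoder compresses the corrupted sentence into a fixed vector, and a decoder must reconstruct the original from that vector alone. A sketch of the token-deletion corruption (the 0.6 deletion ratio is an assumption based on the setting the paper reports working well):

```python
import random

def delete_noise(tokens, deletion_ratio=0.6):
    """Sketch of TSDAE-style input corruption: randomly delete a
    fraction of tokens. The autoencoder must reconstruct the original
    sentence from the embedding of this corrupted input."""
    kept = [t for t in tokens if random.random() > deletion_ratio]
    return kept if kept else [random.choice(tokens)]  # never return empty
```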
MedSTS: A Resource for Clinical Semantic Textual Similarity
A subset of MedSTS (MedSTS_ann) containing 1,068 sentence pairs was annotated by two medical experts with semantic similarity scores of 0-5 (low to high similarity).
SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation
Semantic Textual Similarity (STS) measures the meaning similarity of sentences.
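STS systems output a graded similarity score (conventionally on a 0-5 scale, as in MedSTS above) and are evaluated by Pearson correlation against the gold annotations. A minimal sketch of that evaluation, with made-up scores:

```python
from scipy.stats import pearsonr

# Standard STS evaluation: Pearson correlation between system scores
# and gold 0-5 annotations. The values below are invented for the demo.
gold = [4.8, 2.5, 0.0, 3.2]
pred = [4.5, 3.0, 0.4, 2.9]

r, _ = pearsonr(gold, pred)
print(f"Pearson r = {r:.3f}")
```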
KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding
Although several benchmark datasets for NLI and STS have been released in English and a few other languages, no such datasets are publicly available in Korean.
Don't Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors
Recent literature suggests that averaged word vectors followed by simple post-processing outperform many deep learning methods on semantic textual similarity tasks.
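Both pooling strategies are one line apiece; the paper's fuzzy-set view reads the component-wise maximum as a fuzzy union of the word "sets". A sketch contrasting the two, with toy vectors:

```python
import numpy as np

def mean_pool(word_vectors):
    # Baseline: component-wise average of the sentence's word vectors.
    return np.mean(word_vectors, axis=0)

def max_pool(word_vectors):
    # The alternative: component-wise maximum, interpreted in the paper
    # as a fuzzy union over the words.
    return np.max(word_vectors, axis=0)

# Toy vectors for a three-word sentence (values are made up).
vecs = np.array([[ 0.1, -0.3, 0.8],
                 [ 0.5,  0.2, -0.1],
                 [-0.2,  0.9, 0.4]])
print(mean_pool(vecs), max_pool(vecs))
```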
FFCI: A Framework for Interpretable Automatic Evaluation of Summarization
In this paper, we propose FFCI, a framework for fine-grained summarization evaluation that comprises four elements: faithfulness (degree of factual consistency with the source), focus (precision of summary content relative to the reference), coverage (recall of summary content relative to the reference), and inter-sentential coherence (document fluency between adjacent sentences).
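As an illustration only (not FFCI's exact scoring functions), focus and coverage can be read as precision- and recall-like aggregates over a summary-to-reference sentence-similarity matrix:

```python
import numpy as np

def focus_and_coverage(sim_matrix):
    """Illustrative sketch, not the paper's implementation: given
    sim_matrix[i, j], the similarity between summary sentence i and
    reference sentence j, 'focus' matches each summary sentence to its
    best reference sentence (precision-like), while 'coverage' does the
    reverse (recall-like)."""
    focus = sim_matrix.max(axis=1).mean()      # per summary sentence
    coverage = sim_matrix.max(axis=0).mean()   # per reference sentence
    return focus, coverage
```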
PatentSBERTa: A Deep NLP based Hybrid Model for Patent Distance and Classification using Augmented SBERT
This study provides an efficient approach for using text data to calculate patent-to-patent (p2p) technological similarity, and presents a hybrid framework for leveraging the resulting p2p similarity for applications such as semantic search and automated patent classification.
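The p2p pipeline reduces to embedding every patent once and running nearest-neighbour search over the cached vectors. A hedged sketch using sentence-transformers' semantic-search utility (the model name and patent texts are placeholders, not the paper's setup):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder checkpoint

corpus = ["A battery electrode with a silicon coating...",
          "A method for wireless power transfer..."]
query = "Silicon-coated anode for lithium-ion cells"

# Embed the corpus once; queries reuse the cached vectors.
corpus_emb = model.encode(corpus, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

# Top-k nearest patents by cosine similarity.
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], hit["score"])
```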
Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models
To support our investigation, we establish a new sentence representation transfer benchmark, SentGLUE, which extends the SentEval toolkit to nine tasks from the GLUE benchmark.
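Sentence-T5 derives sentence embeddings from a pre-trained T5 rather than BERT; one of the strategies it studies is mean-pooling the encoder outputs. A sketch of that strategy with Hugging Face transformers (the `t5-base` checkpoint is a placeholder, not one of the paper's released models):

```python
import torch
from transformers import AutoTokenizer, T5EncoderModel

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5EncoderModel.from_pretrained("t5-base")

batch = tokenizer(["A sentence to embed."], return_tensors="pt", padding=True)
with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq, dim)

# Mean-pool over non-padding tokens to get one vector per sentence.
mask = batch["attention_mask"].unsqueeze(-1).float()
embedding = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embedding.shape)
```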