AnglE-optimized Text Embeddings

22 Sep 2023 · Xianming Li, Jing Li

High-quality text embedding is pivotal in improving semantic textual similarity (STS) tasks, which are crucial components in Large Language Model (LLM) applications. However, existing text embedding models commonly suffer from vanishing gradients, primarily because they rely on the cosine function in the optimization objective, which has saturation zones. To address this issue, this paper proposes a novel angle-optimized text embedding model called AnglE. The core idea of AnglE is to introduce angle optimization in a complex space. This approach effectively mitigates the adverse effects of the saturation zones of the cosine function, which can impede gradients and hinder optimization. To set up a comprehensive STS evaluation, we experimented on existing short-text STS datasets and a newly collected long-text STS dataset from GitHub Issues. Furthermore, we examine domain-specific STS scenarios with limited labeled data and explore how AnglE works with LLM-annotated data. Extensive experiments were conducted on various tasks, including short-text STS, long-text STS, and domain-specific STS tasks. The results show that AnglE outperforms the state-of-the-art (SOTA) STS models that ignore the cosine saturation zone. These findings demonstrate the ability of AnglE to generate high-quality text embeddings and the usefulness of angle optimization in STS.
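The saturation argument in the abstract can be illustrated numerically. The sketch below is not the paper's objective; it only shows that the derivative of cos(θ) with respect to the angle, -sin(θ), shrinks toward zero as θ approaches 0 or π, and then computes an angle difference between two vectors in a complex space. The function names and the choice to treat the first half of each vector as real parts and the second half as imaginary parts are assumptions made for illustration.

```python
import cmath
import math


def cosine_grad_wrt_angle(theta: float) -> float:
    """Derivative of cos(theta) with respect to theta, i.e. -sin(theta)."""
    return -math.sin(theta)


def mean_complex_angle_difference(u, v):
    """Mean absolute angle difference between u and v, in radians.

    Assumption for illustration: the first half of each vector holds real
    parts and the second half holds imaginary parts. Every complex
    component of v must be non-zero, since the angle comes from z / w.
    """
    half = len(u) // 2
    total = 0.0
    for k in range(half):
        z = complex(u[k], u[half + k])
        w = complex(v[k], v[half + k])
        total += abs(cmath.phase(z / w))  # phase of z / w = angle(z) - angle(w)
    return total / half


if __name__ == "__main__":
    # The gradient magnitude is largest at 90 degrees and collapses near 0 and 180.
    for degrees in (1, 45, 90, 135, 179):
        grad = cosine_grad_wrt_angle(math.radians(degrees))
        print(f"theta = {degrees:3d} deg, d cos / d theta = {grad:+.4f}")

    # Toy vectors (illustrative values only).
    print(mean_complex_angle_difference([1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]))
```

An objective defined on the angle difference itself does not pass through the flat regions of the cosine, which is the intuition the abstract appeals to.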


Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Sentiment Analysis | CR | AnglE-LLaMA-7B | Accuracy | 93.54 | # 1 |
| Sentiment Analysis | MR | AnglE-LLaMA-7B | Accuracy | 91.09 | # 3 |
| Semantic Textual Similarity | MTEB | AnglE-UAE | Spearman Correlation | 84.54 | # 1 |
| Semantic Textual Similarity | SICK-R | AnglE-LLaMA-7B | Spearman Correlation | 0.8118 | # 1 |
| Semantic Textual Similarity | SICK-R | AnglE-LLaMA-13B | Spearman Correlation | 0.8132 | # 3 |
| Semantic Textual Similarity | SICK-R | AnglE-LLaMA-7B-v2 | Spearman Correlation | 0.8094 | # 2 |
| Semantic Textual Similarity | STS12 | AnglE-LLaMA-7B | Spearman Correlation | 0.7868 | # 7 |
| Semantic Textual Similarity | STS12 | AnglE-LLaMA-13B | Spearman Correlation | 0.7933 | # 5 |
| Semantic Textual Similarity | STS12 | AnglE-LLaMA-7B-v2 | Spearman Correlation | 0.7900 | # 6 |
| Semantic Textual Similarity | STS13 | AnglE-LLaMA-13B | Spearman Correlation | 0.9065 | # 1 |
| Semantic Textual Similarity | STS13 | AnglE-LLaMA-7B | Spearman Correlation | 0.9058 | # 2 |
| Semantic Textual Similarity | STS13 | AnglE-LLaMA-7B-v2 | Spearman Correlation | 0.9056 | # 3 |
| Semantic Textual Similarity | STS14 | AnglE-LLaMA-7B-v2 | Spearman Correlation | 0.8579 | # 3 |
| Semantic Textual Similarity | STS14 | AnglE-LLaMA-13B | Spearman Correlation | 0.8689 | # 1 |
| Semantic Textual Similarity | STS14 | AnglE-LLaMA-7B | Spearman Correlation | 0.8549 | # 4 |
| Semantic Textual Similarity | STS15 | AnglE-LLaMA-13B | Spearman Correlation | 0.9045 | # 1 |
| Semantic Textual Similarity | STS15 | AnglE-LLaMA-7B-v2 | Spearman Correlation | 0.8943 | # 6 |
| Semantic Textual Similarity | STS15 | AnglE-LLaMA-7B | Spearman Correlation | 0.8956 | # 3 |
| Semantic Textual Similarity | STS16 | AnglE-LLaMA-13B | Spearman Correlation | 0.8732 | # 1 |
| Semantic Textual Similarity | STS16 | AnglE-LLaMA-7B-v2 | Spearman Correlation | 0.8700 | # 2 |
| Semantic Textual Similarity | STS16 | AnglE-LLaMA-7B | Spearman Correlation | 0.8691 | # 3 |
| Semantic Textual Similarity | STS Benchmark | AnglE-LLaMA-7B | Spearman Correlation | 0.8892 | # 11 |
| Semantic Textual Similarity | STS Benchmark | AnglE-LLaMA-7B-v2 | Spearman Correlation | 0.8897 | # 10 |
| Semantic Textual Similarity | STS Benchmark | AnglE-LLaMA-13B | Spearman Correlation | 0.8969 | # 7 |
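
The STS rows above report Spearman correlation. As a reminder of what that metric measures, the following is a minimal pure-Python sketch that ranks two score lists (averaging tied ranks) and takes the Pearson correlation of the ranks. The helper names and the toy scores are hypothetical and are not taken from the paper or its evaluation code.

```python
import math


def average_ranks(values):
    """Return 1-based ranks of values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman(predicted, gold):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(predicted), average_ranks(gold))


if __name__ == "__main__":
    # Toy values only: model similarity scores and human similarity ratings.
    predicted_similarity = [0.92, 0.15, 0.60, 0.33]
    gold_similarity = [4.8, 0.5, 3.0, 3.2]
    print(spearman(predicted_similarity, gold_similarity))
```

Because only the ranks matter, the metric rewards a model for ordering sentence pairs the way annotators do, regardless of the scale of its raw similarity scores.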
