LlamBERT: Large-scale low-cost data annotation in NLP

Large Language Models (LLMs), such as GPT-4 and Llama 2, show remarkable proficiency in a wide range of natural language processing (NLP) tasks. Despite their effectiveness, the high costs associated with their use pose a challenge. We present LlamBERT, a hybrid approach that leverages LLMs to annotate a small subset of large, unlabeled databases and uses the results for fine-tuning transformer encoders like BERT and RoBERTa. This strategy is evaluated on two diverse datasets: the IMDb review dataset and the UMLS Meta-Thesaurus. Our results indicate that the LlamBERT approach slightly compromises on accuracy while offering much greater cost-effectiveness.
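The two-step pipeline is simple enough to sketch. Below is a minimal, hypothetical Python illustration using the Hugging Face transformers and datasets libraries: the prompt wording, the 1,000-example sample size, the Llama-2-7b-chat stand-in (the paper's leaderboard entry uses Llama-2-70b-chat), and the training hyperparameters are all assumptions for illustration, not the authors' released code.

```python
# Minimal sketch of the two LlamBERT steps (illustrative assumptions,
# not the authors' released code).
import random

from datasets import Dataset, load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments, pipeline)

# Step 1: have an LLM annotate a small, cheap-to-label sample of the
# unlabeled pool. Llama-2-7b-chat is a lighter stand-in here; prompt
# wording and sample size are assumptions.
unlabeled = load_dataset("imdb", split="unsupervised")["text"]
sample = random.sample(unlabeled, 1000)

llm = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")

def llm_label(review: str) -> int:
    """Zero-shot binary sentiment label parsed from a one-word LLM answer."""
    prompt = ("Decide whether the following movie review is positive or "
              f"negative. Answer with a single word.\n\nReview: {review}\nAnswer:")
    answer = llm(prompt, max_new_tokens=3,
                 return_full_text=False)[0]["generated_text"]
    return 1 if "positive" in answer.lower() else 0

annotated = Dataset.from_dict(
    {"text": sample, "label": [llm_label(t) for t in sample]}
)

# Step 2: fine-tune a much cheaper transformer encoder on the
# LLM-annotated subset.
tok = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-large", num_labels=2
)

def tokenize(batch):
    return tok(batch["text"], truncation=True, max_length=256,
               padding="max_length")

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llambert-roberta",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=annotated.map(tokenize, batched=True),
)
trainer.train()
```

The cost argument follows from this split of labor: the expensive LLM is called only on the small sample, while the fine-tuned encoder handles the full corpus at a fraction of the inference cost.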


Datasets

IMDb Movie Reviews · UMLS Meta-Thesaurus
Results from the Paper


Ranked #1 on Sentiment Analysis on IMDb (using extra training data)

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank | Uses Extra Training Data |
|------|---------|-------|-------------|--------------|-------------|--------------------------|
| Sentiment Analysis | IMDb | Llama-2-70b-chat (0-shot) | Accuracy | 95.39 | #16 | |
| Sentiment Analysis | IMDb | RoBERTa-large with LlamBERT | Accuracy | 96.68 | #1 | Yes |
| Sentiment Analysis | IMDb | RoBERTa-large | Accuracy | 96.54 | #2 | |

Methods

BERT · RoBERTa · Llama 2