58 dataset results for Sentiment Analysis AND Texts

LSICC (Large Scale Informal Chinese Corpus)

Large Scale Informal Chinese Corpus (LSICC) is a large-scale corpus of informal Chinese. It contains around 37 million book reviews and 50 thousand netizens' comments on news articles.

1 PAPER • NO BENCHMARKS YET

MalayalamMixSentiment

MalayalamMixSentiment is a sentiment analysis dataset for code-mixed Malayalam-English text.

1 PAPER • NO BENCHMARKS YET

Modern Hebrew Sentiment Dataset

Modern Hebrew Sentiment Dataset is a sentiment analysis benchmark for Hebrew, based on 12K social media comments. It provides two versions of the data: a token-based setting and a morpheme-based setting.

1 PAPER • NO BENCHMARKS YET

SAIL 2017

SAIL 2017 (Sentiment Analysis for Indian Languages)

India is a linguistic area with one of the longest histories of contact, influence, use, teaching and learning of English-in-diaspora in the world (Kachru and Nelson, 2006). As a result, a large number of Indians active on the internet can communicate in English to some degree. India also has great linguistic diversity: apart from Hindi, it has several regional languages that are the primary tongue of people native to each region. Consequently, social media texts on platforms such as Facebook, WhatsApp, and Twitter often contain more than one language, phenomena known as code-mixing and code-switching. At the same time, the sentiments expressed in such social media texts have created many new opportunities for information access and language technology, but also many new challenges, making this one of the prime present-day research areas. Sentiment analysis of code-mixed data has several real-life applications, ranging from opinion mining of social media campaigns to feedback analysis…

1 PAPER • 1 BENCHMARK

SILICONE Benchmark

SILICONE Benchmark (SILICONE)

The Sequence labellIng evaLuatIon benChmark fOr spoken laNguagE (SILICONE) benchmark is a collection of resources for training, evaluating, and analyzing natural language understanding systems specifically designed for spoken language. All datasets are in English and cover a wide variety of domains (e.g., daily life, scripted scenarios, joint task completion, phone call conversations, and television dialogue). Some datasets additionally include emotion and/or sentiment labels.

1 PAPER • 1 BENCHMARK

SVLD (Social Vision and Language Dataset)

The social vision and language dataset is a large-scale multimodal dataset designed for research into social contextual learning.

1 PAPER • NO BENCHMARKS YET

SentimentArcs: Sentiment Reference Corpus for Novels

SentimentArcs’ reference corpus for novels consists of 25 narratives selected to form a diverse set of well-recognized novels that can serve as a benchmark for future studies. The composition of the corpus was constrained by copyright law as well as by historical imbalances. Most works were obtained from Project Gutenberg and Project Gutenberg Australia. The corpus is expected to grow in size and diversity over time.

1 PAPER • NO BENCHMARKS YET

TBCOV

TBCOV is a large-scale Twitter dataset comprising more than two billion multilingual tweets related to the COVID-19 pandemic, collected worldwide over a continuous period of more than one year. Several state-of-the-art deep learning models were used to enrich the data with important attributes, including sentiment labels, named entities (e.g., mentions of persons, organizations, and locations), user types, and gender information. A geotagging method is also proposed to assign country, state, county, and city information to tweets, enabling a wide range of data analysis tasks for understanding real-world issues.

1 PAPER • NO BENCHMARKS YET

Twitter US Airline Sentiment

A sentiment analysis job about the problems of each major U.S. airline. Twitter data was scraped in February 2015, and contributors were asked first to classify tweets as positive, negative, or neutral, and then to categorize the reasons for negative tweets (such as "late flight" or "rude service"). The non-aggregated results (55,000 rows) are also available for download.

1 PAPER • NO BENCHMARKS YET

SEN

SEN (Sentiment analysis of Entities in News headlines)

SEN is a novel, publicly available, human-labelled dataset for training and testing machine learning algorithms on entity-level sentiment analysis of political news headlines.

0 PAPERS • NO BENCHMARKS YET
