This repository contains a finance-focused dataset for financial sentiment and emotion classification and for stock market time series prediction. It accompanies our paper, "StockEmotions: Discover Investor Emotions for Financial Sentiment Analysis and Multivariate Time Series", accepted at the AAAI 2023 Bridge (AI for Financial Services).

  • Data collection period: Jan 2020 - Dec 2020
  • Number of utterances: 10,000 (train 80%, val 10%, test 10%)
  • Sentiment classes: 2 [bullish (~positive), bearish (~negative)]
  • Emotion classes: 12 [ambiguous, amusement, anger, anxiety, belief, confusion, depression, disgust, excitement, optimism, panic, surprise]

  • tweet/processed.csv: 50,281 samples of processed text, used for topic modelling.
  • tweet/train.csv, val.csv, test.csv: 10,000 samples in total. Each file has the columns id, date, ticker, emo_label, senti_label, original, and processed. For details on data curation, text processing (e.g. emoji, CTAG, HTAG), and annotation, see our paper. These files are used for financial sentiment and emotion classification tasks.
  • price/ (38 companies): historical price data in CSV format. Together, the tweet and price data are used for multivariate time series tasks.
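
As a rough illustration of the layout described above, the following sketch reads one split file and counts its emotion labels using only the Python standard library. It assumes the split files sit directly under a local `tweet/` directory and that the emotion column is named `emo_label` as listed above; the helper names `load_split` and `label_counts` are hypothetical and not part of the repository.

```python
import csv
from collections import Counter
from pathlib import Path

# Split names taken from the file list above (train.csv, val.csv, test.csv).
SPLITS = ("train", "val", "test")


def load_split(tweet_dir, split):
    """Read one split file (e.g. tweet/train.csv) into a list of row dicts."""
    path = Path(tweet_dir) / f"{split}.csv"
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def label_counts(rows, column):
    """Count how often each label value appears in the given column."""
    return Counter(row[column] for row in rows)


if __name__ == "__main__":
    tweet_dir = Path("tweet")  # assumed local path to the tweet folder
    if tweet_dir.is_dir():
        for split in SPLITS:
            rows = load_split(tweet_dir, split)
            counts = label_counts(rows, "emo_label")
            print(split, len(rows), counts.most_common(3))
```

Passing the sentiment column name instead of `emo_label` to `label_counts` would give the bullish/bearish distribution in the same way.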

License


  • Unknown
