Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? A Study on Several Typical Tasks

10 May 2023 · Xianzhi Li, Samuel Chan, Xiaodan Zhu, Yulong Pei, Zhiqiang Ma, Xiaomo Liu, Sameena Shah

The most recent large language models (LLMs), such as ChatGPT and GPT-4, have shown exceptional capabilities as generalist models, achieving state-of-the-art performance on a wide range of NLP tasks with little or no adaptation. How effective are such models in the financial domain? Understanding this basic question would have a significant impact on many downstream financial analytical tasks. In this paper, we conduct an empirical study and provide experimental evidence of their performance on a wide variety of financial text analytical problems, using eight benchmark datasets from five categories of tasks. We report both the strengths and limitations of the current models by comparing them to state-of-the-art fine-tuned approaches and recently released domain-specific pretrained models. We hope our study helps clarify the capabilities of existing models in the financial domain and facilitates further improvements.

Code

No code implementations yet.

Tasks

Binary Classification

Named Entity Recognition (NER)

Question Answering

Sentiment Analysis

Text Classification

Datasets

FinQA

ConvFinQA

Results from the Paper

Ranked #1 on Question Answering on ConvFinQA

Task	Dataset	Model	Metric Name	Metric Value (%)	Global Rank
Question Answering	ConvFinQA	General Crowd	Execution Accuracy	46.90	#3
Question Answering	ConvFinQA	GPT-4 (8k)	Execution Accuracy	76.48	#1
Question Answering	FinQA	GPT-4 (8k)	Execution Accuracy	68.79	#3
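Execution Accuracy, the metric reported in the results table, is commonly understood as the share of examples whose generated answer, once executed to a number, matches the gold answer. The snippet below is a minimal sketch of that idea, assuming answers have already been executed to floats (with None marking a program that failed to run) and using a hypothetical relative tolerance. It is not the official FinQA or ConvFinQA evaluation script and not code from this paper; the function name `execution_accuracy` and the `rel_tol` default are illustrative assumptions.

```python
import math
from typing import Optional, Sequence


def execution_accuracy(
    predicted: Sequence[Optional[float]],
    gold: Sequence[float],
    rel_tol: float = 1e-4,  # hypothetical tolerance, not the official setting
) -> float:
    """Return the percentage of examples whose executed answer matches gold.

    Assumption: each prediction is the numeric result of executing the
    model's program; None means the program failed and counts as wrong.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(
        1
        for p, g in zip(predicted, gold)
        # A failed execution (None) is never correct; otherwise compare
        # numerically with a small tolerance to absorb float rounding.
        if p is not None and math.isclose(p, g, rel_tol=rel_tol, abs_tol=1e-9)
    )
    # Scale to a percentage, matching how the table reports scores.
    return 100.0 * correct / len(gold)


if __name__ == "__main__":
    # Two of four hypothetical predictions match, so this prints 50.0.
    print(execution_accuracy([1.5, None, 3.0, 10.0], [1.5, 2.0, 3.0, 9.0]))
```

Comparing with a tolerance rather than exact equality is a design choice for floating-point results; the official scripts may instead round answers before comparing.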

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • GPT-4 • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
