FinanceBench is a benchmark for evaluating how well large language models (LLMs) perform on financial question answering (QA). Here are the key details:

  1. What is FinanceBench?
    • FinanceBench is a test suite, described by its authors as the first of its kind, for assessing how well LLMs answer financial questions.
    • It focuses on open-book financial QA and comprises 10,231 questions about publicly traded companies.
    • Each question comes with a corresponding answer and evidence string.

  2. Why FinanceBench Matters:
    • The questions are ecologically valid and cover a diverse set of scenarios.
    • They are intended to be clear-cut and straightforward to answer, so the benchmark serves as a minimum performance standard.
    • The aim is to measure how well LLMs handle financial queries about publicly traded companies.

  3. Model Evaluation:
    • Researchers tested 16 state-of-the-art model configurations, including GPT-4-Turbo, Llama2, and Claude2.
    • Each configuration was run on a sample of 150 cases from FinanceBench, and all 2,400 resulting answers (16 × 150) were manually reviewed.
    • Existing LLMs show clear limitations for financial QA. For instance:
      • GPT-4-Turbo, when used with a retrieval system, incorrectly answered or refused to answer 81% of questions.
      • Augmentation techniques, such as using a longer context window to feed in relevant evidence, improved performance but are unrealistic for enterprise settings because they increase latency and cannot support larger financial documents.
    • All examined models exhibit weaknesses, such as hallucinations, that limit their suitability for enterprise use.

  4. Availability:
    • The FinanceBench cases are available open-source for further exploration and research.
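
The evaluation figures above can be checked with a little arithmetic. The following is a minimal Python sketch, not the authors' evaluation code: it confirms that 16 configurations run on 150 cases yield 2,400 answers to review, and shows one way a "incorrectly answered or refused" rate could be tallied from manual review labels. The label names (`correct`, `incorrect`, `refusal`), the function names, and the toy labels are all assumptions made for illustration; they are not taken from the paper or its dataset.

```python
from collections import Counter

N_CONFIGS = 16   # model configurations, as stated in the summary
N_CASES = 150    # sampled FinanceBench cases per configuration


def total_reviewed(n_configs: int, n_cases: int) -> int:
    """Number of answers to review when every configuration answers every case."""
    return n_configs * n_cases


def failure_rate(labels: list[str]) -> float:
    """Fraction of answers labeled 'incorrect' or 'refusal' (hypothetical label names)."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    failed = counts["incorrect"] + counts["refusal"]
    return failed / len(labels)


# Toy, made-up review labels for a single configuration (NOT real results).
toy_labels = ["correct"] * 2 + ["incorrect"] * 5 + ["refusal"] * 3

if __name__ == "__main__":
    print(total_reviewed(N_CONFIGS, N_CASES))
    print(failure_rate(toy_labels))
```

Counting refusals together with incorrect answers mirrors how the 81% figure is phrased in the summary. A real analysis would likely report the two categories separately as well.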

¹: Islam, P., Kannappan, A., Kiela, D., Qian, R., Scherrer, N., & Vidgen, B. (2023). FinanceBench: A New Benchmark for Financial Question Answering. arXiv preprint arXiv:2311.11944.

Source: Conversation with Bing, 3/16/2024
(1) Papers with Code - FinanceBench: A New Benchmark for Financial Question .... https://paperswithcode.com/paper/financebench-a-new-benchmark-for-financial
(2) FinanceBench: A New Benchmark for Financial Question Answering. https://arxiv.org/abs/2311.11944
(3) Papers with Code - Paper tables with annotated results for FinanceBench .... https://paperswithcode.com/paper/financebench-a-new-benchmark-for-financial/review/
