CHOCOLATE (Captions Have Often ChOsen Lies About The Evidence)

Introduced by Huang et al. in "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning"

CHOCOLATE is a benchmark for detecting and correcting factual inconsistency in generated chart captions. It consists of captions produced by six advanced models, which are categorized into three subsets:

LVLM: GPT-4V, Bard (before Gemini)
LLM-based Pipeline: DePlot + GPT-4
Fine-tuned Model: ChartT5, MatCha, UniChart
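
The grouping above can be sketched as a small lookup table, for example to bucket captions by subset when computing per-subset scores. The short subset keys mirror the variant names in the benchmark table (CHOCOLATE-LVLM, CHOCOLATE-LLM, CHOCOLATE-FT); the dictionary layout and the `subset_of` helper are illustrative assumptions, not part of the released code.

```python
# Hypothetical lookup from subset name to the captioning models listed above.
# The model names come from this page; the structure itself is an assumption.
SUBSET_MODELS = {
    "LVLM": ["GPT-4V", "Bard"],
    "LLM": ["DePlot + GPT-4"],
    "FT": ["ChartT5", "MatCha", "UniChart"],
}


def subset_of(model: str) -> str:
    """Return the subset key for a captioning model name."""
    for subset, models in SUBSET_MODELS.items():
        if model in models:
            return subset
    raise KeyError(f"unknown captioning model: {model}")
```

Calling `subset_of("MatCha")` returns `"FT"` under this mapping; an unlisted name raises `KeyError` rather than silently falling into a default bucket.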

The charts come from two datasets: VisText and the Pew split of Chart-to-Text. In total, CHOCOLATE contains 1,187 examples. Each instance pairs a caption generated by one of the models with annotations of the factual errors in each caption sentence.
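
As a rough illustration of the instance structure described above, the following sketch models one example as a caption split into sentences, each carrying its own error annotations. All class and field names (`SentenceAnnotation`, `CaptionInstance`, `error_types`, `chart_source`) are hypothetical and chosen for readability; the released files may use a different schema, so check the repository before relying on these names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SentenceAnnotation:
    """One caption sentence and its annotated factual errors (hypothetical schema)."""
    sentence: str
    # Empty list means annotators found no factual error in this sentence.
    error_types: List[str] = field(default_factory=list)


@dataclass
class CaptionInstance:
    """One example: a model-generated caption with per-sentence annotations."""
    chart_source: str  # e.g. "VisText" or "Pew" (assumed labels)
    model: str         # captioning model that produced the caption
    sentences: List[SentenceAnnotation]

    @property
    def caption(self) -> str:
        # Reassemble the full caption from its sentences.
        return " ".join(s.sentence for s in self.sentences)

    @property
    def is_factually_consistent(self) -> bool:
        # A caption counts as consistent only if no sentence has an error.
        return all(not s.error_types for s in self.sentences)


# Made-up example data; "placeholder_error" is not a label from the dataset.
example = CaptionInstance(
    chart_source="Pew",
    model="UniChart",
    sentences=[
        SentenceAnnotation("The chart shows survey results by age group."),
        SentenceAnnotation("The oldest group has the highest share.",
                           error_types=["placeholder_error"]),
    ],
)
```

With this layout, a caption-level label falls out of the sentence-level annotations: `example.is_factually_consistent` is `False` because one sentence carries an error.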

Paper Information

Paper: https://arxiv.org/abs/2312.10160
Code: https://github.com/khuangaf/CHOCOLATE/
Project: https://khuangaf.github.io/CHOCOLATE

Citation

If you use the CHOCOLATE dataset in your work, please cite the paper using this BibTeX entry:

@misc{huang-etal-2023-do,
    title = {Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning},
    author = {Huang, Kung-Hsiang and
      Zhou, Mingyang and
      Chan, Hou Pong and
      Fung, Yi R. and
      Wang, Zhenhailong and
      Zhang, Lingyu and
      Chang, Shih-Fu and
      Ji, Heng},
    year = {2023},
    eprint = {2312.10160},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}

Benchmarks

Task	Dataset Variant	Best Model
Factual Inconsistency Detection in Chart Captioning	CHOCOLATE-LLM	GPT-4V
Factual Inconsistency Detection in Chart Captioning	CHOCOLATE-FT	Bard
Factual Inconsistency Detection in Chart Captioning	CHOCOLATE-LVLM	ChartVE
Factual Inconsistency Detection in Chart Captioning	CHOCOLATE	ChartVE