e-ViL is a benchmark for explainable vision-language tasks. It spans three datasets of human-written natural language explanations (NLEs) and provides a unified evaluation framework designed to be reusable in future work.

This benchmark uses the following datasets: e-SNLI-VE, VCR, VQA-X.

License


  • Multiple licenses
