e-ViL is a benchmark for explainable vision-language tasks. It spans three datasets of human-written natural language explanations (NLEs): e-SNLI-VE, VCR, and VQA-X. It also provides a unified evaluation framework designed to be reusable in future work.