The ESP dataset (Evaluation for Styled Prompt dataset) is a benchmark for zero-shot domain-conditional caption generation. It provides multiple styled text targets for the same image and comprises 4.8k captions for 1k images from the COCO Captions test set. We collect captions in five text domains in everyday use: blog, social media, instruction, story, and news.