ISEKAI

Introduced by Tai et al. in Link-Context Learning for Multimodal LLMs

The images in the ISEKAI dataset were generated by Midjourney's text-to-image model using carefully crafted instructions, then manually selected to ensure that each core concept is depicted consistently. The dataset currently comprises 20 groups and 40 categories in total, and it continues to grow. Each group pairs a new concept with a related real-world concept, such as "octopus vacuum" and "octopus", so the two can serve as challenging negative samples for each other. Each concept has at least 32 images, which supports multi-shot examples.
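The group structure described above lends itself to few-shot episodes in which the paired concept supplies the negatives. The following is only a minimal sketch under stated assumptions: the function name `build_episode`, the in-memory dictionary layout, and the file paths are all hypothetical and do not reflect the dataset's actual file organization or any official loader.

```python
import random
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (image path, concept label)


def build_episode(
    images: Dict[str, List[str]],
    group: Tuple[str, str],
    shots: int,
    rng: random.Random,
) -> Tuple[List[Example], List[Example]]:
    """Sample a k-shot episode from one group (hypothetical helper).

    For each of the two concepts in the group, `shots` images go to the
    support set and one held-out image becomes a query. Each concept acts
    as the negative class for the other.
    """
    support: List[Example] = []
    queries: List[Example] = []
    for concept in group:
        # Needs at least shots + 1 images per concept; the description
        # states at least 32 images, so shots can be up to 31.
        picked = rng.sample(images[concept], shots + 1)
        support.extend((path, concept) for path in picked[:shots])
        queries.append((picked[shots], concept))
    rng.shuffle(support)  # avoid presenting all examples of one concept first
    return support, queries


# Toy stand-in data: 32 placeholder paths per concept (paths are made up).
images = {
    name: [f"{name}/{i:03d}.png" for i in range(32)]
    for name in ("octopus vacuum", "octopus")
}
support, queries = build_episode(
    images, ("octopus vacuum", "octopus"), shots=4, rng=random.Random(0)
)
```

With `shots=4`, the support set holds eight labeled images (four per concept) and the query set holds one unseen image per concept.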
