The images in the ISEKAI dataset are generated with Midjourney's text-to-image model from carefully crafted instructions, then manually selected to ensure core concept consistency. The dataset currently comprises 20 groups and 40 categories in total, and continues to grow. Each group pairs a new concept with a related real-world concept, such as "octopus vacuum" and "octopus", so the two can serve as challenging negative samples for each other. Each concept has at least 32 images, which supports multi-shot examples.
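As an illustration of how a paired group might be used for multi-shot evaluation, the sketch below samples a few support images per concept and one held-out query image from the same group, so that the sibling concept acts as the negative. The directory layout, file names, and the `sample_episode` helper are hypothetical and are not part of the dataset's released tooling.

```python
import random


def sample_episode(group, shots, rng):
    """Sample `shots` support images per concept plus one held-out query.

    `group` maps a concept name to a list of image identifiers
    (a hypothetical layout, assumed for this sketch).
    """
    support = {}
    held_out = {}
    for concept, images in group.items():
        if shots >= len(images):
            raise ValueError(f"not enough images for concept {concept!r}")
        # Draw shots + 1 distinct images: the last one is kept out of the support set.
        picked = rng.sample(images, shots + 1)
        support[concept] = picked[:shots]
        held_out[concept] = picked[shots]
    # The query comes from one concept; the other concept is its hard negative.
    query_label = rng.choice(sorted(group))
    return support, held_out[query_label], query_label


# Hypothetical group with 32 images per concept, the minimum stated above.
group = {
    "octopus vacuum": [f"octopus_vacuum/{i:03d}.png" for i in range(32)],
    "octopus": [f"octopus/{i:03d}.png" for i in range(32)],
}
support, query, label = sample_episode(group, shots=4, rng=random.Random(0))
```

With at least 32 images per concept, `shots` can be raised well beyond 4 while still leaving images available as queries.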