Pathfinder and Pathfinder-X have proven instrumental for training and testing Large Language Models on long-range dependencies. Recently, Meta's Moving Average Equipped Gated Attention (MEGA) model scored 97% on Pathfinder-X, indicating the need for a larger, more challenging dataset. Whereas Pathfinder-X images only went up to 256 x 256 pixels (a sequence length of 65,536 tokens), Pathfinder-X2 introduces 512 x 512 pixel images, or 262,144 tokens.
Each image is meant to be read as a sequence of pixels. An LLM's task is to segment out the one snake in each image, identified by the circle at its tip. The dataset includes 200,000 images and 200,000 segmentation masks, one for each image.
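As a minimal sketch of how an image and its mask can be serialized into the pixel sequences described above (the array here is random stand-in data, not a real Pathfinder-X2 sample; in practice it would be a grayscale image loaded from the dataset):

```python
import numpy as np

H = W = 512  # Pathfinder-X2 resolution

# Stand-in for a grayscale Pathfinder-X2 image.
image = np.random.randint(0, 256, size=(H, W), dtype=np.uint8)

# Flatten in row-major (raster) order into a 1-D token sequence.
tokens = image.reshape(-1)
assert tokens.shape[0] == 262_144  # 512 * 512

# The paired segmentation mask is flattened the same way, yielding
# one binary label per pixel/token (1 = snake, 0 = background).
mask = np.zeros((H, W), dtype=np.uint8)
labels = mask.reshape(-1)
assert labels.shape == tokens.shape
```

Row-major order is one conventional choice; any fixed, consistent ordering of the 262,144 pixels works, as long as images and masks are flattened identically.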
| Paper | Code | Results | Date | Stars |
|---|---|---|---|---|