🔔 Share your dataset with the ML community!

Filter by Modality (clear)

Filter by Task

Filter by Language (clear)

790 dataset results for Images AND English

Kvasir-Capsule dataset is the largest publicly released VCE dataset. In total, the dataset contains 47,238 labeled images and 117 videos, where it captures anatomical landmarks and pathological and normal findings. The results is more than 4,741,621 images and video frames altogether.

2 PAPERS • NO BENCHMARKS YET

LIS (low-light instance segmentation)

To reveal and systematically investigate the effectiveness of the proposed method in the real world, a real low-light image dataset for instance segmentation is necessary and urgently needed. Considering there is no suitable dataset, therefore, we collect and annotate a Low-light Instance Segmentation (LIS) dataset using a Canon EOS 5D Mark IV camera.

2 PAPERS • NO BENCHMARKS YET

LOFAR RFI Detection

LOFAR RFI Detection (Low-Frequency Array (LOFAR) Radio Frequency Interference Detection)

This dataset contains simulated and expert-labelled spectrograms from two radio telescopes: the Hydrogen Epoch of Reionization Array (HERA) in South Africa and the Low-Frequency Array (LOFAR) in the Netherlands. These datasets are intended to test radio-frequency interference (RFI) detection schemes. This entry pertains to the LOFAR dataset specifically.

2 PAPERS • 1 BENCHMARK

LaSCo

Large Scale Composed Image Retrieval (LaSCo) is a new dataset for Composed Image Retrieval (CoIR), x10 times larger than current ones.

2 PAPERS • 1 BENCHMARK

MSDA (Multi-source domain adaptation dataset for text recognition)

5 domains: synthetic domain, document domain, street view domain, handwritten domain, and car license domain over five million images

2 PAPERS • 2 BENCHMARKS

MarKG

MarKG (Multimodal analogical reasoning Knowledge Graph)

The MarKG dataset has 11,292 entities, 192 relations and 76,424 images, including 2,063 analogy entities and 27 analogy relations. The original intention of MarKG is to provide prior knowledge of analogy entities and relations for better multimodal analogical reasoning.

2 PAPERS • NO BENCHMARKS YET

MatSynth

MatSynth MatSynth is a Physically Based Rendering (PBR) materials dataset designed for modern AI applications. This dataset consists of over 4,000 ultra-high resolution, offering unparalleled scale, diversity, and detail.

2 PAPERS • NO BENCHMARKS YET

Mila Simulated Floods

Mila Simulated Floods Dataset is a 1.5 square km virtual world using the Unity3D game engine including urban, suburban and rural areas.

2 PAPERS • 1 BENCHMARK

NovelCraft

Scene-focused, multi-modal, episodic data of the images and symbolic world-states seen by an agent completing a pogo-stick assembly task within a video game world. Classes consist of episodes with novel objects inserted. A subset of these novel objects can impact gameplay and agent behavior. Novelty objects can vary in size, position, and occlusion within the images. Usable for novelty detection, generalized category discovery, and class-imbalanced classification.

2 PAPERS • NO BENCHMARKS YET

OAD dataset

OAD dataset (The Online Action Detection Dataset)

The Online Action Detection Dataset (OAD) was captured using the Kinect V2 sensor, which collects color images, depth images and human skeleton joints synchronously. This dataset includes 59 long sequences and 10 actions.

2 PAPERS • 1 BENCHMARK

OADAT

OADAT (OADAT: Experimental and Synthetic Clinical Optoacoustic Data for Standardized Image Processing)

An experimental and synthetic (simulated) OA raw signals and reconstructed image domain datasets rendered with different experimental parameters and tomographic acquisition geometries.

2 PAPERS • NO BENCHMARKS YET

Occluded COCO

Occluded COCO is automatically generated subset of COCO val dataset, collecting partially occluded objects for a large variety of categories in real images in a scalable manner, where target object is partially occluded but the segmentation mask is connected.

2 PAPERS • 1 BENCHMARK

OpenLORIS-object

(L)ifel(O)ng (R)obotic V(IS)ion (OpenLORIS) - Object Recognition Dataset (OpenLORIS-Object) is designed for accelerating the lifelong/continual/incremental learning research and application，currently focusing on improving the continuous learning capability of the common objects in the home scenario.

2 PAPERS • NO BENCHMARKS YET

Oracle-MNIST

Oracle-MNIST (Oracle-MNIST: a Realistic Image Dataset for Benchmarking Machine Learning Algorithms)

We introduce the Oracle-MNIST dataset, comprising of 2828 grayscale images of 30,222 ancient characters from 10 categories, for benchmarking pattern classification, with particular challenges on image noise and distortion. The training set totally consists of 27,222 images, and the test set contains 300 images per class. Oracle-MNIST shares the same data format with the original MNIST dataset, allowing for direct compatibility with all existing classifiers and systems, but it constitutes a more challenging classification task than MNIST. The images of ancient characters suffer from 1) extremely serious and unique noises caused by three-thousand years of burial and aging and 2) dramatically variant writing styles by ancient Chinese, which all make them realistic for machine learning research. The dataset is freely available at https://github.com/wm-bupt/oracle-mnist.

2 PAPERS • NO BENCHMARKS YET

PAD Dataset (Pose-agnostic/Multi-pose Anomaly Detection Dataset)

Multi-pose Anomaly Detection (MAD) dataset, which represents the first attempt to evaluate the performance of pose-agnostic anomaly detection. The MAD dataset containing 4,000+ highresolution multi-pose views RGB images with camera/pose information of 20 shape-complexed LEGO animal toys for training, as well as 7,000+ simulation and real-world collected RGB images (without camera/pose information) with pixel-precise ground truth annotations for three types of anomalies in test sets. Note that MAD has been further divided into MAD-Sim and MAD-Real for simulation-to-reality studies to bridge the gap between academic research and the demands of industrial manufacturing.

2 PAPERS • 1 BENCHMARK

PKU (License Plate Detection)

The PKU dataset has almost 4,000 images categorized into five groups (G1-G5) that show different situations. For example, G1 has images of highways during the day with only one car in them. On the other hand, G5 has images of crosswalks during the day or at night with multiple cars and license plates (LPs).

2 PAPERS • NO BENCHMARKS YET

PS-Plant dataset

Automated leaf segmentation is a challenging area in computer vision. Recent advances in machine learning approaches allowed to achieve better results than traditional image processing techniques; however, training such systems often require large annotated data sets. To contribute with annotated data sets and help to overcome this bottleneck in plant phenotyping research, here we provide a novel photometric stereo (PS) data set with annotated leaf masks. This data set forms part of the work done in the BBSRC Tools and Resources Development project BB/N02334X/1.

2 PAPERS • NO BENCHMARKS YET

Pano3D

Pano3D is a new benchmark for depth estimation from spherical panoramas. Its goal is to drive progress for this task in a consistent and holistic manner. The Pano3D 360 depth estimation benchmark provides a standard Matterport3D train and test split, as well as a secondary GibsonV2 partioning for testing and training as well. The latter is used for zero-shot cross dataset transfer performance assessment and decomposes it into 3 different splits, each one focusing on a specific generalization axis.

2 PAPERS • NO BENCHMARKS YET

Paper2Fig100k

Paper2Fig100k is a dataset with over 100k images of figures and texts from research papers. The figures show architecture diagrams and methodologies of articles available at arXiv.org from fields like artificial intelligence and computer vision. Figures usually include text and discrete objects, e.g., boxes in a diagram, with lines and arrows that connect them.

2 PAPERS • NO BENCHMARKS YET

Persuasion Strategies

Modeling what makes an advertisement persuasive, i.e., eliciting the desired response from consumer, is critical to the study of propaganda, social psychology, and marketing. Despite its importance, computational modeling of persuasion in computer vision is still in its infancy, primarily due to the lack of benchmark datasets that can provide persuasion-strategy labels associated with ads. Motivated by persuasion literature in social psychology and marketing, we introduce an extensive vocabulary of persuasion strategies and build the first ad image corpus annotated with persuasion strategies. The dataset also provides image segmentation masks, which labels persuasion strategies in the corresponding ad images on the test split.

2 PAPERS • NO BENCHMARKS YET

RISEdb (Robust Indoor Localization in Complex Scenarios (RISE) database)

The RISE (Robust Indoor Localization in Complex Scenarios) dataset is meant to train and evaluate visual indoor place recognizers. It contains more than 1 million geo-referenced images spread over 30 sequences, covering 5 heterogeneous buildings. For each building we provide: - A high resolution 3D point cloud (1cm) that defines the localization reference frame and that was generated with a mobile laser scanner and an inertial system. - Several image sequences spread over time with accurate ground truth poses retrieved by the laser scanner. Each sequence contains both, stereo pairs and spherical images. - Geo-referenced smartphone data, retrieved from the standard sensors of such devices.

2 PAPERS • NO BENCHMARKS YET

RLU

RLU (RL Unplugged)

RL Unplugged is suite of benchmarks for offline reinforcement learning. The RL Unplugged is designed around the following considerations: to facilitate ease of use, we provide the datasets with a unified API which makes it easy for the practitioner to work with all data in the suite once a general pipeline has been established. This is a dataset accompanying the paper RL Unplugged: Benchmarks for Offline Reinforcement Learning.

2 PAPERS • NO BENCHMARKS YET

RaidaR

RaidaR (RaidaR: A Rich Annotated Image Dataset of Rainy Street Scenes)

RaidaR is a rich annotated image dataset of rainy street scenes. RaidaR consists of 58,542 real rainy images containing several rain-induced artifacts: fog, droplets, road reflections, etc. 5,000/3,658 images were carefully semantic/instance segmentated, respectively.

2 PAPERS • NO BENCHMARKS YET

ReactionGIF

ReactionGIF is an affective dataset of 30K tweets which can be used for tasks like induced sentiment prediction and multilabel classification of induced emotions.

2 PAPERS • NO BENCHMARKS YET

Rent3D++

Rent3D++ is an extension of the Rent3D floorplans + photos dataset. The floorplans are annotated with room outline polygons, doors/windows as line segments, object-icons as axis-aligned bounding boxes, room-door-room connectivity graphs, and photo-room assignments. We have extracted rectified surface crops from architectural surfaces in photos, and these can drive interior texturing/material modeling tasks. This dataset can be used with our paper Plan2Scene to generate textured 3D mesh models of houses using floorplans and photos.

2 PAPERS • 1 BENCHMARK

SMART-101 (Simple Multimodal Algorithmic Reasoning Task Dataset)

Recent times have witnessed an increasing number of applications of deep neural networks towards solving tasks that require superior cognitive abilities, e.g., playing Go, generating art, ChatGPT, etc. Such a dramatic progress raises the question: how generalizable are neural networks in solving problems that demand broad skills? To answer this question, we propose SMART: a Simple Multimodal Algorithmic Reasoning Task (and the associated SMART-101 dataset) for evaluating the abstraction, deduction, and generalization abilities of neural networks in solving visuo-linguistic puzzles designed specifically for children of younger age (6--8). Our dataset consists of 101 unique puzzles; each puzzle comprises a picture and a question, and their solution needs a mix of several elementary skills, including pattern recognition, algebra, and spatial reasoning, among others. To train deep neural networks, we programmatically augment each puzzle to 2,000 new instances; each instance varied in appea

2 PAPERS • NO BENCHMARKS YET

SODA-D

SODA-D is a large-scale dataset tailored for small object detection in driving scenario, which is built on top of MVD dataset and owned data, where the former is a dataset dedicated to pixel-level understanding of street scenes, and the latter is mainly captured by onboard cameras and mobile phones. With 24704 well-chosen and high-quality images of driving scenarios, SODA-D comprises 277596 instances of 9 categories with horizontal bounding boxes.

2 PAPERS • 1 BENCHMARK

SYNTH-PEDES

SYNTH-PEDES is a large-scale person dataset with image-text pairs by far, which contains 312,321 identities, 4,791,711 images, and 12,138,157 textual descriptions.

2 PAPERS • NO BENCHMARKS YET

ShapeTalk (The ShapeTalk Dataset)

ShapeTalk contains over half a million discriminative utterances produced by contrasting the shapes of common 3D objects for a variety of object classes and degrees of similarity. The dataset provides discriminative utterances for a total of 36,391 shapes, across 30 object classes. Overall, ShapeTalk contains 73,799 distinct contexts, and a total of 536,596 utterances

2 PAPERS • NO BENCHMARKS YET

SuperCaustics

SuperCaustics is a simulation tool made in Unreal Engine for generating massive computer vision datasets that include transparent objects.

2 PAPERS • 1 BENCHMARK

Synthehicle

Synthehicle is a massive CARLA-based synthehic multi-vehicle multi-camera tracking dataset and includes ground truth for 2D detection and tracking, 3D detection and tracking, depth estimation, and semantic, instance and panoptic segmentation.

2 PAPERS • 1 BENCHMARK

TBBR (Thermal Bridges on Building Rooftops)

The dataset of Thermal Bridges on Building Rooftops (TBBR dataset) consists of annotated combined RGB and thermal drone images with a height map. All images were converted to a uniform format of 3000$\times$4000 pixels, aligned, and cropped to 2400$\times$3400 to remove empty borders.

2 PAPERS • 2 BENCHMARKS

TE141K

A new text effects dataset with 141,081 text effect/glyph pairs in total. The dataset consists of 152 professionally designed text effects rendered on glyphs, including English letters, Chinese characters, and Arabic numerals.

2 PAPERS • NO BENCHMARKS YET

TI1K Dataset (Thumb Index 1000 Hand & Fingertip Detection Dataset)

Thumb Index 1000 (TI1K) is a dataset of 1000 hand images with the hand bounding box, and thumb and index fingertip positions. The dataset includes the natural movement of the thumb and index fingers making it suitable for mixed reality (MR) applications.

2 PAPERS • NO BENCHMARKS YET

TNCR Dataset (Table Net Detection and Classification Dataset)

We present TNCR, a new table dataset with varying image quality collected from free open source websites. TNCR dataset can be used for table detection in scanned document images and their classification into 5 different classes.

2 PAPERS • NO BENCHMARKS YET

Tc1 Mouse cerebellum atlas (Tc1 Mouse cerebellum atlas with Purkinje layer segmentation)

This mouse cerebellar atlas can be used for mouse cerebellar morphometry.

2 PAPERS • NO BENCHMARKS YET

Twitter-COMMs

Detecting out-of-context media, such as "mis-captioned" images on Twitter, is a relevant problem, especially in domains of high public significance. Twitter-COMMs is a large-scale multimodal dataset with 884k tweets relevant to the topics of Climate Change, COVID-19, and Military Vehicles. This dataset can be used to develop methods to detect misinformation on social media platforms related to these three topics.

2 PAPERS • NO BENCHMARKS YET

UBOFAB19 (SVBRDF Database Bonn)

A database of several hundred high quality fabric material measurements, provided as carefully calibrated rectified HDR images, together with SVBRDF fits.

2 PAPERS • NO BENCHMARKS YET

UFPR-ADMR-v1

This dataset contains 2,000 dial meter images obtained on-site by employees of the Energy Company of Paraná (Copel), which serves more than 4 million consuming units in the Brazilian state of Paraná. The images were acquired with many different cameras and are available in the JPG format with 320×640 or 640×320 pixels (depending on the camera orientation).

2 PAPERS • 1 BENCHMARK

UFPR-AMR

This dataset contains 2,000 images taken from inside a warehouse of the Energy Company of Paraná (Copel), which directly serves more than 4 million consuming units in the Brazilian state of Paraná.

2 PAPERS • 1 BENCHMARK

UI5k

UI5k (Mobile App User Interface Dataset)

This dataset contains 54,987 UI screenshots and the metadata from 7,748 Android applications belonging to 25 application categories

2 PAPERS • NO BENCHMARKS YET

VFR-Wild

325 word images intended for font recognition, whose fonts are included in VFR-447 (and VFR-2420).

2 PAPERS • 3 BENCHMARKS

VGGSound-Sparse

The dataset uses VGG-Sound which consists of 10s clips collected from YouTube for 309 sound classes. A subset of ‘temporally sparse’ classes is selected using the following procedure: 5–15 videos are randomly picked from each of the 309 VGGSound classes, and manually annotated as to whether audio-visual cues are only sparsely available. As a result, 12 classes are selected (∼4 %) or 6.5k and 0.6k videos in the train and test sets, respectively. The classes include 'dog barking', 'chopping wood', 'lion roaring', 'skateboarding' etc.

2 PAPERS • NO BENCHMARKS YET

VTC (Videos, Titles and Comments)

VTC is a large-scale multimodal dataset containing video-caption pairs (~300k) alongside comments that can be used for multimodal representation learning.

2 PAPERS • NO BENCHMARKS YET

WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images

WHOOPS! Is a dataset and benchmark for visual commonsense. The dataset is comprised of purposefully commonsense-defying images created by designers using publicly-available image generation tools like Midjourney. It contains commonsense-defying image from a wide range of reasons, deviations from expected social norms and everyday knowledge.

2 PAPERS • 4 BENCHMARKS

WebLINX (Real-World Website Navigation with Multi-Turn)

WebLINX is a large-scale benchmark of 100K interactions across 2300 expert demonstrations of conversational web navigation. It covers a broad range of patterns on over 150 real-world websites and can be used to train and evaluate agents in diverse scenarios.

2 PAPERS • 1 BENCHMARK

WebVid-CoVR

The WebVid-CoVR dataset is a collection of video-text-video triplets that can be used for the task of composed video retrieval (CoVR). CoVR is a task that involves searching for videos that match both a query image and a query text. The text typically specifies the desired modification to the query image.

2 PAPERS • 1 BENCHMARK

Wind Tunnel and Flight Test Experiments

Our dataset comprises $23.468$ non-labelled and $356$ labelled samples where each sample is $512 \times 512 \times 1$ dimensional IR image collected with the thermographic measurement specifications. Some samples contain scars, shadows, salt \& pepper noises and contrast burst regions, demonstrating that realistic laminar-turbulent flow observation scenarios are subject to high noise. Besides, a laminar flow area may occur brighter or darker as compared to the regions in a turbulent flow. Due to some effect (e.g. shadowing the sun) it is even possible that, in one part of the image, the laminar flow area appears darker, and in another part, it appears brighter than the turbulent flow area.

2 PAPERS • 1 BENCHMARK

Datasets

790 dataset results for Images AND English