IndustReal is an ego-centric, multi-modal dataset where 27 participants are challenged to perform assembly and maintenance procedures on a construction-toy car. The dataset is annotated for action recognition, assembly state detection, and procedure step recognition. IndustReal includes 38 execution errors in a total of 84 videos, with 14 exclusive to validation and test sets and therefore suitable for testing the robustness of algorithms against unseen errors in procedural tasks. IndustReal offers open-source 3D models for all parts to promote the use of synthetic data for scalable approaches on this dataset, as well as reproducibility. All assembly parts used in the dataset are 3D printed. This ensures reproducibility and future availability of the model and allows for growth via community effort.
2 PAPERS • 2 BENCHMARKS
Existing image/video datasets for cattle behavior recognition are mostly small, lack well-defined labels, or are collected in unrealistic controlled environments. This limits the utility of machine learning (ML) models learned from them. Therefore, we introduce a new dataset, called Cattle Visual Behaviors (CVB), that consists of 502 video clips, each fifteen seconds long, captured in natural lighting conditions, and annotated with eleven visually perceptible behaviors of grazing cattle. By creating and sharing CVB, our aim is to develop improved models capable of recognizing all important cattle behaviors accurately and to assist other researchers and practitioners in developing and evaluating new ML models for cattle behavior classification using video data. The dataset is presented in the form of following three sub-directories. 1. raw_frames: contains 450 frames in each sub folder representing a 15 second video taken at a frame rate of 30 FPS. 2. annotations: contains the json file
1 PAPER • NO BENCHMARKS YET
Understanding comprehensive assembly knowledge from videos is critical for futuristic ultra-intelligent industry. To enable technological breakthrough, we present HA-ViD – an assembly video dataset that features representative industrial assembly scenarios, natural procedural knowledge acquisition process, and consistent human-robot shared annotations. Specifically, HA-ViD captures diverse collaboration patterns of real-world assembly, natural human behaviors and learning progression during assembly, and granulate action annotations to subject, action verb, manipulated object, target object, and tool. We provide 3222 multi-view and multi-modality videos, 1.5M frames, 96K temporal labels and 2M spatial labels. We benchmark four foundational video understanding tasks: action recognition, action segmentation, object detection and multi-object tracking. Importantly, we analyze their performance and the further reasoning steps for comprehending knowledge in assembly progress, process effici