The Multi-Object and Segmentation (MOTS) benchmark [2] consists of 21 training sequences and 29 test sequences. It is based on the KITTI Tracking Evaluation 2012 and extends the annotations to the Multi-Object and Segmentation (MOTS) task. To this end, we added dense pixel-wise segmentation labels for every object. We evaluate submitted results using the metrics HOTA, CLEAR MOT, and MT/PT/ML. We rank methods by HOTA [1]. Our development kit and GitHub evaluation code provide details about the data format as well as utility functions for reading and writing the label files. (adapted for the segmentation case). Evaluation is performed using the code from the TrackEval repository.
26 PAPERS • 1 BENCHMARK
The evaluation of object detection models is usually performed by optimizing a single metric, e.g. mAP, on a fixed set of datasets, e.g. Microsoft COCO and Pascal VOC. Due to image retrieval and annotation costs, these datasets consist largely of images found on the web and do not represent many real-life domains that are being modelled in practice, e.g. satellite, microscopic and gaming, making it difficult to assert the degree of generalization learned by the model.
4 PAPERS • 1 BENCHMARK
Existing image/video datasets for cattle behavior recognition are mostly small, lack well-defined labels, or are collected in unrealistic controlled environments. This limits the utility of machine learning (ML) models learned from them. Therefore, we introduce a new dataset, called Cattle Visual Behaviors (CVB), that consists of 502 video clips, each fifteen seconds long, captured in natural lighting conditions, and annotated with eleven visually perceptible behaviors of grazing cattle. By creating and sharing CVB, our aim is to develop improved models capable of recognizing all important cattle behaviors accurately and to assist other researchers and practitioners in developing and evaluating new ML models for cattle behavior classification using video data. The dataset is presented in the form of following three sub-directories. 1. raw_frames: contains 450 frames in each sub folder representing a 15 second video taken at a frame rate of 30 FPS. 2. annotations: contains the json file
1 PAPER • NO BENCHMARKS YET
In this dataset an uppertorso humanoid robot with 7-DOF arm explored 100 different objects belonging to 20 different categories using 10 behaviors: Look, Crush, Grasp, Hold, Lift, Drop, Poke, Push, Shake and Tap.
Our dataset augments the TAO dataset with amodal bounding box annotations for fully invisible, out-of-frame, and occluded objects. Note that this implies TAO-Amodal also includes modal segmentation masks (as visualized in the color overlays above). Our dataset encompasses 880 categories, aimed at assessing the occlusion reasoning capabilities of current trackers through the paradigm of Tracking Any Object with Amodal perception (TAO-Amodal).
The data was captured from an overhead perspective, showcasing the swimming behavior of fish in a simulated flowing water channel. This angle provides a panoramic view from above to observe the water channel and the fish behavior. It enables researchers to better observe and analyze fish swimming patterns, group behavior, and their adaptive abilities to water dynamics. Moreover, the overhead perspective offers more accurate spatial positioning and motion tracking, providing valuable data for studying fish behavior and ecology. By observing and analyzing this data, a deeper understanding of fish ecological adaptability, migration patterns, and interactions with environmental factors in simulated flowing water channels can be gained. This knowledge serves as a scientific basis and decision support for areas such as aquaculture, ecological conservation, and hydraulic research. E-mail: peifei122@gmail.com
0 PAPER • NO BENCHMARKS YET