COESOT

In this work, we propose a general dataset for Color-Event camera based Single Object Tracking, termed COESOT. It contains 1354 color-event videos with 478,721 RGB frames. We split them into training and testing subsets, which contain 827 and 527 videos, respectively. The videos are collected in both outdoor and indoor scenarios (such as the street, zoo, and home) using the DAVIS346 event camera with a zoom lens. As a result, our videos reflect variation in target distance (depth), which other datasets fail to capture. Unlike existing benchmarks, which contain limited categories, our proposed COESOT covers a wider range of object categories (90 classes), as shown in Fig. 3 (a). The categories fall mainly into four groups: persons, animals, electronics, and other goods.
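As a quick sanity check on the figures above, the following minimal sketch confirms that the two subsets add up to the full dataset and derives the average sequence length, which comes to roughly 354 RGB frames per video. The variable names are illustrative only and are not part of any COESOT toolkit.

```python
# Figures quoted in the dataset description.
total_videos = 1354
total_rgb_frames = 478_721
train_videos = 827
test_videos = 527

# The two subsets should account for every video.
split_is_consistent = (train_videos + test_videos == total_videos)

# Derived statistics (not stated in the description itself).
avg_frames_per_video = total_rgb_frames / total_videos
train_fraction = train_videos / total_videos

print(f"split consistent: {split_is_consistent}")
print(f"average frames per video: {avg_frames_per_video:.1f}")
print(f"training share: {train_fraction:.1%}")
```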

The ground truth of the proposed COESOT dataset is densely annotated, i.e., frame by frame. An absent label is also provided to help researchers design their trackers. Inspired by VisEvent [44], we annotate each testing video sequence with 17 attributes to help researchers evaluate their trackers in specific challenging environments, e.g., full occlusion (FOC), deformation (DEF), rotation (ROT), fast motion (FM), partial occlusion (POC), low illumination (LI), scale variation (SV), background object motion (BOM), motion blur (MB), overexposure (OE), etc. The distribution of videos over the attributes is shown in Fig. 3 (b). The statistical distribution of the ground-truth center positions is shown in Fig. 3 (c). More details can be found in our supplementary materials.
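To illustrate how the dense annotations, absent labels, and attribute tags might be used together, here is a minimal sketch of attribute-wise grouping. The `Sequence` class, its field names, and the demo data are hypothetical; they are not the official COESOT file format or toolkit API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


# Hypothetical in-memory layout; NOT the official COESOT file format.
@dataclass
class Sequence:
    name: str
    boxes: list    # one (x, y, w, h) box per frame (dense annotation)
    absent: list   # one bool per frame; True means the target is absent
    attributes: set = field(default_factory=set)  # e.g. {"FOC", "FM"}


def visible_frames(seq):
    """Return the indices of frames in which the target is present."""
    return [i for i, gone in enumerate(seq.absent) if not gone]


def group_by_attribute(sequences):
    """Map each attribute tag to the names of the sequences carrying it."""
    groups = defaultdict(list)
    for seq in sequences:
        for attr in sorted(seq.attributes):
            groups[attr].append(seq.name)
    return dict(groups)


# Made-up demo data, for illustration only.
demo = [
    Sequence("seq_a", [(10, 10, 5, 5)] * 3, [False, True, False], {"FOC", "FM"}),
    Sequence("seq_b", [(0, 0, 8, 8)] * 2, [False, False], {"FM"}),
]

print(visible_frames(demo[0]))
print(group_by_attribute(demo))
```

Grouping sequences this way lets a tracker's scores be reported separately for each challenge, and the absent flags allow frames without a visible target to be skipped when computing overlap-based metrics.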

License


  • Unknown
