CMU Panoptic is a large-scale dataset providing 3D pose annotations (1.5 million) for multiple people engaged in social activities. It contains 65 videos (5.5 hours) with multi-view annotations, but only 17 of them capture multi-person scenarios and include camera parameters.
Massively Multiview System:
- 480 VGA camera views
- 30+ HD views
- 10 RGB-D sensors
- Hardware-based sync
- Calibration

Interesting Scenes with Labels:
- Multiple people
- Socially interacting groups
- 3D body pose
- 3D facial landmarks
- Transcripts + speaker ID

Hardware setup:
- 480 VGA cameras, 640 x 480 resolution, 25 fps, synchronized among themselves using a hardware clock
- 31 HD cameras, 1920 x 1080 resolution, 30 fps, synchronized among themselves using a hardware clock, timing aligned with the VGA cameras
- 10 Kinect Ⅱ sensors, 1920 x 1080 (RGB), 512 x 424 (depth), 30 fps, timing aligned among themselves and with the other sensors
- 5 DLP projectors, synchronized with the HD cameras
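
Because the VGA cameras run at 25 fps while the HD cameras and Kinect sensors run at 30 fps, frame indices from different sensor groups do not line up one to one. The following is a minimal sketch of how the figures above could be recorded and how a frame index in one stream might be mapped to the nearest frame in another. The names `SensorGroup` and `nearest_frame` are hypothetical and not part of any official toolbox, and the mapping assumes both streams start at the same instant on a shared clock, which the real recordings may not satisfy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorGroup:
    """One group of identical sensors (hypothetical helper type)."""
    name: str
    count: int
    width: int
    height: int
    fps: int


# Values copied from the hardware setup list above.
VGA = SensorGroup("VGA", 480, 640, 480, 25)
HD = SensorGroup("HD", 31, 1920, 1080, 30)
KINECT_RGB = SensorGroup("Kinect RGB", 10, 1920, 1080, 30)


def nearest_frame(src_index: int, src: SensorGroup, dst: SensorGroup) -> int:
    """Map a frame index in `src` to the nearest frame index in `dst`.

    Assumption: both streams start at t = 0 on a common clock.
    Uses integer arithmetic to round src_index * dst.fps / src.fps.
    """
    return (2 * src_index * dst.fps + src.fps) // (2 * src.fps)


if __name__ == "__main__":
    # One second of HD video (30 frames) corresponds to 25 VGA frames.
    print(nearest_frame(30, HD, VGA))
    # Total number of color camera views across the three groups.
    print(VGA.count + HD.count + KINECT_RGB.count)
```

In practice, alignment should rely on the timing information shipped with the recordings rather than on a fixed frame-rate ratio; the sketch only illustrates why a conversion step is needed.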