VISOR is a dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. It annotates videos from EPIC-KITCHENS and contains 272K manual semantic masks across 257 object classes, 9.9M interpolated dense masks, and 67K hand-object relations, covering 36 hours of 179 untrimmed videos.
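The split between manual and interpolated masks can be illustrated with a minimal sketch. The record layout below is hypothetical, chosen only for illustration; it is not VISOR's actual release schema:

```python
from collections import Counter

# Hypothetical, simplified VISOR-style mask records (NOT the real schema):
# each mask carries a video id, frame index, object class, and a flag for
# whether it was manually drawn or densely interpolated between keyframes.
annotations = [
    {"video": "P01_01", "frame": 28802, "cls": "knife", "manual": True},
    {"video": "P01_01", "frame": 28805, "cls": "knife", "manual": False},
    {"video": "P01_01", "frame": 28802, "cls": "left hand", "manual": True},
]

def class_counts(records, manual_only=False):
    """Count masks per object class, optionally keeping only manual masks."""
    return Counter(r["cls"] for r in records if r["manual"] or not manual_only)

print(class_counts(annotations))               # all masks per class
print(class_counts(annotations, manual_only=True))  # manual masks only
```

In the full dataset the same kind of aggregation would separate the 272K manual masks from the 9.9M interpolated ones.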
Source: EPIC-KITCHENS VISOR Benchmark: Video Segmentations and Object Relations