We present a novel approach for unsupervised learning of depth and ego-motion from monocular video.
In particular, annotation errors, the size of the dataset, and the level of challenge are addressed: new annotation for both datasets is created with extra attention to the reliability of the ground truth.
In this paper, we describe a new mobile architecture, MobileNetV2, that improves the state-of-the-art performance of mobile models on multiple tasks and benchmarks, as well as across a spectrum of different model sizes.
#3 best model for Panoptic Segmentation on COCO panoptic
Existing image classification datasets used in computer vision tend to have a uniform distribution of images across object categories.
#2 best model for Image Classification on iNaturalist
In our experiments, we search for the best convolutional layer (or "cell") on the CIFAR-10 dataset and then apply this cell to the ImageNet dataset by stacking together more copies of it, each with its own parameters, to design a convolutional architecture named the "NASNet architecture".
#34 best model for Image Classification on ImageNet
The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.
#2 best model for Temporal Action Localization on J-HMDB-21
We present MorphNet, an approach to automate the design of neural network structures.
Our hypothesis is that the appearance of a person -- their pose, clothing, action -- is a powerful cue for localizing the objects they are interacting with.
#7 best model for Human-Object Interaction Detection on HICO-DET
In this work, we establish dense correspondences between an RGB image and a surface-based representation of the human body, a task we refer to as dense human pose estimation.
#2 best model for Pose Estimation on DensePose-COCO