Attention mechanisms have become a popular component in deep neural networks, yet there has been little examination of how different influencing factors and methods for computing attention from these factors affect performance.
We present a novel method for simultaneously learning depth, egomotion, object motion, and camera intrinsics from monocular videos, using only consistency across neighboring video frames as the supervision signal.
We report competitive results on object detection and instance segmentation on the COCO dataset using standard models trained from random initialization.
SOTA for Object Detection on COCO minival
We present a novel image editing system that generates images from free-form masks, sketches, and color provided by the user as input.
As an instance-level recognition problem, person re-identification (ReID) relies on discriminative features, which not only capture different spatial scales but also encapsulate an arbitrary combination of multiple scales.
#2 best model for Person Re-Identification on CUHK03
By eliminating the predefined set of anchor boxes, FCOS completely avoids the complicated anchor-related computation, such as calculating box overlaps during training.
#12 best model for Object Detection on COCO test-dev
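To illustrate what anchor-free training can look like, here is a minimal sketch of a per-location regression target of the kind FCOS describes: each location inside a ground-truth box regresses its distances to the four box sides, so no overlap with predefined anchors has to be computed. The function name `fcos_regression_target`, the `(x0, y0, x1, y1)` box layout, and the strict inside-the-box test are assumptions made for this illustration, not the authors' code.

```python
from typing import Optional, Tuple

# Assumed box layout for this sketch: (x0, y0, x1, y1), top-left and bottom-right corners.
Box = Tuple[float, float, float, float]


def fcos_regression_target(
    x: float, y: float, box: Box
) -> Optional[Tuple[float, float, float, float]]:
    """Return (l, t, r, b), the distances from (x, y) to the left, top,
    right and bottom sides of the box, or None if the location is not
    strictly inside the box (treated here as a negative sample)."""
    x0, y0, x1, y1 = box
    l, t, r, b = x - x0, y - y0, x1 - x, y1 - y
    # A location counts as positive only if all four distances are positive.
    if min(l, t, r, b) <= 0:
        return None
    return (l, t, r, b)


# Hypothetical example: one location inside the box and one outside it.
inside = fcos_regression_target(50, 40, (10, 20, 110, 100))
outside = fcos_regression_target(5, 40, (10, 20, 110, 100))
```

The target depends only on the location and the box coordinates, which is why no intersection-over-union with anchors appears anywhere in the sketch.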
We introduce SinGAN, an unconditional generative model that can be learned from a single natural image.