ECCV 2018

The most popular implementations from this conference
1
Progressive Neural Architecture Search
We propose a new method for learning the structure of convolutional neural networks (CNNs) that is more efficient than recent state-of-the-art methods based on reinforcement learning and evolutionary algorithms. Our approach uses a sequential model-based optimization (SMBO) strategy, in which we search for structures in order of increasing complexity, while simultaneously learning a surrogate model to guide the search through structure space.
2
Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
Spatial pyramid pooling networks are able to encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while encoder-decoder networks can capture sharper object boundaries by gradually recovering the spatial information. In this work, we propose to combine the advantages of both methods.
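Probing features "at multiple rates" is done with atrous (dilated) convolution, and the atrous separable convolution in the title adds a depthwise-then-pointwise factorization. The following is a minimal 1-D NumPy sketch, assuming 'valid' padding and stride 1. The function names are hypothetical, and this is not the authors' DeepLab code.

```python
import numpy as np

def effective_fov(k, rate):
    """Number of input samples covered by a k-tap kernel at the given atrous rate."""
    return k + (k - 1) * (rate - 1)

def atrous_conv1d(x, w, rate):
    """1-D atrous (dilated) correlation with 'valid' padding and stride 1."""
    span = effective_fov(len(w), rate)
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        # Taps are spaced `rate` samples apart: x[i], x[i + rate], ...
        out[i] = np.dot(x[i:i + span:rate], w)
    return out

def atrous_separable_conv1d(x, depthwise, pointwise, rate):
    """Depthwise atrous filtering per channel, then a pointwise (1x1) channel mix.

    x: (C_in, L); depthwise: (C_in, k), one kernel per channel;
    pointwise: (C_out, C_in).
    """
    dw = np.stack([atrous_conv1d(x[c], depthwise[c], rate) for c in range(x.shape[0])])
    return pointwise @ dw

# Probe one signal with the same 3-tap kernel at several rates.
x = np.arange(10, dtype=float)
w = np.array([1.0, 0.0, -1.0])
for rate in (1, 2, 3):
    print(rate, effective_fov(3, rate), atrous_conv1d(x, w, rate))
```

The same three weights see 3, 5 and 7 input samples as the rate grows, with no extra parameters. The pointwise step makes the layer "separable": mapping C_in to C_out channels with a k-tap kernel costs C_in·k + C_in·C_out weights instead of C_in·C_out·k.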
3
Group Normalization
Group Normalization (GN) computes its statistics independently of the batch size, and its accuracy is stable across a wide range of batch sizes. GN can outperform its Batch Normalization (BN) based counterparts for object detection and segmentation on COCO and for video classification on Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks.
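GN's batch independence follows from where the statistics are computed: per sample, over groups of channels, rather than across the batch. The following is a minimal NumPy forward-pass sketch for an (N, C, H, W) array. `group_norm` is a hypothetical name, and eps=1e-5 is an assumed default.

```python
import numpy as np

def group_norm(x, num_groups, gamma, beta, eps=1e-5):
    """Group Normalization forward pass for an (N, C, H, W) array."""
    n, c, h, w = x.shape
    assert c % num_groups == 0, "channels must divide evenly into groups"
    # Statistics are taken per sample and per group of C // G channels,
    # never across the batch axis N.
    xg = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = xg.mean(axis=(2, 3, 4), keepdims=True)
    var = xg.var(axis=(2, 3, 4), keepdims=True)
    x_hat = ((xg - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)
    # Learnable per-channel scale and shift, as in BN.
    return x_hat * gamma.reshape(1, c, 1, 1) + beta.reshape(1, c, 1, 1)
```

Running a single sample through `group_norm` gives the same result as running it inside a larger batch. Setting G = 1 recovers Layer Normalization, and setting G = C recovers Instance Normalization.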
5
Joint 3D Face Reconstruction and Dense Alignment with Position Map Regression Network
We propose a straightforward method that simultaneously reconstructs the 3D facial structure and provides dense alignment. To achieve this, we design a 2D representation called UV position map which records the 3D shape of a complete face in UV space, then train a simple Convolutional Neural Network to regress it from a single 2D image.
6
Image Inpainting for Irregular Holes Using Partial Convolutions
Existing deep learning based image inpainting methods use a standard convolutional network over the corrupted image, using convolutional filter responses conditioned on both valid pixels as well as the substitute values in the masked holes (typically the mean value). This often leads to artifacts such as color discrepancy and blurriness.
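The partial convolution in the title avoids conditioning on substitute hole values. It convolves only over valid pixels, rescales by the fraction of the window that is valid, and marks an output as valid if its window held any valid pixel. The following is a minimal single-channel NumPy sketch of that rule, out = W·(X⊙M)·sum(1)/sum(M) + b when sum(M) > 0 and 0 otherwise. 'Valid' padding, stride 1 and the name `partial_conv2d` are simplifying, hypothetical choices.

```python
import numpy as np

def partial_conv2d(x, mask, weight, bias=0.0):
    """Single-channel partial convolution ('valid' padding, stride 1).

    x, mask: (H, W) arrays; mask is 1 for valid pixels and 0 inside holes.
    weight: (kh, kw) kernel. Returns the output map and the updated mask.
    """
    kh, kw = weight.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    new_mask = np.zeros((out_h, out_w))
    window_size = kh * kw  # sum(1) over the window
    for i in range(out_h):
        for j in range(out_w):
            m = mask[i:i + kh, j:j + kw]
            valid = m.sum()
            if valid > 0:
                # Use only valid pixels, then rescale by the valid fraction.
                patch = x[i:i + kh, j:j + kw] * m
                out[i, j] = (weight * patch).sum() * window_size / valid + bias
                new_mask[i, j] = 1.0
    return out, new_mask
```

Because the updated mask marks every window that touched a valid pixel, stacking such layers shrinks the holes step by step until the mask is fully valid.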
7
ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation
We introduce a fast and efficient convolutional neural network, ESPNet, for semantic segmentation of high resolution images under resource constraints. ESPNet is based on a new convolutional module, efficient spatial pyramid (ESP), which is efficient in terms of computation, memory, and power.
9
The Contextual Loss for Image Transformation with Non-Aligned Data
Feed-forward CNNs trained for image transformation problems rely on loss functions that measure the similarity between the generated image and a target image. Most of the common loss functions assume that these images are spatially aligned and compare pixels at corresponding locations.
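The contextual loss in the title compares the two images' features as sets, so it does not rely on spatial alignment. The following NumPy sketch assumes cosine distances on features centered at the target mean, relative distances d̃_ij = d_ij / (min_k d_ik + ε), similarities w_ij = exp((1 − d̃_ij)/h) normalized over the target features, and CX = mean over j of max over i. The bandwidth h = 0.5 and the name `contextual_loss` are assumptions.

```python
import numpy as np

def contextual_loss(x, y, h=0.5, eps=1e-5):
    """Contextual loss between feature sets x (N, D) and y (M, D)."""
    # Center both sets on the mean of the target features, then L2-normalize.
    mu = y.mean(axis=0, keepdims=True)
    xn, yn = x - mu, y - mu
    xn = xn / (np.linalg.norm(xn, axis=1, keepdims=True) + eps)
    yn = yn / (np.linalg.norm(yn, axis=1, keepdims=True) + eps)
    d = 1.0 - xn @ yn.T                                   # cosine distances, (N, M)
    d_tilde = d / (d.min(axis=1, keepdims=True) + eps)    # relative distances
    w = np.exp((1.0 - d_tilde) / h)
    cx_ij = w / w.sum(axis=1, keepdims=True)              # normalized similarities
    cx = cx_ij.max(axis=0).mean()   # each target feature's best-matching source
    return -np.log(cx)
```

Shuffling the rows of `x`, which is equivalent to moving image features to other spatial locations, leaves the loss unchanged. A pixel-wise loss would be sensitive to exactly that change.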
10
BSN: Boundary Sensitive Network for Temporal Action Proposal Generation
Temporal action proposal generation is an important yet challenging problem, since temporal proposals with rich action content are indispensable for analysing long real-world videos that contain a high proportion of irrelevant content. This problem requires methods that not only generate proposals with precise temporal boundaries, but also retrieve proposals that cover ground-truth action instances with high recall and high overlap using relatively few proposals.
11
ECO: Efficient Convolutional Network for Online Video Understanding
The state of the art in video understanding suffers from two problems: (1) most of the reasoning is performed locally in the video, so it misses important relationships within actions that span several seconds; (2) while there are local methods with fast per-frame processing, processing the whole video is not efficient, which hampers fast video retrieval and online classification of long-term activities. In this paper, we introduce a network architecture that takes long-term content into account and enables fast per-video processing at the same time.
12
MVSNet: Depth Inference for Unstructured Multi-view Stereo
We present an end-to-end deep learning architecture for depth map inference from multi-view images. In the network, we first extract deep visual image features, and then build the 3D cost volume upon the reference camera frustum via the differentiable homography warping.
15
BodyNet: Volumetric Inference of 3D Human Body Shapes
Human shape estimation is an important task for video editing, animation, and the fashion industry. In this work we argue for an alternative representation and propose BodyNet, a neural network for direct inference of volumetric body shape from a single image.
16
ELEGANT: Exchanging Latent Encodings with GAN for Transferring Multiple Face Attributes
Recent studies on face attribute transfer have achieved great success: many models are able to transfer face attributes given an input image.
17
Integral Human Pose Regression
State-of-the-art human pose estimation methods are based on heat map representations. Despite their good performance, heat maps have some inherent issues, such as non-differentiability and quantization error.
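Integral regression addresses both issues. Instead of taking the argmax of the heat map, it normalizes the map into a distribution and takes the expected coordinate, which is differentiable and not tied to the pixel grid. The following is a minimal 2-D NumPy sketch, assuming softmax normalization. `integral_joint_location` and its `beta` sharpness parameter are hypothetical.

```python
import numpy as np

def integral_joint_location(heatmap, beta=1.0):
    """Soft-argmax: expected (x, y) location under a softmax-normalized heat map."""
    h, w = heatmap.shape
    z = beta * heatmap
    p = np.exp(z - z.max())          # subtract the max for numerical stability
    p /= p.sum()
    ys, xs = np.mgrid[0:h, 0:w]      # row (y) and column (x) coordinate grids
    return float((p * xs).sum()), float((p * ys).sum())
```

If two neighbouring pixels share the peak, the estimate falls halfway between them, a sub-pixel answer that a plain argmax cannot produce.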
18
Simple Baselines for Human Pose Estimation and Tracking
There has been significant progress in pose estimation and increasing interest in pose tracking in recent years. At the same time, overall algorithm and system complexity has increased as well, making algorithm analysis and comparison more difficult.
19
Revisiting RCNN: On Awakening the Classification Power of Faster RCNN
Recent region-based object detectors are usually built with separate classification and localization branches on top of shared feature extraction networks. In this paper, we analyze failure cases of state-of-the-art detectors and observe that most hard false positives result from classification instead of localization.
20
Sparsely Aggregated Convolutional Networks
We explore a key architectural aspect of deep convolutional neural networks: the pattern of internal skip connections used to aggregate outputs of earlier layers for consumption by deeper layers. Such aggregation is critical to facilitate training of very deep networks in an end-to-end manner.
21
Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights
This work presents a method for adapting a single, fixed deep neural network to multiple tasks without affecting performance on already learned tasks. Building on ideas from network quantization and pruning, we learn binary masks that "piggyback" on an existing network: they are applied to the unmodified weights of that network to provide good performance on a new task.
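The core idea of the masking scheme is that the shared weights are never modified. Each task stores only real-valued mask scores, which are thresholded into a binary mask and multiplied element-wise into the frozen weights. The following is a minimal NumPy inference sketch for one linear layer. `PiggybackLayer` is hypothetical, the 5e-3 threshold is an assumed value, and training the scores (for example with a straight-through gradient) is not shown.

```python
import numpy as np

class PiggybackLayer:
    """A frozen linear layer shared across tasks, with one learned mask per task."""

    def __init__(self, weight, threshold=5e-3):
        self.weight = weight          # backbone weights: never modified
        self.threshold = threshold    # assumed threshold value
        self.masks = {}               # task name -> real-valued mask scores

    def add_task(self, name, mask_scores):
        assert mask_scores.shape == self.weight.shape
        self.masks[name] = mask_scores

    def forward(self, x, task):
        # Threshold the scores into a binary mask and apply it element-wise.
        binary = (self.masks[task] > self.threshold).astype(self.weight.dtype)
        return x @ (self.weight * binary).T
```

Each new task then costs one bit per weight once its mask is binarized, and earlier tasks are unaffected because `self.weight` is shared and read-only.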
22
Depth-aware CNN for RGB-D Segmentation
Convolutional neural networks (CNNs) are limited in their ability to handle geometric information because of their fixed grid kernel structure. The availability of depth data enables progress in RGB-D semantic segmentation with CNNs.
23
Stroke Controllable Fast Style Transfer with Adaptive Receptive Fields
Fast style transfer methods can transfer a photograph to an artistic style in real time, but controlling the stroke size in the stylized results remains an open challenge. In this paper, we present a stroke controllable style transfer network that can achieve continuous and spatial stroke size control.
24
Learning SO(3) Equivariant Representations with Spherical CNNs
We address the problem of 3D rotation equivariance in convolutional neural networks. 3D rotations have been a challenging nuisance in 3D classification tasks requiring higher capacity and extended data augmentation in order to tackle it.
25
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
The thud of a bouncing ball, the onset of speech as lips open -- when visual and audio events occur together, it suggests that there might be a common, underlying event that produced both signals. In this paper, we argue that the visual and audio components of a video signal should be modeled jointly using a fused multisensory representation.
26
Pairwise Confusion for Fine-Grained Visual Classification
Fine-Grained Visual Classification (FGVC) datasets contain small sample sizes, along with significant intra-class variation and inter-class similarity. While prior work has addressed intra-class variation using localization and segmentation techniques, inter-class similarity may also affect feature learning and reduce classification performance.
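Pairwise Confusion targets inter-class similarity by penalizing a network that separates look-alike classes too confidently. It adds a "Euclidean confusion" term, the squared distance between the predicted class distributions of paired samples. The following NumPy sketch makes several assumptions: the batch is split into two halves paired index by index, only pairs with different labels count, and the term is averaged. The function names are hypothetical. In training, this term would be added to the cross-entropy with some weight.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def euclidean_confusion(logits, labels):
    """Mean squared distance between predicted distributions of cross-class pairs."""
    p = softmax(logits)
    half = len(p) // 2
    p1, p2 = p[:half], p[half:2 * half]              # pair sample i with i + half
    different = labels[:half] != labels[half:2 * half]
    if not different.any():
        return 0.0
    dist = ((p1 - p2) ** 2).sum(axis=1)
    return float(dist[different].mean())
```

Two maximally confident, opposite predictions give a penalty close to 2, and identical predictions give 0. Minimizing the term therefore pulls cross-class predictions toward each other.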
27
Object Level Visual Reasoning in Videos
Human activity recognition is typically addressed by detecting key concepts like global and local motion, features related to object classes present in the scene, as well as features related to the global context. The next open challenges in activity recognition require a level of understanding that pushes beyond this and call for models with capabilities for fine distinction and detailed comprehension of interactions between actors and objects in a scene.
28
Fighting Fake News: Image Splice Detection via Learned Self-Consistency
Advances in photo editing and manipulation tools have made it significantly easier to create fake imagery. In this paper, we propose a learning algorithm for detecting visual image manipulations that is trained only using a large dataset of real photographs.
30
Learning Rigidity in Dynamic Scenes with a Moving Camera for 3D Motion Field Estimation
Estimation of 3D motion in a dynamic scene from a temporal pair of images is a core task in many scene understanding problems. With a network that learns to infer the rigidity of the scene, we show how we can effectively estimate camera motion and projected scene flow using computed 2D optical flow and the inferred rigidity mask.
31
Scaling Egocentric Vision: The EPIC-KITCHENS Dataset
First-person vision is gaining interest as it offers a unique viewpoint on people's interaction with objects, their attention, and even intention. However, progress in this challenging domain has been relatively slow due to the lack of sufficiently large datasets.
33
DeepIM: Deep Iterative Matching for 6D Pose Estimation
Estimating the 6D pose of objects from images is an important problem in various applications such as robot manipulation and virtual reality. While direct regression of images to object poses has limited accuracy, matching rendered images of an object against the observed image can produce accurate results.
34
Semi-supervised Adversarial Learning to Generate Photorealistic Face Images of New Identities from 3D Morphable Model
We propose a novel end-to-end semi-supervised adversarial framework to generate photorealistic face images of new identities with wide ranges of expressions, poses, and illuminations conditioned by a 3D morphable model. Previous adversarial style-transfer methods either supervise their networks with a large volume of paired data or use unpaired data with a highly under-constrained two-way generative framework in an unsupervised fashion.
36
FloorNet: A Unified Framework for Floorplan Reconstruction from 3D Scans
The ultimate goal of indoor mapping research is to automatically reconstruct a floorplan simply by walking through a house with a smartphone in a pocket. This paper tackles this problem by proposing FloorNet, a novel deep neural architecture.
37
Shift-Net: Image Inpainting via Deep Feature Rearrangement
In Shift-Net, the encoder feature of the known region is shifted to serve as an estimate of the missing parts. A guidance loss is introduced on the decoder feature to minimize the distance between the decoder feature after the fully connected layer and the ground-truth encoder feature of the missing parts.
38
Learning-based Video Motion Magnification
We show that the learned filters achieve high-quality results on real videos, with fewer ringing artifacts and better noise characteristics than previous methods. Although our model is not trained with temporal filters, we found that temporal filters can be used with our extracted representations up to a moderate magnification, enabling frequency-based motion selection.
39
StarMap for Category-Agnostic Keypoint and Viewpoint Estimation
Existing methods define semantic keypoints separately for each category with a fixed number of semantic labels in fixed indices. We propose a category-agnostic keypoint representation, which combines a multi-peak heatmap (StarMap) for all the keypoints and their corresponding features as 3D locations in the canonical viewpoint (CanViewFeature) defined for each instance.
40
Dist-GAN: An Improved GAN using Distance Constraints
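First, we propose a latent-data distance constraint that enforces compatibility between distances among latent samples and distances among the corresponding data samples; this constraint explicitly prevents the generator from mode collapse. Second, we propose a discriminator-score distance constraint to align the distribution of the generated samples with that of the real samples through the discriminator score.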
42
Fast, Accurate, and Lightweight Super-Resolution with Cascading Residual Network
In recent years, deep learning methods have been successfully applied to single-image super-resolution. Despite their strong performance, these methods are hard to deploy in real-world applications because of their heavy computational requirements.
43
Stacked Cross Attention for Image-Text Matching
Prior work either simply aggregates the similarity of all possible pairs of regions and words without attending differentially to more and less important words or regions, or uses a multi-step attentional process to capture a limited number of semantic alignments, which is less interpretable. Our approach achieves state-of-the-art results on the MS-COCO and Flickr30K datasets.
45
Unsupervised Geometry-Aware Representation for 3D Human Pose Estimation
Modern 3D human pose estimation techniques rely on deep networks, which require large amounts of training data. In this paper, we propose to overcome this problem by learning a geometry-aware body representation from multi-view images without annotations.
46
Learning Dynamic Memory Networks for Object Tracking
Template-matching methods for visual tracking have gained popularity recently due to their comparable performance and fast speed. In this paper, we propose a dynamic memory network to adapt the template to the target's appearance variations during tracking.
47
A Dataset and Architecture for Visual Reasoning with a Working Memory
The proposed dataset, COG, is much simpler than the general problem of video analysis, yet it addresses many of the problems relating to visual and logical reasoning and memory, problems that remain challenging for modern deep learning architectures. We additionally propose a deep learning architecture that performs competitively on other diagnostic VQA datasets (e.g., CLEVR) as well as on easy settings of the COG dataset.
49
A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers
We first formulate the weight pruning problem of DNNs as a nonconvex optimization problem with combinatorial constraints specifying the sparsity requirements, and then adopt the ADMM framework for systematic weight pruning. By using ADMM, the original nonconvex optimization problem is decomposed into two subproblems that are solved iteratively.
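The two ADMM subproblems can be seen on a toy problem. One updates the weights under a quadratic penalty, and the other projects onto the sparsity constraint by keeping the k largest magnitudes. The following NumPy sketch swaps the DNN for sparse least squares (min ½‖Xw − y‖² s.t. at most k nonzeros) so that the first subproblem has a closed form, whereas a network would run a few SGD epochs instead. The value rho = 1, the iteration count, the final refit and the function names are all assumptions.

```python
import numpy as np

def project_topk(v, k):
    """Euclidean projection onto {v : at most k nonzeros}: keep the k largest magnitudes."""
    z = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    z[idx] = v[idx]
    return z

def admm_prune(X, y, k, rho=1.0, iters=50):
    """Sparse least squares via ADMM, followed by a hard prune and refit."""
    d = X.shape[1]
    w = np.linalg.lstsq(X, y, rcond=None)[0]     # dense "pretrained" weights
    z = project_topk(w, k)
    u = np.zeros(d)
    A = X.T @ X + rho * np.eye(d)
    for _ in range(iters):
        # Subproblem 1: loss + (rho/2)||w - z + u||^2, solved in closed form.
        w = np.linalg.solve(A, X.T @ y + rho * (z - u))
        # Subproblem 2: project onto the sparsity constraint set.
        z = project_topk(w + u, k)
        # Dual variable update.
        u = u + w - z
    # Keep the support ADMM settled on and refit the surviving weights.
    support = np.flatnonzero(z)
    w_final = np.zeros(d)
    w_final[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
    return w_final
```

The loss term only appears in subproblem 1 and the combinatorial constraint only in subproblem 2. This split is what lets the nonconvex problem be solved by alternating two easy steps.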
50
Audio-Visual Event Localization in Unconstrained Videos
In this paper, we introduce a novel problem of audio-visual event localization in unconstrained videos. We define an audio-visual event as an event that is both visible and audible in a video segment.
51
Local Spectral Graph Convolution for Point Set Feature Learning
Feature learning on point clouds has shown great promise with the introduction of effective and generalizable deep learning frameworks such as PointNet++. However, these frameworks abstract point features independently and in isolation, ignoring the relative layout of neighboring points and their features. In this article, we propose to overcome this limitation by using spectral graph convolution on a local graph, combined with a novel graph pooling strategy.
53
Conditional Image-Text Embedding Networks
This paper presents an approach for grounding phrases in images which jointly learns multiple text-conditioned embeddings in a single end-to-end model. In order to differentiate text phrases into semantically distinct subspaces, we propose a concept weight branch that automatically assigns phrases to embeddings, whereas prior works predefine such assignments.
54
PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model
We present a box-free bottom-up approach for the tasks of pose estimation and instance segmentation of people in multi-person images using an efficient single-shot model. Further, we propose a part-induced geometric embedding descriptor which allows us to associate semantic person pixels with their corresponding person instance, delivering instance-level person segmentations.
55
NAM: Non-Adversarial Unsupervised Domain Mapping
Several methods were recently proposed for the task of translating images between domains without prior knowledge in the form of correspondences. NAM relies on a pre-trained generative model of the target domain, and aligns each source image with an image synthesized from the target domain, while jointly optimizing the domain mapping function.
56
Disentangling Factors of Variation with Cycle-Consistent Variational Auto-Encoders
Generative models that learn disentangled representations for different factors of variation in an image can be very useful for targeted data augmentation. Our non-adversarial approach is in contrast with the recent works that combine adversarial training with auto-encoders to disentangle representations.
58
Open Set Domain Adaptation by Backpropagation
Almost all existing domain adaptation methods are proposed for a closed-set scenario, where the source and target domains share exactly the same classes. In practice, a target domain can contain samples of classes that are not shared by the source domain.
59
Learning Type-Aware Embeddings for Fashion Compatibility
Outfits in online fashion data are composed of items of many different types (e.g. top, bottom, shoes) that share some stylistic relationship with one another. A representation for building outfits requires a method that can learn both notions of similarity (for example, when two tops are interchangeable) and compatibility (items of possibly different type that can go together in an outfit).
60
Predicting Gaze in Egocentric Video by Learning Task-dependent Attention Transition
We present a new computational model for gaze prediction in egocentric videos by exploring patterns in temporal shift of gaze fixations (attention transition) that are dependent on egocentric manipulation tasks. Our assumption is that the high-level context of how a task is completed in a certain way has a strong influence on attention transition and should be modeled for gaze prediction in natural dynamic scenes.
63
Folded Recurrent Neural Networks for Future Video Prediction
Future video prediction is an ill-posed computer vision problem that has recently received much attention. Its main challenges are the high variability in video content, the propagation of errors through time, and the non-specificity of the future frames: given a sequence of past frames there is a continuous distribution of possible futures.
64
Attributes as Operators: Factorizing Unseen Attribute-Object Compositions
We present a new approach to modeling visual attributes. We show that our model not only recognizes unseen attribute-object compositions robustly in an open-world setting, but also generalizes to compositions where the objects themselves were unseen during training.
65
SpiderCNN: Deep Learning on Point Sets with Parameterized Convolutional Filters
Deep neural networks have enjoyed remarkable success in various vision tasks; however, it remains challenging to apply CNNs to domains lacking a regular underlying structure, such as 3D point clouds. To this end, we propose a novel convolutional architecture, termed SpiderCNN, to efficiently extract geometric features from point clouds.
66
ShapeStacks: Learning Vision-Based Physical Intuition for Generalised Object Stacking
Physical intuition is pivotal for intelligent agents to perform complex tasks. In this paper we investigate the passive acquisition of an intuitive understanding of physical principles as well as the active utilisation of this intuition in the context of generalised object stacking.
67
Transferring GANs: generating images from limited data
Transferring the knowledge of pretrained networks to new domains by means of finetuning is a widely used practice for applications based on discriminative models. To the best of our knowledge this practice has not been studied within the context of generative deep networks.
68
Lifting Layers: Analysis and Applications
The great advances of learning-based approaches in image processing and computer vision are largely based on deeply nested networks that compose linear transfer functions with suitable non-linearities. A lifting layer increases the dimensionality of the input, naturally yields a linear spline when combined with a fully connected layer, and therefore closes the gap between low and high dimensional approximation problems.
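The linear-spline claim can be checked directly in 1-D. A scalar x between knots t_k and t_{k+1} is lifted to a vector whose only nonzero entries, at positions k and k+1, are its linear-interpolation weights. A fully connected layer with weights w then outputs the piecewise-linear interpolant of the values w at the knots. The following is a minimal NumPy sketch under the assumption that this hat-function lifting is representative. `lift` and the knot positions are hypothetical.

```python
import numpy as np

def lift(x, knots):
    """Lift scalars x to len(knots)-dim vectors of linear-interpolation weights."""
    knots = np.asarray(knots, dtype=float)
    x = np.clip(np.atleast_1d(np.asarray(x, dtype=float)), knots[0], knots[-1])
    # Index k of the interval [t_k, t_{k+1}] containing each x.
    k = np.clip(np.searchsorted(knots, x, side="right") - 1, 0, len(knots) - 2)
    t0, t1 = knots[k], knots[k + 1]
    frac = (x - t0) / (t1 - t0)
    out = np.zeros((x.size, len(knots)))
    rows = np.arange(x.size)
    out[rows, k] = 1.0 - frac
    out[rows, k + 1] = frac
    return out

knots = np.array([0.0, 1.0, 2.0, 3.0])
w = knots ** 2                           # a "fully connected" layer: values at the knots
spline = lift(np.array([0.0, 1.5, 3.0]), knots) @ w
```

Here `spline` is the piecewise-linear interpolant of x² through the knots, so at x = 1.5 it returns the chord value 2.5 rather than 2.25.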
69
Multi-Attention Multi-Class Constraint for Fine-grained Image Recognition
Attention-based learning for fine-grained image recognition remains a challenging task, where most of the existing methods treat each object part in isolation, while neglecting the correlations among them. In addition, the multi-stage or multi-scale mechanisms involved make existing methods less efficient and hard to train end-to-end.
71
Learning Deep Representations with Probabilistic Knowledge Transfer
Knowledge Transfer (KT) techniques tackle the problem of transferring the knowledge from a large and complex neural network into a smaller and faster one. However, existing KT methods are tailored towards classification tasks and they cannot be used efficiently for other representation learning tasks.
73
Deep Directional Statistics: Pose Estimation with Uncertainty Quantification
In challenging imaging conditions, such as low-resolution images or images corrupted by imaging artifacts, current pose estimation systems degrade considerably in accuracy. Whereas a single von Mises distribution makes strong assumptions about the shape of the distribution, we extend the basic model to predict a mixture of von Mises distributions.
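A mixture of von Mises distributions can represent multi-modal uncertainty over an angle, for example two plausible headings. Each component has density exp(κ cos(θ − μ)) / (2π I₀(κ)), where I₀ is the modified Bessel function of order 0. The following is a minimal NumPy sketch of the mixture density and its negative log-likelihood, the kind of loss a network predicting weights, means and concentrations could minimize. The function names and example parameters are hypothetical, and very large κ would need a log-space formulation to avoid overflow.

```python
import numpy as np

def von_mises_mixture_pdf(theta, weights, mus, kappas):
    """Density of a mixture of von Mises distributions evaluated at angles theta."""
    theta = np.asarray(theta, dtype=float)[..., None]   # broadcast over components
    comp = np.exp(kappas * np.cos(theta - mus)) / (2 * np.pi * np.i0(kappas))
    return (weights * comp).sum(axis=-1)

def mixture_nll(theta, weights, mus, kappas):
    """Mean negative log-likelihood of observed angles under the mixture."""
    return float(-np.log(von_mises_mixture_pdf(theta, weights, mus, kappas)).mean())

# Two-component example: a broad mode at 0 and a sharp mode at pi.
wts, mus, kap = np.array([0.3, 0.7]), np.array([0.0, np.pi]), np.array([2.0, 8.0])
```

A single von Mises component can only place one symmetric bump on the circle. The mixture can assign high likelihood near both 0 and π while keeping low likelihood at π/2.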
74
Estimating the Success of Unsupervised Image to Image Translation
While in supervised learning, the validation error is an unbiased estimator of the generalization (test) error and complexity-based generalization bounds are abundant, no such bounds exist for learning a mapping in an unsupervised way. As a result, when training GANs and specifically when using GANs for learning to map between domains in a completely unsupervised way, one is forced to select the hyperparameters and the stopping epoch by subjectively examining multiple options.
76
Lip Movements Generation at a Glance
Cross-modality generation is an emerging topic that aims to synthesize data in one modality based on information in a different modality. In this paper, we consider one such task: given arbitrary speech audio and a single lip image of an arbitrary target identity, generate lip movements of the target identity saying the speech.
79
A Framework for Evaluating 6-DOF Object Trackers
We present a challenging and realistic novel dataset for evaluating 6-DOF object tracking algorithms. Existing datasets show serious limitations (notably, unrealistic synthetic data, or real data with large fiducial markers), preventing the community from obtaining an accurate picture of the state of the art.
80
Women also Snowboard: Overcoming Bias in Captioning Models
Image captioning models tend to exaggerate biases present in training data (e.g., if a word is present in 60% of training sentences, it might be predicted in 70% of sentences at test time). We introduce a new Equalizer model that ensures equal gender probability when gender evidence is occluded in a scene and confident predictions when gender evidence is present.
81
A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers
We first formulate the weight pruning problem of DNNs as a nonconvex optimization problem with combinatorial constraints specifying the sparsity requirements, and then adopt the ADMM framework for systematic weight pruning. By using ADMM, the original nonconvex optimization problem is decomposed into two subproblems that are solved iteratively.
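The two-subproblem decomposition can be illustrated on a toy problem. The sketch below assumes a cardinality constraint (keep at most `k` nonzero weights), a differentiable loss supplied through its gradient, and a few gradient steps for the loss subproblem; the helper names, the toy quadratic loss, and the hyperparameters (`rho`, `lr`, iteration counts) are hypothetical, and this is the generic ADMM splitting rather than the authors' code.

```python
def project_topk(v, k):
    """Euclidean projection onto {at most k nonzeros}: keep the k largest |v_i|."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k])
    return [v[i] if i in keep else 0.0 for i in range(len(v))]


def admm_prune(w, grad_loss, k, rho=1.0, lr=0.1, outer_iters=50, inner_iters=20):
    """ADMM for: minimize loss(w) subject to w having at most k nonzeros."""
    z = project_topk(w, k)  # auxiliary copy that carries the sparsity constraint
    u = [0.0] * len(w)      # scaled dual variable
    for _ in range(outer_iters):
        # Subproblem 1: loss(w) + rho/2 * ||w - z + u||^2, solved by gradient descent.
        for _ in range(inner_iters):
            g = grad_loss(w)
            w = [wi - lr * (gi + rho * (wi - zi + ui))
                 for wi, gi, zi, ui in zip(w, g, z, u)]
        # Subproblem 2: project w + u onto the sparsity set (closed form).
        z = project_topk([wi + ui for wi, ui in zip(w, u)], k)
        # Dual update accumulates the remaining disagreement between w and z.
        u = [ui + wi - zi for ui, wi, zi in zip(u, w, z)]
    # Hard-prune: keep only the support selected by z.
    return [wi if zi != 0.0 else 0.0 for wi, zi in zip(w, z)]


target = [3.0, -0.1, 2.0, 0.05]


def grad_quadratic(w):
    """Gradient of the toy loss 0.5 * ||w - target||^2."""
    return [wi - ti for wi, ti in zip(w, target)]


pruned = admm_prune([0.0] * 4, grad_quadratic, k=2)
print(pruned)  # approximately [3.0, 0.0, 2.0, 0.0]
```

The nonconvex constraint only appears in the projection step, which has a closed form; the loss subproblem stays smooth, which is what makes the split convenient.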
84
Estimating Depth from RGB and Sparse Sensing
We present a deep model that can accurately produce dense depth maps given an RGB image with known depth at a very sparse set of pixels. The model works for both indoor and outdoor scenes and produces state-of-the-art dense depth maps at nearly real-time speeds on both the NYUv2 and KITTI datasets.
85
Group Normalization
Group Normalization (GN) divides the channels into groups and computes the mean and variance for normalization within each group. GN's computation is independent of batch size, and its accuracy is stable across a wide range of batch sizes. GN can outperform its BN-based counterparts for object detection and segmentation on COCO and for video classification on Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks.
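The batch-size independence follows from where the statistics are computed. A minimal pure-Python sketch, assuming channels are split into `num_groups` contiguous groups and that mean and variance are taken per sample and per group; the per-channel learnable scale and shift are omitted, and the function name and the [N][C][L] layout are illustrative assumptions.

```python
import math


def group_norm(x, num_groups, eps=1e-5):
    """Group Normalization without the learnable scale and shift.

    x is a nested list shaped [N][C][L]: batch, channels, flattened spatial
    positions. Statistics are computed per sample, so the result for one
    sample does not depend on the rest of the batch.
    """
    out = []
    for sample in x:
        num_channels = len(sample)
        if num_channels % num_groups != 0:
            raise ValueError("channels must be divisible by num_groups")
        group_size = num_channels // num_groups
        normalized = []
        for g in range(num_groups):
            channels = sample[g * group_size:(g + 1) * group_size]
            values = [v for ch in channels for v in ch]
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)
            scale = 1.0 / math.sqrt(var + eps)
            normalized.extend([[(v - mean) * scale for v in ch] for ch in channels])
        out.append(normalized)
    return out


a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # C=4 channels, L=2 positions
b = [[100.0, 0.0], [0.0, 0.0], [-3.0, 9.0], [2.0, 2.0]]
alone = group_norm([a], num_groups=2)[0]
in_batch = group_norm([a, b], num_groups=2)[0]
print(alone == in_batch)  # True: a's output ignores the rest of the batch
```

Batch normalization would instead pool statistics over the batch dimension, so the output for `a` would change when `b` is added.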
88
BSN: Boundary Sensitive Network for Temporal Action Proposal Generation
Temporal action proposal generation is an important yet challenging problem, since temporal proposals with rich action content are indispensable for analysing long real-world videos that contain a high proportion of irrelevant content. This problem requires methods that not only generate proposals with precise temporal boundaries but also retrieve proposals that cover ground-truth action instances with high recall and high overlap using relatively few proposals.
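The two requirements above, precise boundaries and high recall at high overlap with few proposals, are commonly quantified with temporal IoU and recall at a fixed number of proposals. A minimal sketch of those two measures, assuming segments are `(start, end)` pairs and proposals are already sorted by confidence; the function names and the 0.5 threshold are illustrative, and this shows the evaluation idea, not BSN itself.

```python
def tiou(a, b):
    """Temporal IoU of two segments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(proposals, ground_truth, k, threshold=0.5):
    """Fraction of ground-truth segments matched by any of the top-k proposals."""
    top = proposals[:k]
    hits = sum(1 for gt in ground_truth
               if any(tiou(p, gt) >= threshold for p in top))
    return hits / len(ground_truth)


print(tiou((0, 10), (5, 15)))  # 5 / 15, about 0.333
gt = [(0, 10), (20, 30)]
proposals = [(1, 10), (50, 60), (20, 29)]  # sorted by confidence
print(recall_at_k(proposals, gt, k=2), recall_at_k(proposals, gt, k=3))  # 0.5 1.0
```

Raising `threshold` rewards precise boundaries, while lowering `k` rewards ranking the good proposals first.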
93
FloorNet: A Unified Framework for Floorplan Reconstruction from 3D Scans
The ultimate goal of this indoor mapping research is to automatically reconstruct a floorplan simply by walking through a house with a smartphone in a pocket. This paper tackles this problem by proposing FloorNet, a novel deep neural architecture.
94
Improving DNN Robustness to Adversarial Attacks using Jacobian Regularization
Deep neural networks have lately shown tremendous performance in various applications, including vision and speech processing tasks. We demonstrate empirically that regularizing the Frobenius norm of the network's Jacobian leads to enhanced robustness with a minimal change in the original network's accuracy.
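As a hedged illustration of the Jacobian regularization named in the title, the sketch below computes the squared Frobenius norm of the input-output Jacobian of a toy one-layer network f(x) = tanh(Wx) and adds it to a task loss. The network, the weight `lam`, and all function names are hypothetical; how and when the paper applies its penalty is not modeled here.

```python
import math


def forward(W, x):
    """Toy network f(x) = tanh(W x)."""
    return [math.tanh(sum(w * xj for w, xj in zip(row, x))) for row in W]


def jacobian(W, x):
    """df_i/dx_j = (1 - tanh(z_i)^2) * W[i][j], with z = W x."""
    z = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    return [[(1.0 - math.tanh(zi) ** 2) * w for w in row] for zi, row in zip(z, W)]


def jacobian_frobenius_sq(W, x):
    """Squared Frobenius norm ||J(x)||_F^2; large values mean that small input
    perturbations can move the output a lot."""
    return sum(v * v for row in jacobian(W, x) for v in row)


def regularized_loss(task_loss, W, x, lam=0.01):
    """Task loss plus a Jacobian penalty weighted by the hypothetical lam."""
    return task_loss + lam * jacobian_frobenius_sq(W, x)


W = [[1.0, 2.0], [3.0, 4.0]]
print(jacobian_frobenius_sq(W, [0.0, 0.0]))  # 30.0, since tanh'(0) = 1
```

For deep networks the Jacobian is usually obtained with automatic differentiation rather than written by hand as here.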
95
Fictitious GAN: Training GANs with Historical Models
Inspired by the fictitious play learning process, a novel training method, referred to as Fictitious GAN, is introduced. Fictitious GAN trains the networks using a mixture of historical models: the discriminator (resp. generator) is updated according to the best response to the mixture outputs from a sequence of previously trained generators (resp. discriminators).
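The fictitious play process this method is named after can be illustrated on a toy zero-sum matrix game: in every round, each player best-responds to the empirical mixture of all of the opponent's past actions, which mirrors updating against a mixture of historical models. The sketch below is classical fictitious play, not Fictitious GAN training; the function name, the matching-pennies payoff matrix, and the first-index tie-breaking rule are illustrative assumptions.

```python
def fictitious_play(payoff, iterations):
    """Fictitious play in a two-player zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives -payoff.
    Each round, both players best-respond to the empirical mixture of the
    opponent's past actions. Returns the empirical action frequencies.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    for _ in range(iterations):
        # Row player: expected payoff of each action against the column history.
        row_values = [sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
                      for i in range(n_rows)]
        # Column player: row's expected payoff of each column against the row history.
        col_values = [sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
                      for j in range(n_cols)]
        best_row = max(range(n_rows), key=lambda i: row_values[i])  # ties: first index
        best_col = min(range(n_cols), key=lambda j: col_values[j])  # column minimizes
        row_counts[best_row] += 1
        col_counts[best_col] += 1
    total = float(iterations)
    return [c / total for c in row_counts], [c / total for c in col_counts]


matching_pennies = [[1, -1], [-1, 1]]
rows, cols = fictitious_play(matching_pennies, 20000)
print(rows, cols)  # both close to the mixed equilibrium [0.5, 0.5]
```

Individual rounds keep cycling between pure actions; it is the historical mixture that settles near the equilibrium, which is the property that motivates training against past models.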
96
Learning SO(3) Equivariant Representations with Spherical CNNs
We address the problem of 3D rotation equivariance in convolutional neural networks. 3D rotations have been a challenging nuisance in 3D classification tasks, requiring higher capacity and extended data augmentation to tackle.
97
Stacked Cross Attention for Image-Text Matching
Prior work either simply aggregates the similarity of all possible pairs of regions and words without attending differentially to more and less important words or regions, or uses a multi-step attentional process to capture a limited number of semantic alignments, which is less interpretable. Our approach achieves state-of-the-art results on the MS-COCO and Flickr30K datasets.
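To make the contrast with uniform pooling concrete, here is a generic sketch of one attention direction: each word attends over image regions, and the sentence score averages word-level relevances. It assumes cosine similarity, a softmax with a hypothetical inverse temperature of 9.0, and average pooling; the names and these choices are illustrative assumptions rather than the paper's exact formulation.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def text_to_image_score(regions, words, inv_temperature=9.0):
    """Image-sentence similarity via word-to-region attention."""
    relevances = []
    for word in words:
        # Attention weights: how strongly this word refers to each region.
        weights = softmax([inv_temperature * cosine(r, word) for r in regions])
        # Attended image vector for this word: weighted sum of region features.
        attended = [sum(a * r[d] for a, r in zip(weights, regions))
                    for d in range(len(word))]
        relevances.append(cosine(word, attended))
    # Average word relevance as the image-sentence score.
    return sum(relevances) / len(relevances)


matching = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # one region aligns with the word
unrelated = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]  # no region aligns with the word
sentence = [[1.0, 0.0, 0.0]]
print(text_to_image_score(matching, sentence), text_to_image_score(unrelated, sentence))
```

Because the softmax concentrates weight on the best-matching region, an irrelevant region barely dilutes the score, unlike summing the similarities of all region-word pairs.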
98
Revisiting RCNN: On Awakening the Classification Power of Faster RCNN
Recent region-based object detectors are usually built with separate classification and localization branches on top of shared feature extraction networks. In this paper, we analyze failure cases of state-of-the-art detectors and observe that most hard false positives result from classification instead of localization.
99
Integral Human Pose Regression
State-of-the-art human pose estimation methods are based on heat map representation. Despite their good performance, the representation has a few inherent issues, such as non-differentiability and quantization error.
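Integral regression, as the title suggests, replaces the non-differentiable argmax over a heat map with an expectation: normalize the heat map into a probability distribution and take the probability-weighted average of the pixel coordinates. The sketch below assumes softmax normalization and a single 2D heat map given as a nested list; the function names are illustrative, and this is not the authors' implementation.

```python
import math


def integral_regression_2d(heatmap):
    """Soft-argmax: expected (x, y) under a softmax over the heat map."""
    peak = max(max(row) for row in heatmap)  # subtract the max for numerical stability
    weights = [[math.exp(v - peak) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    x = sum(j * w for row in weights for j, w in enumerate(row)) / total
    y = sum(i * w for i, row in enumerate(weights) for w in row) / total
    return x, y


def argmax_2d(heatmap):
    """Hard argmax for comparison: integer (x, y) of the first maximum."""
    best_i, best_j = 0, 0
    for i, row in enumerate(heatmap):
        for j, v in enumerate(row):
            if v > heatmap[best_i][best_j]:
                best_i, best_j = i, j
    return best_j, best_i


# The true peak lies between pixels 1 and 2.
heat = [[0.0, 5.0, 5.0, 0.0]]
print(argmax_2d(heat))               # (1, 0): quantized to a pixel
print(integral_regression_2d(heat))  # roughly (1.5, 0.0): sub-pixel estimate
```

Every operation in `integral_regression_2d` is differentiable in the heat map values, so a coordinate loss can be back-propagated through it, whereas `argmax_2d` returns integers only.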
100
Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
Spatial pyramid pooling modules and encoder-decoder structures are both used in deep neural networks for semantic segmentation. The former networks are able to encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter networks can capture sharper object boundaries by gradually recovering the spatial information. In this work, we propose to combine the advantages of both methods.
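Probing features "at multiple rates" relies on atrous (dilated) convolution: a kernel of size k with rate r samples inputs r positions apart, so its field of view grows to k + (k - 1)(r - 1) without adding weights. A minimal single-channel 1D sketch, assuming valid padding, stride 1, and the cross-correlation convention used by most deep-learning libraries; the function names are illustrative, and the depthwise separable and decoder parts of the method are not shown.

```python
def effective_kernel_size(k, rate):
    """Field of view of a size-k kernel with the given atrous rate."""
    return k + (k - 1) * (rate - 1)


def atrous_conv1d(x, weights, rate):
    """Valid 1D atrous convolution: kernel taps are `rate` samples apart."""
    span = effective_kernel_size(len(weights), rate)
    return [
        sum(w * x[i + t * rate] for t, w in enumerate(weights))
        for i in range(len(x) - span + 1)
    ]


signal = list(range(10))
# The same 3-tap kernel probed at several rates.
for rate in (1, 2, 3):
    print(rate, effective_kernel_size(3, rate), atrous_conv1d(signal, [1, 0, -1], rate))
# 1 3 [-2, -2, -2, -2, -2, -2, -2, -2]
# 2 5 [-4, -4, -4, -4, -4, -4]
# 3 7 [-6, -6, -6, -6]
```

The weights stay fixed at three taps while the field of view grows from 3 to 7 samples, which is why several rates can be applied in parallel cheaply.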