Search Results for author: Roberto Cipolla

Found 64 papers, 32 papers with code

ReCoRe: Regularized Contrastive Representation Learning of World Model

no code implementations 14 Dec 2023 Rudra P. K. Poudel, Harit Pandya, Stephan Liwicki, Roberto Cipolla

To address these challenges, we present a world model that learns invariant features using (i) contrastive unsupervised learning and (ii) an intervention-invariant regularizer.

Contrastive Learning Depth Estimation +5

LanGWM: Language Grounded World Model

no code implementations 29 Nov 2023 Rudra P. K. Poudel, Harit Pandya, Chao Zhang, Roberto Cipolla

Furthermore, our proposed technique of explicit language-grounded visual representation learning has the potential to improve models for human-robot interaction because our extracted visual features are language grounded.

Model-based Reinforcement Learning Out-of-Distribution Generalization +2

A Neural Height-Map Approach for the Binocular Photometric Stereo Problem

no code implementations 10 Nov 2023 Fotios Logothetis, Ignas Budvytis, Roberto Cipolla

As in recent neural multi-view shape estimation frameworks such as NeRF, SIREN and inverse graphics approaches to multi-view photometric stereo (e.g. PS-NeRF), we formulate the shape estimation task as learning a differentiable surface and texture representation by minimising the surface normal discrepancy for normals estimated from multiple varying-light images for two views, as well as the discrepancy between rendered surface intensity and observed images.

FOUND: Foot Optimization with Uncertain Normals for Surface Deformation Using Synthetic Data

1 code implementation 27 Oct 2023 Oliver Boyne, Gwangbin Bae, James Charles, Roberto Cipolla

Our FOUND approach tackles this, with 4 main contributions: (i) SynFoot, a synthetic dataset of 50,000 photorealistic foot images, paired with ground truth surface normals and keypoints; (ii) an uncertainty-aware surface normal predictor trained on our synthetic dataset; (iii) an optimization scheme for fitting a generative foot model to a series of images; and (iv) a benchmark dataset of calibrated images and high resolution ground truth geometry.

Surface Normal Estimation Surface Reconstruction

Sparse Multi-Object Render-and-Compare

no code implementations 17 Oct 2023 Florian Langer, Ignas Budvytis, Roberto Cipolla

Introducing a new network architecture, Multi-SPARC, we learn to perform CAD model alignments for multiple detected objects jointly.

Object

IMP: Iterative Matching and Pose Estimation with Adaptive Pooling

1 code implementation CVPR 2023 Fei Xue, Ignas Budvytis, Roberto Cipolla

Previous methods solve feature matching and pose estimation using a two-stage process by first finding matches and then estimating the pose.

Pose Estimation

SFD2: Semantic-guided Feature Detection and Description

1 code implementation CVPR 2023 Fei Xue, Ignas Budvytis, Roberto Cipolla

Visual localization is a fundamental task for various applications including autonomous driving and robotics.

Autonomous Driving Visual Localization

FIND: An Unsupervised Implicit 3D Model of Articulated Human Feet

1 code implementation 21 Oct 2022 Oliver Boyne, James Charles, Roberto Cipolla

In this paper we present a high fidelity and articulated 3D human foot model.

Disentanglement

Model-Based Imitation Learning for Urban Driving

1 code implementation 14 Oct 2022 Anthony Hu, Gianluca Corrado, Nicolas Griffiths, Zak Murez, Corina Gurau, Hudson Yeo, Alex Kendall, Roberto Cipolla, Jamie Shotton

Our approach is the first camera-only method that models static scene, dynamic scene, and ego-behaviour in an urban driving environment.

Autonomous Driving Bird's-Eye View Semantic Segmentation +3

A CNN Based Approach for the Point-Light Photometric Stereo Problem

no code implementations 10 Oct 2022 Fotios Logothetis, Roberto Mecca, Ignas Budvytis, Roberto Cipolla

Reconstructing the 3D shape of an object using several images under different light sources is a very challenging task, especially when realistic assumptions such as light propagation and attenuation, perspective viewing geometry and specular light reflection are considered.

SPARC: Sparse Render-and-Compare for CAD model alignment in a single RGB image

1 code implementation 3 Oct 2022 Florian Langer, Gwangbin Bae, Ignas Budvytis, Roberto Cipolla

This combined information is the input to a pose prediction network, SPARC-Net, which we train to predict a 9 DoF CAD model pose update.

Pose Prediction Retrieval

Contrastive Unsupervised Learning of World Model with Invariant Causal Features

no code implementations 29 Sep 2022 Rudra P. K. Poudel, Harit Pandya, Roberto Cipolla

In particular, we use contrastive unsupervised learning to learn the invariant causal features, which enforces invariance across augmentations of irrelevant parts or styles of the observation.

Data Augmentation Depth Estimation +5

Efficient Large-Scale Localization by Global Instance Recognition

no code implementations CVPR 2022 Fei Xue, Ignas Budvytis, Daniel Olmeda Reino, Roberto Cipolla

Hierarchical frameworks consisting of both coarse and fine localization are often used as the standard pipeline for large-scale visual localization.

Visual Localization

Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry

1 code implementation CVPR 2022 Gwangbin Bae, Ignas Budvytis, Roberto Cipolla

To this end, we propose MaGNet, a novel framework for fusing single-view depth probability with multi-view geometry, to improve the accuracy, robustness and efficiency of multi-view depth estimation.

Depth Estimation

Discrete neural representations for explainable anomaly detection

no code implementations 10 Dec 2021 Stanislaw Szymanowicz, James Charles, Roberto Cipolla

The aim of this work is to detect and automatically generate high-level explanations of anomalous events in video.

Anomaly Detection Object +1

Leveraging Geometry for Shape Estimation from a Single RGB Image

1 code implementation 10 Nov 2021 Florian Langer, Ignas Budvytis, Roberto Cipolla

In this work we demonstrate how cross-domain keypoint matches from an RGB image to a rendered CAD model allow for more precise object pose predictions compared to ones obtained through direct predictions.

Object Retrieval

Estimating and Exploiting the Aleatoric Uncertainty in Surface Normal Estimation

1 code implementation ICCV 2021 Gwangbin Bae, Ignas Budvytis, Roberto Cipolla

Experimental results show that the proposed method outperforms the state-of-the-art in ScanNet and NYUv2, and that the estimated uncertainty correlates well with the prediction error.

Scene Understanding Surface Normal Estimation +1

X-MAN: Explaining multiple sources of anomalies in video

no code implementations 16 Jun 2021 Stanislaw Szymanowicz, James Charles, Roberto Cipolla

In an effort to tackle this problem we make the following contributions: (1) we show how to build interpretable feature representations suitable for detecting anomalies with state-of-the-art performance, (2) we propose an interpretable probabilistic anomaly detector which can describe the reason behind its response using high-level concepts, (3) we are the first to directly consider object interactions for anomaly detection and (4) we propose a new task of explaining anomalies and release a large dataset for evaluating methods on this task.

Anomaly Detection Decision Making

LUCES: A Dataset for Near-Field Point Light Source Photometric Stereo

no code implementations 27 Apr 2021 Roberto Mecca, Fotios Logothetis, Ignas Budvytis, Roberto Cipolla

In order to fill the gap in evaluating near-field photometric stereo methods, we introduce LUCES, the first real-world 'dataset for near-fieLd point light soUrCe photomEtric Stereo', of 14 objects made of a variety of materials.

Probabilistic 3D Human Shape and Pose Estimation from Multiple Unconstrained Images in the Wild

no code implementations CVPR 2021 Akash Sengupta, Ignas Budvytis, Roberto Cipolla

In contrast, we propose a new task: shape and pose estimation from a group of multiple images of a human subject, without constraints on subject pose, camera viewpoint or background conditions between images in the group.

3D Human Shape Estimation Pose Prediction

Synthetic Training for Accurate 3D Human Pose and Shape Estimation in the Wild

1 code implementation 21 Sep 2020 Akash Sengupta, Ignas Budvytis, Roberto Cipolla

Thus, we propose STRAPS (Synthetic Training for Real Accurate Pose and Shape), a system that utilises proxy representations, such as silhouettes and 2D joints, as inputs to a shape and pose regression neural network, which is trained with synthetic training data (generated on-the-fly during training using the SMPL statistical body model) to overcome data scarcity.

3D human pose and shape estimation 3D Human Shape Estimation +3

A CNN Based Approach for the Near-Field Photometric Stereo Problem

no code implementations 12 Sep 2020 Fotios Logothetis, Ignas Budvytis, Roberto Mecca, Roberto Cipolla

Secondly, we compute the depth by integrating the normal field in order to iteratively estimate light directions and attenuation which is used to compensate the input images to compute reflectance samples for the next iteration.

PX-NET: Simple and Efficient Pixel-Wise Training of Photometric Stereo Networks

no code implementations ICCV 2021 Fotios Logothetis, Ignas Budvytis, Roberto Mecca, Roberto Cipolla

We show that global physical effects can be approximated on the observation map domain and this simplifies and speeds up the data creation procedure.

Data Augmentation

Who Left the Dogs Out? 3D Animal Reconstruction with Expectation Maximization in the Loop

2 code implementations ECCV 2020 Benjamin Biggs, Oliver Boyne, James Charles, Andrew Fitzgibbon, Roberto Cipolla

We introduce an automatic, end-to-end method for recovering the 3D pose and shape of dogs from monocular internet images.

Predicting Semantic Map Representations from Images using Pyramid Occupancy Networks

1 code implementation CVPR 2020 Thomas Roddick, Roberto Cipolla

Autonomous vehicles commonly rely on highly detailed birds-eye-view maps of their environment, which capture both static elements of the scene such as road layout as well as dynamic elements such as other cars and pedestrians.

3D Object Detection Autonomous Vehicles +2

Large Scale Joint Semantic Re-Localisation and Scene Understanding via Globally Unique Instance Coordinate Regression

no code implementations 23 Sep 2019 Ignas Budvytis, Marvin Teichmann, Tomas Vojir, Roberto Cipolla

We obtain smaller mean distance and angular errors than state-of-the-art 6-DoF pose estimation algorithms based on direct pose regression and pose estimation from scene coordinates on all datasets.

Autonomous Driving Pose Estimation +2

Orientation-aware Semantic Segmentation on Icosahedron Spheres

1 code implementation ICCV 2019 Chao Zhang, Stephan Liwicki, William Smith, Roberto Cipolla

For the spherical domain, several methods recently adopt an icosahedron mesh, but systems are typically rotation invariant or require significant memory and parameters, thus enabling execution only at very low resolutions.

Autonomous Driving Semantic Segmentation

Orthographic Feature Transform for Monocular 3D Object Detection

1 code implementation 20 Nov 2018 Thomas Roddick, Alex Kendall, Roberto Cipolla

This allows us to reason holistically about the spatial configuration of the scene in a domain where scale is consistent and distances between objects are meaningful.

3D Object Detection From Monocular Images Monocular 3D Object Detection +2

Convolutional CRFs for Semantic Segmentation

1 code implementation ICLR 2019 Marvin T. T. Teichmann, Roberto Cipolla

For the challenging semantic image segmentation task the most efficient models have traditionally combined the structured modelling capabilities of Conditional Random Fields (CRFs) with the feature extraction power of CNNs.

Image Segmentation Segmentation +1

Semi-Calibrated Near Field Photometric Stereo

no code implementations CVPR 2017 Fotios Logothetis, Roberto Mecca, Roberto Cipolla

3D reconstruction from shading information through Photometric Stereo is considered a very challenging problem in Computer Vision.

3D Reconstruction

MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving

16 code implementations 22 Dec 2016 Marvin Teichmann, Michael Weber, Marius Zoellner, Roberto Cipolla, Raquel Urtasun

While most approaches to semantic reasoning have focused on improving performance, in this paper we argue that computational times are very important in order to enable real time applications such as autonomous driving.

Autonomous Driving General Classification +2

Understanding Real World Indoor Scenes With Synthetic Data

no code implementations CVPR 2016 Ankur Handa, Viorica Patraucean, Vijay Badrinarayanan, Simon Stent, Roberto Cipolla

Scene understanding is a prerequisite to many high level tasks for any automated intelligent machine operating in real world environments.

Scene Understanding

Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups

no code implementations CVPR 2017 Yani Ioannou, Duncan Robertson, Roberto Cipolla, Antonio Criminisi

We propose a new method for creating computationally efficient and compact convolutional neural networks (CNNs) using a novel sparse connection structure that resembles a tree root.

Refining Architectures of Deep Convolutional Neural Networks

no code implementations CVPR 2016 Sukrit Shankar, Duncan Robertson, Yani Ioannou, Antonio Criminisi, Roberto Cipolla

Deep Convolutional Neural Networks (CNNs) have recently evinced immense success for various image recognition tasks.

SceneNet: Understanding Real World Indoor Scenes With Synthetic Data

1 code implementation 22 Nov 2015 Ankur Handa, Viorica Patraucean, Vijay Badrinarayanan, Simon Stent, Roberto Cipolla

Scene understanding is a prerequisite to many high level tasks for any automated intelligent machine operating in real world environments.

Scene Understanding

Training CNNs with Low-Rank Filters for Efficient Image Classification

no code implementations 20 Nov 2015 Yani Ioannou, Duncan Robertson, Jamie Shotton, Roberto Cipolla, Antonio Criminisi

Applying our method to a near state-of-the-art network for CIFAR, we achieved comparable accuracy with 46% less compute and 55% fewer parameters.

Classification General Classification +1

Spatio-temporal video autoencoder with differentiable memory

1 code implementation 19 Nov 2015 Viorica Patraucean, Ankur Handa, Roberto Cipolla

At each time step, the system receives as input a video frame, predicts the optical flow based on the current observation and the LSTM memory state as a dense transformation map, and applies it to the current frame to generate the next frame.

Motion Estimation Optical Flow Estimation +2

TemplateNet for Depth-Based Object Instance Recognition

no code implementations 10 Nov 2015 Ujwal Bonde, Vijay Badrinarayanan, Roberto Cipolla, Minh-Tri Pham

We present a novel deep architecture termed templateNet for depth based object instance recognition.

Object

Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding

22 code implementations 9 Nov 2015 Alex Kendall, Vijay Badrinarayanan, Roberto Cipolla

Semantic segmentation is an important tool for visual scene understanding and a meaningful measure of uncertainty is essential for decision making.

Decision Making Scene Understanding +2

Symmetry-invariant optimization in deep networks

no code implementations 5 Nov 2015 Vijay Badrinarayanan, Bamdev Mishra, Roberto Cipolla

Recent works have highlighted scale invariance or symmetry that is present in the weight space of a typical deep network and the adverse effect that it has on the Euclidean gradient based stochastic gradient descent optimization.

Computational Efficiency Image Segmentation +1

Understanding symmetries in deep networks

no code implementations 3 Nov 2015 Vijay Badrinarayanan, Bamdev Mishra, Roberto Cipolla

Consequently, training the network boils down to using stochastic gradient descent updates on the unit-norm manifold.

Computational Efficiency

SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

75 code implementations 2 Nov 2015 Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla

We show that SegNet provides good performance with competitive inference time and more memory-efficient inference compared to other architectures.

Crowd Counting General Classification +6

Modelling Uncertainty in Deep Learning for Camera Relocalization

1 code implementation 19 Sep 2015 Alex Kendall, Roberto Cipolla

Using a Bayesian convolutional neural network implementation we obtain an estimate of the model's relocalization uncertainty and improve state of the art localization accuracy on a large scale outdoor dataset.

Camera Relocalization

DEEP-CARVING: Discovering Visual Attributes by Carving Deep Neural Nets

no code implementations CVPR 2015 Sukrit Shankar, Vikas K. Garg, Roberto Cipolla

To ameliorate this limitation, we propose Deep-Carving, a novel training procedure with CNNs, that helps the net efficiently carve itself for the task of multiple attribute prediction.

Attribute Image Retrieval

Bi-label Propagation for Generic Multiple Object Tracking

no code implementations CVPR 2014 Wenhan Luo, Tae-Kyun Kim, Bjorn Stenger, Xiaowei Zhao, Roberto Cipolla

In this paper, we propose a label propagation framework to handle the multiple object tracking (MOT) problem for a generic object type (cf.

Multiple Object Tracking Object

Expressive Visual Text-to-Speech Using Active Appearance Models

no code implementations CVPR 2013 Robert Anderson, Bjorn Stenger, Vincent Wan, Roberto Cipolla

This paper presents a complete system for expressive visual text-to-speech (VTTS), which is capable of producing expressive output, in the form of a 'talking head', given an input text and a set of continuous expression weights.
