Search Results for author: Fisher Yu

Found 96 papers, 66 papers with code

UniDepth: Universal Monocular Metric Depth Estimation

1 code implementation • 27 Mar 2024 • Luigi Piccinelli, Yung-Hsu Yang, Christos Sakaridis, Mattia Segu, Siyuan Li, Luc van Gool, Fisher Yu

However, the remarkable accuracy of recent MMDE methods is confined to their training domains.

Ranked #1 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)

Monocular Depth Estimation

283

DexDribbler: Learning Dexterous Soccer Manipulation via Dynamic Supervision

1 code implementation • 21 Mar 2024 • Yutong Hu, Kehan Wen, Fisher Yu

Learning dexterous locomotion policies for legged robots is becoming increasingly popular because such policies can handle diverse terrains and resemble intelligent behaviors.

S³M-Net: Joint Learning of Semantic Segmentation and Stereo Matching for Autonomous Driving

no code implementations • 21 Jan 2024 • Zhiyuan Wu, Yi Feng, Chuang-Wei Liu, Fisher Yu, Qijun Chen, Rui Fan

Hence, in this article, we introduce S³M-Net, a novel joint learning framework developed to perform semantic segmentation and stereo matching simultaneously.

Autonomous Driving Scene Understanding +2

ICGNet: A Unified Approach for Instance-Centric Grasping

no code implementations • 18 Jan 2024 • René Zurbrügg, Yifan Liu, Francis Engelmann, Suryansh Kumar, Marco Hutter, Vaishakh Patil, Fisher Yu

Executing a successful grasp in a cluttered environment requires multiple levels of scene understanding: First, the robot needs to analyze the geometric properties of individual objects to find feasible grasps.

Object Object Reconstruction +1

MuRF: Multi-Baseline Radiance Fields

1 code implementation • 7 Dec 2023 • Haofei Xu, Anpei Chen, Yuedong Chen, Christos Sakaridis, Yulun Zhang, Marc Pollefeys, Andreas Geiger, Fisher Yu

We present Multi-Baseline Radiance Fields (MuRF), a general feed-forward approach to solving sparse view synthesis under multiple different baseline settings (small and large baselines, and different number of input views).

Zero-shot Generalization

Gaussian Grouping: Segment and Edit Anything in 3D Scenes

1 code implementation • 1 Dec 2023 • Mingqiao Ye, Martin Danelljan, Fisher Yu, Lei Ke

To address this issue, we propose Gaussian Grouping, which extends Gaussian Splatting to jointly reconstruct and segment anything in open-world 3D scenes.

Colorization Novel View Synthesis +2

414

Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding

1 code implementation • NeurIPS 2023 • Zhejun Zhang, Alexander Liniger, Christos Sakaridis, Fisher Yu, Luc van Gool

The real-world deployment of an autonomous driving system requires its components to run on-board and in real-time, including the motion prediction module that predicts the future trajectories of surrounding traffic participants.

Autonomous Driving motion prediction

COOLer: Class-Incremental Learning for Appearance-Based Multiple Object Tracking

1 code implementation • 4 Oct 2023 • Zhizheng Liu, Mattia Segu, Fisher Yu

Continual learning allows a model to learn multiple tasks sequentially while retaining the old knowledge without the training data of the preceding tasks.

Class Incremental Learning Disentanglement +2

DARTH: Holistic Test-time Adaptation for Multiple Object Tracking

1 code implementation • ICCV 2023 • Mattia Segu, Bernt Schiele, Fisher Yu

However, the nature of a MOT system is manifold - requiring object detection and instance association - and adapting all its components is non-trivial.

Autonomous Driving Multiple Object Tracking +4

Distilling ODE Solvers of Diffusion Models into Smaller Steps

no code implementations • 28 Sep 2023 • Sanghwan Kim, Hao Tang, Fisher Yu

Notably, our method incurs negligible computational overhead compared to previous distillation techniques, facilitating straightforward and rapid integration with existing samplers.

Denoising Knowledge Distillation

Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving

no code implementations • ICCV 2023 • Thomas E. Huang, Yifan Liu, Luc van Gool, Fisher Yu

VTD is a promising new direction for exploring the unification of perception tasks in autonomous driving.

Autonomous Driving Representation Learning +1

Three Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding

no code implementations • 8 Sep 2023 • Ozan Unal, Christos Sakaridis, Suman Saha, Fisher Yu, Luc van Gool

A common formulation to tackle 3D visual grounding is grounding-by-detection, where localization is done via bounding boxes.

3D Instance Segmentation Object +3

R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras

no code implementations • ICCV 2023 • Aron Schmied, Tobias Fischer, Martin Danelljan, Marc Pollefeys, Fisher Yu

We propose R3D3, a multi-camera system for dense 3D reconstruction and ego-motion estimation.

3D Reconstruction Autonomous Driving +4

MolGrapher: Graph-based Visual Recognition of Chemical Structures

1 code implementation • ICCV 2023 • Lucas Morin, Martin Danelljan, Maria Isabel Agea, Ahmed Nassar, Valery Weber, Ingmar Meijer, Peter Staar, Fisher Yu

In addition, we introduce a large-scale benchmark of annotated real molecule images, USPTO-30K, to spur research on this critical topic.

Video OWL-ViT: Temporally-consistent open-world localization in video

no code implementations • ICCV 2023 • Georg Heigold, Matthias Minderer, Alexey Gritsenko, Alex Bewley, Daniel Keysers, Mario Lučić, Fisher Yu, Thomas Kipf

Our model is end-to-end trainable on video data and enjoys improved temporal consistency compared to tracking-by-detection baselines, while retaining the open-world capabilities of the backbone detector.

Object Object Localization

Dual Aggregation Transformer for Image Super-Resolution

1 code implementation • ICCV 2023 • Zheng Chen, Yulun Zhang, Jinjin Gu, Linghe Kong, Xiaokang Yang, Fisher Yu

Based on the above idea, we propose a novel Transformer model, Dual Aggregation Transformer (DAT), for image SR. Our DAT aggregates features across spatial and channel dimensions, in the inter-block and intra-block dual manner.

Ranked #6 on Image Super-Resolution on Manga109 - 4x upscaling

Image Super-Resolution

310

Strategic Preys Make Acute Predators: Enhancing Camouflaged Object Detectors by Generating Camouflaged Objects

1 code implementation • 6 Aug 2023 • Chunming He, Kai Li, Yachao Zhang, Yulun Zhang, Zhenhua Guo, Xiu Li, Martin Danelljan, Fisher Yu

On the prey side, we propose an adversarial training framework, Camouflageator, which introduces an auxiliary generator to generate more camouflaged objects that are harder for a COD method to detect.

object-detection Object Detection

Cascade-DETR: Delving into High-Quality Universal Object Detection

1 code implementation • ICCV 2023 • Mingqiao Ye, Lei Ke, Siyuan Li, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan, Fisher Yu

While dominating on the COCO benchmark, recent Transformer-based detection methods are not competitive in diverse domains.

Object object-detection +2

Segment Anything Meets Point Tracking

1 code implementation • 3 Jul 2023 • Frano Rajič, Lei Ke, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan, Fisher Yu

The Segment Anything Model (SAM) has established itself as a powerful zero-shot image segmentation model, enabled by efficient point-centric annotation and prompt-based models.

Interactive Video Object Segmentation Object +5

898

SSCBench: Monocular 3D Semantic Scene Completion Benchmark in Street Views

1 code implementation • 15 Jun 2023 • Yiming Li, Sihang Li, Xinhao Liu, Moonjun Gong, Kenan Li, Nuo Chen, Zijun Wang, Zhiheng Li, Tao Jiang, Fisher Yu, Yue Wang, Hang Zhao, Zhiding Yu, Chen Feng

Monocular scene understanding is a foundational component of autonomous systems.

3D Semantic Scene Completion 3D Semantic Scene Completion from a single 2D image

146

Segment Anything in High Quality

2 code implementations • NeurIPS 2023 • Lei Ke, Mingqiao Ye, Martin Danelljan, Yifan Liu, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu

HQ-SAM is only trained on the introduced dataset of 44k masks, which takes only 4 hours on 8 GPUs.

Ranked #1 on Zero-Shot Instance Segmentation on LVIS v1.0 val

Zero-Shot Instance Segmentation Zero Shot Segmentation

13,386

Condition-Invariant Semantic Segmentation

1 code implementation • 27 May 2023 • Christos Sakaridis, David Bruggemann, Fisher Yu, Luc van Gool

Motivated by these findings, we propose to leverage stylization in performing feature-level adaptation by aligning the internal network features extracted by the encoder of the network from the original and the stylized view of each input image with a novel feature invariance loss.

Segmentation Semantic Segmentation +1

Maskomaly: Zero-Shot Mask Anomaly Segmentation

no code implementations • 26 May 2023 • Jan Ackermann, Christos Sakaridis, Fisher Yu

We present a simple and practical framework for anomaly segmentation called Maskomaly.

Informativeness Segmentation +1

How To Not Train Your Dragon: Training-free Embodied Object Goal Navigation with Semantic Frontiers

no code implementations • 26 May 2023 • Junting Chen, Guohao Li, Suryansh Kumar, Bernard Ghanem, Fisher Yu

Our method propagates semantics on the scene graphs based on language priors and scene statistics to introduce semantic knowledge to the geometric frontiers.

Imitation Learning Navigate +2

OVTrack: Open-Vocabulary Multiple Object Tracking

1 code implementation • CVPR 2023 • Siyuan Li, Tobias Fischer, Lei Ke, Henghui Ding, Martin Danelljan, Fisher Yu

This leaves contemporary MOT methods limited to a small set of pre-defined object categories.

Denoising Hallucination +4

iDisc: Internal Discretization for Monocular Depth Estimation

2 code implementations • CVPR 2023 • Luigi Piccinelli, Christos Sakaridis, Fisher Yu

Our method sets the new state of the art with significant improvements on NYU-Depth v2 and KITTI, outperforming all published methods on the official KITTI benchmark.

Ranked #2 on Surface Normals Estimation on NYU Depth v2

Autonomous Driving Monocular Depth Estimation +3

283

Mask-Free Video Instance Segmentation

1 code implementation • CVPR 2023 • Lei Ke, Martin Danelljan, Henghui Ding, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu

A consistency loss is then enforced on the found matches.

Ranked #1 on Video Instance Segmentation on Youtube-VIS (trained with no video masks)

Instance Segmentation Optical Flow Estimation +4

349

TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction

2 code implementations • 7 Mar 2023 • Zhejun Zhang, Alexander Liniger, Dengxin Dai, Fisher Yu, Luc van Gool

We present TrafficBots, a multi-agent policy built upon motion prediction and end-to-end driving, and based on TrafficBots we obtain a world model tailored for the planning module of autonomous vehicles.

Autonomous Driving Model-based Reinforcement Learning +1

A Multiplicative Value Function for Safe and Efficient Reinforcement Learning

1 code implementation • 7 Mar 2023 • Nick Bührer, Zhejun Zhang, Alexander Liniger, Fisher Yu, Luc van Gool

To this end, we propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.

Navigate reinforcement-learning +3

Scaling Vision Transformers to 22 Billion Parameters

1 code implementation • 10 Feb 2023 • Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Patrick Collier, Alexey Gritsenko, Vighnesh Birodkar, Cristina Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetić, Dustin Tran, Thomas Kipf, Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, Neil Houlsby

The scaling of Transformers has driven breakthrough capabilities for language models.

Ranked #1 on Zero-Shot Transfer Image Classification on ObjectNet

Action Classification Fairness +3

192

Uncertainty-Driven Dense Two-View Structure from Motion

no code implementations • 1 Feb 2023 • Weirong Chen, Suryansh Kumar, Fisher Yu

This work introduces an effective and practical solution to the dense two-view structure from motion (SfM) problem.

Depth Estimation Optical Flow Estimation +2

BiBench: Benchmarking and Analyzing Network Binarization

1 code implementation • 26 Jan 2023 • Haotong Qin, Mingyuan Zhang, Yifu Ding, Aoyu Li, Zhongang Cai, Ziwei Liu, Fisher Yu, Xianglong Liu

Network binarization emerges as one of the most promising compression approaches offering extraordinary computation and memory savings by minimizing the bit-width.

Benchmarking Binarization

3DPPE: 3D Point Positional Encoding for Transformer-based Multi-Camera 3D Object Detection

1 code implementation • ICCV 2023 • Changyong Shu, Jiajun Deng, Fisher Yu, Yifan Liu

Although 3D measurements are not available at the inference time of monocular 3D object detection, 3DPPE uses predicted depth to approximate the real point positions.

Monocular 3D Object Detection object-detection

CC-3DT: Panoramic 3D Object Tracking via Cross-Camera Fusion

no code implementations • 2 Dec 2022 • Tobias Fischer, Yung-Hsu Yang, Suryansh Kumar, Min Sun, Fisher Yu

To track the 3D locations and trajectories of the other traffic participants at any given time, modern autonomous vehicles are equipped with multiple cameras that cover the vehicle's full surroundings.

3D Object Tracking Autonomous Vehicles +2

3DPPE: 3D Point Positional Encoding for Multi-Camera 3D Object Detection Transformers

1 code implementation • 27 Nov 2022 • Changyong Shu, Jiajun Deng, Fisher Yu, Yifan Liu

Although 3D measurements are not available at the inference time of monocular 3D object detection, 3DPPE uses predicted depth to approximate the real point positions.

Monocular 3D Object Detection Monocular Depth Estimation +1

Unifying Flow, Stereo and Depth Estimation

1 code implementation • 10 Nov 2022 • Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, Fisher Yu, DaCheng Tao, Andreas Geiger

We present a unified formulation and model for three motion and 3D perception tasks: optical flow, rectified stereo matching and unrectified stereo depth estimation from posed images.

Ranked #1 on Optical Flow Estimation on Sintel-clean

Optical Flow Estimation Stereo Depth Estimation +1

882

Normalization Perturbation: A Simple Domain Generalization Method for Real-World Domain Shifts

no code implementations • 8 Nov 2022 • Qi Fan, Mattia Segu, Yu-Wing Tai, Fisher Yu, Chi-Keung Tang, Bernt Schiele, Dengxin Dai

Thus, we propose to perturb the channel statistics of source domain features to synthesize various latent styles, so that the trained deep model can perceive diverse potential domains and generalize well even without observations of target domain data in training.

Autonomous Driving Domain Generalization

Learning Deep Sensorimotor Policies for Vision-based Autonomous Drone Racing

no code implementations • 26 Oct 2022 • Jiawei Fu, Yunlong Song, Yan Wu, Fisher Yu, Davide Scaramuzza

The resulting policy directly infers control commands with feature representations learned from raw images, forgoing the need for globally-consistent state estimation, trajectory planning, and handcrafted control design.

Contrastive Learning Trajectory Planning

Composite Learning for Robust and Effective Dense Predictions

no code implementations • 13 Oct 2022 • Menelaos Kanakis, Thomas E. Huang, David Bruggemann, Fisher Yu, Luc van Gool

In this paper, we find that jointly training a dense prediction (target) task with a self-supervised (auxiliary) task can consistently improve the performance of the target task, while eliminating the need for labeling auxiliary tasks.

Ranked #102 on Semantic Segmentation on NYU Depth v2

Boundary Detection Monocular Depth Estimation +3

QDTrack: Quasi-Dense Similarity Learning for Appearance-Only Multiple Object Tracking

2 code implementations • 12 Oct 2022 • Tobias Fischer, Thomas E. Huang, Jiangmiao Pang, Linlu Qiu, Haofeng Chen, Trevor Darrell, Fisher Yu

In this paper, we present Quasi-Dense Similarity Learning, which densely samples hundreds of object regions on a pair of images for contrastive learning.

Ranked #4 on Multiple Object Tracking on BDD100K test

Contrastive Learning Multiple Object Tracking +1

377

Fast Hierarchical Learning for Few-Shot Object Detection

no code implementations • 10 Oct 2022 • Yihang She, Goutam Bhat, Martin Danelljan, Fisher Yu

These approaches, however, suffer from the "catastrophic forgetting" issue due to finetuning of the base detector, leading to sub-optimal performance on the base classes.

Few-Shot Object Detection Object +2

Uncertainty Guided Policy for Active Robotic 3D Reconstruction using Neural Radiance Fields

no code implementations • 17 Sep 2022 • Soomin Lee, Le Chen, Jiahao Wang, Alexander Liniger, Suryansh Kumar, Fisher Yu

In this paper, we tackle the problem of active robotic 3D reconstruction of an object.

3D Reconstruction

Spatio-Temporal Action Detection Under Large Motion

no code implementations • 6 Sep 2022 • Gurkirt Singh, Vasileios Choutas, Suman Saha, Fisher Yu, Luc van Gool

Current methods for spatiotemporal action tube detection often extend a bounding box proposal at a given keyframe into a 3D temporal cuboid and pool features from nearby frames.

Action Detection

Video Mask Transfiner for High-Quality Video Instance Segmentation

1 code implementation • 28 Jul 2022 • Lei Ke, Henghui Ding, Martin Danelljan, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu

While Video Instance Segmentation (VIS) has seen rapid progress, current approaches struggle to predict high-quality masks with accurate boundary details.

Ranked #1 on Video Instance Segmentation on HQ-YTVIS

Instance Segmentation Semantic Segmentation +2

Tracking Every Thing in the Wild

1 code implementation • 26 Jul 2022 • Siyuan Li, Martin Danelljan, Henghui Ding, Thomas E. Huang, Fisher Yu

Our experiments show that TETA evaluates trackers more comprehensively, and TETer achieves significant improvements on the challenging large-scale datasets BDD100K and TAO compared to the state-of-the-art.

Ranked #4 on Multi-Object Tracking on TAO

Benchmarking Classification +2

SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation

1 code implementation • CVPR 2022 • Tao Sun, Mattia Segu, Janis Postels, Yuxuan Wang, Luc van Gool, Bernt Schiele, Federico Tombari, Fisher Yu

Adapting to a continuously evolving environment is a safety-critical challenge inevitably faced by all autonomous driving systems.

Autonomous Driving Domain Adaptation

Learning Online Multi-Sensor Depth Fusion

1 code implementation • 7 Apr 2022 • Erik Sandström, Martin R. Oswald, Suryansh Kumar, Silvan Weder, Fisher Yu, Cristian Sminchisescu, Luc van Gool

Multi-sensor depth fusion is able to substantially improve the robustness and accuracy of 3D reconstruction methods, but existing techniques are not robust enough to handle sensors which operate with diverse value ranges as well as noise and outlier statistics.

3D Reconstruction Mixed Reality +1

LiDAR Snowfall Simulation for Robust 3D Object Detection

1 code implementation • CVPR 2022 • Martin Hahner, Christos Sakaridis, Mario Bijelic, Felix Heide, Fisher Yu, Dengxin Dai, Luc van Gool

Due to the difficulty of collecting and annotating training data in this setting, we propose a physically based method to simulate the effect of snowfall on real clear-weather LiDAR point clouds.

Ranked #1 on 3D Object Detection on Heavy Snowfall

Autonomous Driving Object +3

162

Transforming Model Prediction for Tracking

1 code implementation • CVPR 2022 • Christoph Mayer, Martin Danelljan, Goutam Bhat, Matthieu Paul, Danda Pani Paudel, Fisher Yu, Luc van Gool

Optimization based tracking methods have been widely successful by integrating a target model prediction module, providing effective global reasoning by minimizing an objective function.

Ranked #19 on Visual Object Tracking on LaSOT (Precision metric)

Inductive Bias Visual Object Tracking

3,076

Generative Cooperative Learning for Unsupervised Video Anomaly Detection

no code implementations • CVPR 2022 • Muhammad Zaigham Zaheer, Arif Mahmood, Muhammad Haris Khan, Mattia Segu, Fisher Yu, Seung-Ik Lee

Video anomaly detection is well investigated in weakly-supervised and one-class classification (OCC) settings.

One-Class Classification Video Anomaly Detection

Probabilistic Warp Consistency for Weakly-Supervised Semantic Correspondences

1 code implementation • CVPR 2022 • Prune Truong, Martin Danelljan, Fisher Yu, Luc van Gool

We propose Probabilistic Warp Consistency, a weakly-supervised learning objective for semantic matching.

Weakly-supervised Learning

614

RePaint: Inpainting using Denoising Diffusion Probabilistic Models

3 code implementations • CVPR 2022 • Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, Luc van Gool

In this work, we propose RePaint: A Denoising Diffusion Probabilistic Model (DDPM) based inpainting approach that is applicable to even extreme masks.

Denoising Image Inpainting

10,809

SAGA: Stochastic Whole-Body Grasping with Contact

1 code implementation • 19 Dec 2021 • Yan Wu, Jiahao Wang, Yan Zhang, Siwei Zhang, Otmar Hilliges, Fisher Yu, Siyu Tang

Given an initial pose and the generated whole-body grasping pose as the start and end of the motion respectively, we design a novel contact-aware generative motion infilling module to generate a diverse set of grasp-oriented motions.

Object

Mask Transfiner for High-Quality Instance Segmentation

2 code implementations • CVPR 2022 • Lei Ke, Martin Danelljan, Xia Li, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu

Instead of operating on regular dense tensors, our Mask Transfiner decomposes and represents the image regions as a quadtree.

Ranked #1 on Instance Segmentation on BDD100K val

Instance Segmentation Segmentation +2

519

Normalizing Flow as a Flexible Fidelity Objective for Photo-Realistic Super-resolution

no code implementations • 5 Nov 2021 • Andreas Lugmayr, Martin Danelljan, Fisher Yu, Luc van Gool, Radu Timofte

Super-resolution is an ill-posed problem, where a ground-truth high-resolution image represents only one possibility in the space of plausible solutions.

Super-Resolution

Dense Prediction with Attentive Feature Aggregation

no code implementations • 1 Nov 2021 • Yung-Hsu Yang, Thomas E. Huang, Min Sun, Samuel Rota Bulò, Peter Kontschieder, Fisher Yu

Our experiments show consistent and significant improvements on challenging semantic segmentation benchmarks, including Cityscapes, BDD100K, and Mapillary Vistas, at negligible computational and parameter overhead.

Boundary Detection Semantic Segmentation

TACS: Taxonomy Adaptive Cross-Domain Semantic Segmentation

1 code implementation • 10 Sep 2021 • Rui Gong, Martin Danelljan, Dengxin Dai, Danda Pani Paudel, Ajad Chhatkuli, Fisher Yu, Luc van Gool

In many real-world settings, the target domain task requires a different taxonomy than the one imposed by the source domain.

Contrastive Learning Domain Adaptation +1

End-to-End Urban Driving by Imitating a Reinforcement Learning Coach

2 code implementations • ICCV 2021 • Zhejun Zhang, Alexander Liniger, Dengxin Dai, Fisher Yu, Luc van Gool

Our end-to-end agent achieves a 78% success rate while generalizing to a new town and new weather on the NoCrash-dense benchmark and state-of-the-art performance on the challenging public routes of the CARLA LeaderBoard.

Autonomous Driving Imitation Learning +2

241

Deep Reparametrization of Multi-Frame Super-Resolution and Denoising

2 code implementations • ICCV 2021 • Goutam Bhat, Martin Danelljan, Fisher Yu, Luc van Gool, Radu Timofte

The deep reparametrization allows us to directly model the image formation process in the latent space, and to integrate learned image priors into the prediction.

Ranked #4 on Burst Image Super-Resolution on BurstSR

Burst Image Super-Resolution Denoising +2

169

On the Practicality of Deterministic Epistemic Uncertainty

2 code implementations • 1 Jul 2021 • Janis Postels, Mattia Segu, Tao Sun, Luca Sieber, Luc van Gool, Fisher Yu, Federico Tombari

We find that, while DUMs scale to realistic vision tasks and perform well on OOD detection, the practicality of current methods is undermined by poor calibration under distributional shifts.

Out of Distribution (OOD) Detection Semantic Segmentation +1

1,359

Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation

1 code implementation • NeurIPS 2021 • Lei Ke, Xia Li, Martin Danelljan, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu

We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information for online multiple object tracking and segmentation.

Ranked #1 on Video Instance Segmentation on BDD100K val

Multi-Object Tracking and Segmentation Multiple Object Track and Segmentation +3

358

Robust Object Detection via Instance-Level Temporal Cycle Confusion

1 code implementation • ICCV 2021 • Xin Wang, Thomas E. Huang, Benlin Liu, Fisher Yu, Xiaolong Wang, Joseph E. Gonzalez, Trevor Darrell

Building reliable object detectors that are robust to domain shifts, such as various changes in context, viewpoint, and object appearances, is critical for real-world applications.

Object object-detection +2

Warp Consistency for Unsupervised Learning of Dense Correspondences

1 code implementation • ICCV 2021 • Prune Truong, Martin Danelljan, Fisher Yu, Luc van Gool

From our observations and empirical results, we design a general unsupervised objective employing two of the derived constraints.

Dense Pixel Correspondence Estimation

614

Monocular Quasi-Dense 3D Object Tracking

1 code implementation • 12 Mar 2021 • Hou-Ning Hu, Yung-Hsu Yang, Tobias Fischer, Trevor Darrell, Fisher Yu, Min Sun

Experiments on our proposed simulation data and real-world benchmarks, including KITTI, nuScenes, and Waymo datasets, show that our tracking framework offers robust object association and tracking on urban-driving scenarios.

Ranked #7 on Multiple Object Tracking on KITTI Tracking test

3D Object Tracking Autonomous Driving +3

504

Exploring Cross-Image Pixel Contrast for Semantic Segmentation

5 code implementations • ICCV 2021 • Wenguan Wang, Tianfei Zhou, Fisher Yu, Jifeng Dai, Ender Konukoglu, Luc van Gool

Inspired by the recent advance in unsupervised contrastive representation learning, we propose a pixel-wise contrastive framework for semantic segmentation in the fully supervised setting.

Metric Learning Optical Character Recognition (OCR) +3

8,218

Instance-Aware Predictive Navigation in Multi-Agent Environments

1 code implementation • 14 Jan 2021 • Jinkun Cao, Xin Wang, Trevor Darrell, Fisher Yu

To decide the action at each step, we seek the action sequence that can lead to safe future states based on the prediction module outputs by repeatedly sampling likely action sequences.

Quasi-Dense Similarity Learning for Multiple Object Tracking

3 code implementations • CVPR 2021 • Jiangmiao Pang, Linlu Qiu, Xia Li, Haofeng Chen, Qi Li, Trevor Darrell, Fisher Yu

Compared to methods with similar detectors, it boosts almost 10 points of MOTA and significantly decreases the number of ID switches on BDD100K and Waymo datasets.

Ranked #1 on One-Shot Object Detection on PASCAL VOC 2012 val

Contrastive Learning Metric Learning +4

377

Frustratingly Simple Few-Shot Object Detection

5 code implementations • ICML 2020 • Xin Wang, Thomas E. Huang, Trevor Darrell, Joseph E. Gonzalez, Fisher Yu

Such a simple approach outperforms the meta-learning methods by roughly 2 to 20 points on current benchmarks and sometimes even doubles the accuracy of the prior methods.

Ranked #17 on Few-Shot Object Detection on MS-COCO (30-shot)

Few-Shot Object Detection Meta-Learning +2

1,057

Task-Aware Feature Generation for Zero-Shot Compositional Learning

1 code implementation • 11 Jun 2019 • Xin Wang, Fisher Yu, Trevor Darrell, Joseph E. Gonzalez

In this work, we propose a task-aware feature generation (TFG) framework for compositional learning, which generates features of novel visual concepts by transferring knowledge from previously seen concepts.

Novel Concepts Zero-Shot Learning

TAFE-Net: Task-Aware Feature Embeddings for Low Shot Learning

1 code implementation • CVPR 2019 • Xin Wang, Fisher Yu, Ruth Wang, Trevor Darrell, Joseph E. Gonzalez

We show that TAFE-Net is highly effective in generalizing to new tasks or concepts and evaluate the TAFE-Net on a range of benchmarks in zero-shot and few-shot learning.

Ranked #1 on Few-Shot Image Classification on aPY - 0-Shot

Attribute Few-Shot Learning +1

Hierarchical Discrete Distribution Decomposition for Match Density Estimation

2 code implementations • CVPR 2019 • Zhichao Yin, Trevor Darrell, Fisher Yu

Explicit representations of the global match distributions of pixel-wise correspondences between pairs of images are desirable for uncertainty estimation and downstream applications.

Ranked #13 on Optical Flow Estimation on KITTI 2015 (train)

Density Estimation Optical Flow Estimation +2

202

Few-shot Object Detection via Feature Reweighting

4 code implementations • ICCV 2019 • Bingyi Kang, Zhuang Liu, Xin Wang, Fisher Yu, Jiashi Feng, Trevor Darrell

The feature learner extracts meta features that are generalizable to detect novel object classes, using training data from base classes with sufficient samples.

Ranked #21 on Few-Shot Object Detection on MS-COCO (30-shot)

Few-Shot Learning Few-Shot Object Detection +3

518

Disentangling Propagation and Generation for Video Prediction

1 code implementation • ICCV 2019 • Hang Gao, Huazhe Xu, Qi-Zhi Cai, Ruth Wang, Fisher Yu, Trevor Darrell

A dynamic scene has two types of elements: those that move fluidly and can be predicted from previous frames, and those which are disoccluded (exposed) and cannot be extrapolated.

Predict Future Video Frames

Joint Monocular 3D Vehicle Detection and Tracking

1 code implementation • ICCV 2019 • Hou-Ning Hu, Qi-Zhi Cai, Dequan Wang, Ji Lin, Min Sun, Philipp Krähenbühl, Trevor Darrell, Fisher Yu

The framework can not only associate detections of vehicles in motion over time, but also estimate their complete 3D bounding box information from a sequence of 2D images captured on a moving platform.

Ranked #12 on Multiple Object Tracking on KITTI Tracking test

3D Object Detection 3D Pose Estimation +4

652

Deep Object-Centric Policies for Autonomous Driving

no code implementations • 13 Nov 2018 • Dequan Wang, Coline Devin, Qi-Zhi Cai, Fisher Yu, Trevor Darrell

While learning visuomotor skills in an end-to-end manner is appealing, deep neural networks are often uninterpretable and fail in surprising ways.

Autonomous Driving Object

Paper
Add Code

Characterizing Adversarial Examples Based on Spatial Consistency Information for Semantic Segmentation

no code implementations • ECCV 2018 • Chaowei Xiao, Ruizhi Deng, Bo Li, Fisher Yu, Mingyan Liu, Dawn Song

In this paper, we aim to characterize adversarial examples based on spatial context information in semantic segmentation.

General Classification Segmentation +1

Paper
Add Code

Deep Mixture of Experts via Shallow Embedding

no code implementations • 5 Jun 2018 • Xin Wang, Fisher Yu, Lisa Dunlap, Yi-An Ma, Ruth Wang, Azalia Mirhoseini, Trevor Darrell, Joseph E. Gonzalez

Larger networks generally have greater representational power at the cost of increased computational complexity.

Few-Shot Learning Zero-Shot Learning

Paper
Add Code

PairedCycleGAN: Asymmetric Style Transfer for Applying and Removing Makeup

no code implementations • CVPR 2018 • Huiwen Chang, Jingwan Lu, Fisher Yu, Adam Finkelstein

This paper introduces an automatic method for editing a portrait photo so that the subject appears to be wearing makeup in the style of another person in a reference photo.

Style Transfer

Paper
Add Code

BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning

4 code implementations • CVPR 2020 • Fisher Yu, Haofeng Chen, Xin Wang, Wenqi Xian, Yingying Chen, Fangchen Liu, Vashisht Madhavan, Trevor Darrell

Datasets drive vision progress, yet existing driving datasets are impoverished in terms of visual content and supported tasks to study multitask learning for autonomous driving.

Ranked #5 on Multiple Object Tracking on BDD100K test

Autonomous Driving Domain Adaptation +8

395

Paper
Code

Reinforcement Learning from Imperfect Demonstrations

no code implementations • ICLR 2018 • Yang Gao, Huazhe Xu, Ji Lin, Fisher Yu, Sergey Levine, Trevor Darrell

We propose a unified reinforcement learning algorithm, Normalized Actor-Critic (NAC), that effectively normalizes the Q-function, reducing the Q-values of actions unseen in the demonstration data.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

SkipNet: Learning Dynamic Routing in Convolutional Networks

2 code implementations • ECCV 2018 • Xin Wang, Fisher Yu, Zi-Yi Dou, Trevor Darrell, Joseph E. Gonzalez

While deeper convolutional networks are needed to achieve maximum accuracy in visual perception tasks, for many inputs shallower networks are sufficient.

Decision Making

233

Paper
Code

Deep Layer Aggregation

7 code implementations • CVPR 2018 • Fisher Yu, Dequan Wang, Evan Shelhamer, Trevor Darrell

We augment standard architectures with deeper aggregation to better fuse information across layers.

Image Classification

29,648

Paper
Code

Interactive 3D Modeling with a Generative Adversarial Network

no code implementations • 16 Jun 2017 • Jerry Liu, Fisher Yu, Thomas Funkhouser

This paper proposes the idea of using a generative adversarial network (GAN) to assist a novice user in designing real-world shapes with a simple interface.

Generative Adversarial Network

Paper
Add Code

TextureGAN: Controlling Deep Image Synthesis with Texture Patches

2 code implementations • CVPR 2018 • Wenqi Xian, Patsorn Sangkloy, Varun Agrawal, Amit Raj, Jingwan Lu, Chen Fang, Fisher Yu, James Hays

In this paper, we investigate deep image synthesis guided by sketch, color, and texture.

Ranked #2 on Image Reconstruction on Edge-to-Shoes

Image Generation Texture Synthesis

163

Paper
Code

IDK Cascades: Fast Deep Learning by Learning not to Overthink

no code implementations • 3 Jun 2017 • Xin Wang, Yujia Luo, Daniel Crankshaw, Alexey Tumanov, Fisher Yu, Joseph E. Gonzalez

Advances in deep learning have led to substantial increases in prediction accuracy but have been accompanied by increases in the cost of rendering predictions.

Dialogue Generation

Paper
Add Code

Dilated Residual Networks

3 code implementations • CVPR 2017 • Fisher Yu, Vladlen Koltun, Thomas Funkhouser

Convolutional networks for image classification progressively reduce resolution until the image is represented by tiny feature maps in which the spatial structure of the scene is no longer discernible.

Classification General Classification +4

2,917

Paper
Code

FCNs in the Wild: Pixel-level Adversarial and Constraint-based Adaptation

3 code implementations • 8 Dec 2016 • Judy Hoffman, Dequan Wang, Fisher Yu, Trevor Darrell

In this paper, we introduce the first domain adaptive semantic segmentation method, proposing an unsupervised adversarial approach to pixel prediction problems.

Ranked #2 on Image-to-Image Translation on SYNTHIA Fall-to-Winter

Semantic Segmentation Synthetic-to-Real Translation

Paper
Code

End-to-end Learning of Driving Models from Large-scale Video Datasets

2 code implementations • CVPR 2017 • Huazhe Xu, Yang Gao, Fisher Yu, Trevor Darrell

Robust perception-action models should be learned from training data with diverse visual appearances and realistic behaviors, yet current approaches to deep visuomotor policy learning have been generally limited to in-situ models learned from a single vehicle or a simulation environment.

Scene Segmentation

217

Paper
Code

Scribbler: Controlling Deep Image Synthesis with Sketch and Color

1 code implementation • CVPR 2017 • Patsorn Sangkloy, Jingwan Lu, Chen Fang, Fisher Yu, James Hays

In this paper, we propose a deep adversarial image synthesis architecture that is conditioned on sketched boundaries and sparse color strokes to generate realistic cars, bedrooms, or faces.

Colorization Image Generation

Paper
Code

Semantic Scene Completion from a Single Depth Image

3 code implementations • CVPR 2017 • Shuran Song, Fisher Yu, Andy Zeng, Angel X. Chang, Manolis Savva, Thomas Funkhouser

This paper focuses on semantic scene completion, a task for producing a complete 3D voxel representation of volumetric occupancy and semantic labels for a scene from a single-view depth map observation.

Ranked #2 on 3D Semantic Scene Completion on KITTI-360

3D Semantic Scene Completion

1,177

Paper
Code

ShapeNet: An Information-Rich 3D Model Repository

14 code implementations • 9 Dec 2015 • Angel X. Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qi-Xing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, Fisher Yu

We present ShapeNet: a richly-annotated, large-scale repository of shapes represented by 3D CAD models of objects.

Data Visualization

65,339

Paper
Code

Multi-Scale Context Aggregation by Dilated Convolutions

8 code implementations • 23 Nov 2015 • Fisher Yu, Vladlen Koltun

State-of-the-art models for semantic segmentation are based on adaptations of convolutional networks that had originally been designed for image classification.

Ranked #11 on Semantic Segmentation on CamVid

General Classification Real-Time Semantic Segmentation +1

773

Paper
Code

LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop

4 code implementations • 10 Jun 2015 • Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, Jianxiong Xiao

While there has been remarkable progress in the performance of visual recognition algorithms, the state-of-the-art models tend to be exceptionally data-hungry.

517

Paper
Code

Semantic Alignment of LiDAR Data at City Scale

no code implementations • CVPR 2015 • Fisher Yu, Jianxiong Xiao, Thomas Funkhouser

This paper describes an automatic algorithm for global alignment of LiDAR data collected with Google Street View cars in urban environments.

Pose Estimation

Paper
Add Code

3D ShapeNets: A Deep Representation for Volumetric Shapes

no code implementations • CVPR 2015 • Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, Jianxiong Xiao

Our model, 3D ShapeNets, learns the distribution of complex 3D shapes across different object categories and arbitrary poses from raw CAD data, and discovers hierarchical compositional part representations automatically.

Ranked #35 on 3D Point Cloud Classification on ModelNet40 (Mean Accuracy metric)

3D Point Cloud Classification 3D Shape Representation +2

Paper
Add Code

3D Reconstruction from Accidental Motion

no code implementations • CVPR 2014 • Fisher Yu, David Gallup

We have discovered that 3D reconstruction can be achieved from a single still photographic capture due to accidental motions of the photographer, even while attempting to hold the camera still.

3D Reconstruction Semantic Segmentation

Paper
Add Code
