Search Results for author: Derek Hoiem

Found 47 papers, 19 papers with code

MonoPatchNeRF: Improving Neural Radiance Fields with Patch-based Monocular Guidance

no code implementations • 12 Apr 2024 • Yuqun Wu, Jae Yong Lee, Chuhang Zou, Shenlong Wang, Derek Hoiem

Our experiments show 4x the performance of RegNeRF and 8x that of FreeNeRF on average F1@2cm for the ETH3D MVS benchmark, suggesting a fruitful research direction to improve the geometric accuracy of NeRF-based models and shedding light on a potential future approach to enable NeRF-based optimization to eventually outperform traditional MVS.

Novel View Synthesis • SSIM

Region-Based Representations Revisited

no code implementations • 4 Feb 2024 • Michal Shlapentokh-Rothman, Ansel Blume, Yao Xiao, Yuqun Wu, Sethuraman T V, Heyi Tao, Jae Yong Lee, Wilfredo Torres, Yu-Xiong Wang, Derek Hoiem

We investigate whether region-based representations are effective for recognition.

Image Retrieval • Retrieval • +1

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

no code implementations • 28 Dec 2023 • Jiasen Lu, Christopher Clark, Sangho Lee, Zichen Zhang, Savya Khosla, Ryan Marten, Derek Hoiem, Aniruddha Kembhavi

We present Unified-IO 2, the first autoregressive multimodal model that is capable of understanding and generating image, text, audio, and action.

Image Generation • Natural Language Understanding

ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided Code-Vision Representation

1 code implementation • 22 Nov 2023 • Yangyi Chen, Xingyao Wang, Manling Li, Derek Hoiem, Heng Ji

We adopt a weakly-supervised approach to directly generate visual event structures from captions for ViStruct training, capitalizing on abundant image-caption pairs from the web.

WebWISE: Web Interface Control and Sequential Exploration with Large Language Models

no code implementations • 24 Oct 2023 • Heyi Tao, Sethuraman T V, Michal Shlapentokh-Rothman, Derek Hoiem

The paper investigates using a Large Language Model (LLM) to automatically perform web software tasks using click, scroll, and text input operations.

Imitation Learning • In-Context Learning • +3

Consistent Multimodal Generation via A Unified GAN Framework

no code implementations • 4 Jul 2023 • Zhen Zhu, Yijun Li, Weijie Lyu, Krishna Kumar Singh, Zhixin Shu, Soeren Pirk, Derek Hoiem

We investigate how to generate multimodal image outputs, such as RGB, depth, and surface normals, with a single generative model.

multimodal generation

Continual Learning in Open-vocabulary Classification with Complementary Memory Systems

no code implementations • 4 Jul 2023 • Zhen Zhu, Weijie Lyu, Yao Xiao, Derek Hoiem

We introduce a method for flexible and efficient continual learning in open-vocabulary image classification, drawing inspiration from the complementary learning systems observed in human cognition.

Continual Learning • Image Classification

Make It So: Steering StyleGAN for Any Image Inversion and Editing

no code implementations • 27 Apr 2023 • Anand Bhattad, Viraj Shah, Derek Hoiem, D. A. Forsyth

StyleGAN's disentangled style representation enables powerful image editing by manipulating the latent variables, but accurately mapping real-world images to their latent variables (GAN inversion) remains a challenge.

Sparse SPN: Depth Completion from Sparse Keypoints

no code implementations • 2 Dec 2022 • Yuqun Wu, Jae Yong Lee, Derek Hoiem

Our long-term goal is to use image-based depth completion to quickly create 3D models from sparse point clouds, e.g., from SfM or SLAM.

Depth Completion

QFF: Quantized Fourier Features for Neural Field Representations

no code implementations • 2 Dec 2022 • Jae Yong Lee, Yuqun Wu, Chuhang Zou, Shenlong Wang, Derek Hoiem

Instead, we propose to encode features in bins of Fourier features that are commonly used for positional encoding.

Deep PatchMatch MVS with Learned Patch Coplanarity, Geometric Consistency and Adaptive Pixel Sampling

no code implementations • 14 Oct 2022 • Jae Yong Lee, Chuhang Zou, Derek Hoiem

Recent work in multi-view stereo (MVS) combines learnable photometric scores and regularization with PatchMatch-based optimization to achieve robust pixelwise estimates of depth, normals, and visibility.

Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners

1 code implementation • 22 May 2022 • Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, ZiYi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji

The goal of this work is to build flexible video-language models that can generalize to various video-to-text tasks from few examples, such as domain-specific captioning, question answering, and future event prediction.

Attribute • Automatic Speech Recognition • +6

110 stars

GRIT: General Robust Image Task Benchmark

1 code implementation • 28 Apr 2022 • Tanmay Gupta, Ryan Marten, Aniruddha Kembhavi, Derek Hoiem

Computer vision models excel at making predictions when the test distribution closely resembles the training distribution.

Instance Segmentation • Keypoint Detection • +7

Webly Supervised Concept Expansion for General Purpose Vision Models

no code implementations • 4 Feb 2022 • Amita Kamath, Christopher Clark, Tanmay Gupta, Eric Kolve, Derek Hoiem, Aniruddha Kembhavi

This work presents an effective and inexpensive alternative: learn skills from supervised datasets, learn concepts from web image search, and leverage a key characteristic of GPVs: the ability to transfer visual knowledge across skills.

Ranked #2 on Visual Question Answering (VQA) on GRIT

Human-Object Interaction Detection • Image Retrieval • +4

Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture

no code implementations • CVPR 2022 • Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, Derek Hoiem

To reduce the time and expertise required to develop new applications, we would like to create general purpose vision systems that can learn and perform a range of tasks without any modification to the architecture or learning process.

Question Answering • Visual Question Answering

PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility

1 code implementation • ICCV 2021 • Jae Yong Lee, Joseph DeGol, Chuhang Zou, Derek Hoiem

To overcome the challenge of the non-differentiable PatchMatch optimization that involves iterative sampling and hard decisions, we use reinforcement learning to minimize expected photometric cost and maximize likelihood of ground truth depth and normals.

Towards General Purpose Vision Systems

2 code implementations • 1 Apr 2021 • Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, Derek Hoiem

Question Answering • Visual Question Answering

294 stars

Learning Curves for Analysis of Deep Networks

1 code implementation • 21 Oct 2020 • Derek Hoiem, Tanmay Gupta, Zhizhong Li, Michal M. Shlapentokh-Rothman

Learning curves model a classifier's test error as a function of the number of training samples.

Data Augmentation • Image Classification

Contrastive Learning for Weakly Supervised Phrase Grounding

1 code implementation • ECCV 2020 • Tanmay Gupta, Arash Vahdat, Gal Chechik, Xiaodong Yang, Jan Kautz, Derek Hoiem

Given pairs of images and captions, we maximize compatibility of the attention-weighted regions and the words in the corresponding caption, compared to non-corresponding pairs of images and captions.

Contrastive Learning • Language Modelling • +1

Boundary Cues for 3D Object Shape Recovery

no code implementations • CVPR 2013 • Kevin Karsch, Zicheng Liao, Jason Rock, Jonathan T. Barron, Derek Hoiem

Early work in computer vision considered a host of geometric cues for both shape reconstruction and recognition.

Object

Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion

2 code implementations • CVPR 2020 • Hongxu Yin, Pavlo Molchanov, Zhizhong Li, Jose M. Alvarez, Arun Mallya, Derek Hoiem, Niraj K. Jha, Jan Kautz

We introduce DeepInversion, a new method for synthesizing images from the image distribution used to train a deep neural network.

Continual Learning • Network Pruning • +1

474 stars

Manhattan Room Layout Reconstruction from a Single 360 image: A Comparative Study of State-of-the-art Methods

3 code implementations • 9 Oct 2019 • Chuhang Zou, Jheng-Wei Su, Chi-Han Peng, Alex Colburn, Qi Shan, Peter Wonka, Hung-Kuo Chu, Derek Hoiem

Recent approaches for predicting layouts from 360 panoramas produce excellent results.

Semantic Segmentation

309 stars

ViCo: Word Embeddings from Visual Co-occurrences

1 code implementation • ICCV 2019 • Tanmay Gupta, Alexander Schwing, Derek Hoiem

Through unsupervised clustering, supervised partitioning, and a zero-shot-like generalization analysis, we show that our word embeddings complement text-only embeddings like GloVe by better representing similarities and differences between visual concepts that are difficult to obtain from text corpora alone.

Attribute • Clustering • +1

Task-Assisted Domain Adaptation with Anchor Tasks

no code implementations • 16 Aug 2019 • Zhizhong Li, Linjie Luo, Sergey Tulyakov, Qieyun Dai, Derek Hoiem

Our key idea to improve domain adaptation is to introduce a separate anchor task (such as facial landmarks) whose annotations can be obtained at no cost or are already available on both synthetic and real datasets.

Depth Estimation • Domain Adaptation • +2

Silhouette Guided Point Cloud Reconstruction beyond Occlusion

1 code implementation • 29 Jul 2019 • Chuhang Zou, Derek Hoiem

One major challenge in 3D reconstruction is to infer the complete shape geometry from partial foreground occlusions.

Point cloud reconstruction

Reducing Overconfident Errors outside the Known Distribution

no code implementations • ICLR 2019 • Zhizhong Li, Derek Hoiem

We compare a number of methods from related fields such as calibration and epistemic uncertainty modeling, as well as two proposed methods that reduce overconfident errors on samples from an unknown novel distribution without drastically increasing evaluation time: (1) G-distillation, which trains an ensemble of classifiers and then distills it into a single model using both labeled and unlabeled examples, and (2) NCR, which reduces prediction confidence based on a novelty detection score.

Domain Adaptation • Novelty Detection

No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques

3 code implementations • ICCV 2019 • Tanmay Gupta, Alexander Schwing, Derek Hoiem

We show that for human-object interaction detection a relatively simple factorized model with appearance and layout encodings constructed from pre-trained object detectors outperforms more sophisticated approaches.

Human-Object Interaction Detection • Object

Improved Structure from Motion Using Fiducial Marker Matching

no code implementations • ECCV 2018 • Joseph DeGol, Timothy Bretl, Derek Hoiem

In this paper, we present an incremental structure from motion (SfM) algorithm that significantly outperforms existing algorithms when fiducial markers are present in the scene, and that matches the performance of existing algorithms when no markers are present.

Pixels, voxels, and views: A study of shape representations for single view 3D object shape prediction

no code implementations • CVPR 2018 • Daeyun Shin, Charless C. Fowlkes, Derek Hoiem

The goal of this paper is to compare surface-based and volumetric 3D object shape representations, as well as viewer-centered and object-centered reference frames for single-view 3D shape prediction.

Object

Imagine This! Scripts to Compositions to Videos

5 code implementations • ECCV 2018 • Tanmay Gupta, Dustin Schwenk, Ali Farhadi, Derek Hoiem, Aniruddha Kembhavi

Imagining a scene described in natural language with realistic layout and appearance of entities is the ultimate test of spatial, visual, and semantic world knowledge.

Retrieval • World Knowledge

Improving Confidence Estimates for Unfamiliar Examples

1 code implementation • CVPR 2020 • Zhizhong Li, Derek Hoiem

In this paper, we compare and evaluate several methods to improve confidence estimates for unfamiliar and familiar samples.

Attribute • Domain Adaptation

LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image

2 code implementations • CVPR 2018 • Chuhang Zou, Alex Colburn, Qi Shan, Derek Hoiem

We propose an algorithm to predict room layout from a single image that generalizes across panoramas and perspective images, and across cuboid layouts and more general layouts (e.g., L-shaped rooms).

Ranked #2 on 3D Room Layouts From A Single RGB Panorama on Realtor360

3D Room Layouts From A Single RGB Panorama • Translation

213 stars

Complete 3D Scene Parsing from an RGBD Image

1 code implementation • 25 Oct 2017 • Chuhang Zou, Ruiqi Guo, Zhizhong Li, Derek Hoiem

In this paper, we aim to interpret indoor scenes from one RGBD image.

Retrieval • Scene Parsing • +1

ChromaTag: A Colored Marker and Fast Detection Algorithm

no code implementations • ICCV 2017 • Joseph DeGol, Timothy Bretl, Derek Hoiem

Current fiducial marker detection algorithms rely on marker IDs for false positive rejection.

Robot Navigation • TAG

3D-PRNN: Generating Shape Primitives with Recurrent Neural Networks

2 code implementations • ICCV 2017 • Chuhang Zou, Ersin Yumer, Jimei Yang, Duygu Ceylan, Derek Hoiem

The success of various applications including robotics, digital content creation, and visualization demands a structured and abstract representation of the 3D world from limited sensor data.

Retrieval

118 stars

Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks

no code implementations • ICCV 2017 • Tanmay Gupta, Kevin Shih, Saurabh Singh, Derek Hoiem

In this paper, we investigate a vision-language embedding as a core representation and show that it leads to better cross-task transfer than standard multi-task learning.

Multi-Task Learning • Question Answering • +1

Geometry-Informed Material Recognition

no code implementations • CVPR 2016 • Joseph DeGol, Mani Golparvar-Fard, Derek Hoiem

Our goal is to recognize material categories using images and geometry information.

General Classification • Management • +2

Learning without Forgetting

10 code implementations • 29 Jun 2016 • Zhizhong Li, Derek Hoiem

We propose our Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities.

Ranked #4 on Domain 11-5 on Cityscapes

Class Incremental Learning • Disjoint 10-1 • +9

1,663 stars

3DFS: Deformable Dense Depth Fusion and Segmentation for Object Reconstruction from a Handheld Camera

no code implementations • 15 Jun 2016 • Tanmay Gupta, Daeyun Shin, Naren Sivagnanadasan, Derek Hoiem

The resulting depth maps are then fused using a proposed implicit surface function that is robust to estimation error, producing a smooth surface reconstruction of the entire scene.

3D Reconstruction • Depth Estimation • +4

Learning to Localize Little Landmarks

no code implementations • CVPR 2016 • Saurabh Singh, Derek Hoiem, David Forsyth

We describe a method to find such landmarks by finding a sequence of latent landmarks, each with a prediction model.

Swapout: Learning an ensemble of deep architectures

no code implementations • NeurIPS 2016 • Saurabh Singh, Derek Hoiem, David Forsyth

When viewed as a regularization method, swapout inhibits co-adaptation of units not only within a layer, similar to dropout, but also across network layers.

Where To Look: Focus Regions for Visual Question Answering

no code implementations • CVPR 2016 • Kevin J. Shih, Saurabh Singh, Derek Hoiem

We present a method that learns to answer visual questions by selecting image regions relevant to the text-based query.

Question Answering • Visual Question Answering

Part Localization using Multi-Proposal Consensus for Fine-Grained Categorization

no code implementations • 22 Jul 2015 • Kevin J. Shih, Arun Mallya, Saurabh Singh, Derek Hoiem

We present a simple deep learning framework to simultaneously predict keypoint locations and their respective visibilities and use those to achieve state-of-the-art performance for fine-grained classification.

General Classification