Search Results for author: Shalini De Mello

Found 39 papers, 18 papers with code

RegionGPT: Towards Region Understanding Vision Language Model

no code implementations • 4 Mar 2024 • Qiushan Guo, Shalini De Mello, Hongxu Yin, Wonmin Byeon, Ka Chun Cheung, Yizhou Yu, Ping Luo, Sifei Liu

Vision language models (VLMs) have experienced rapid advancements through the integration of large language models (LLMs) with image-text pairs, yet they struggle with detailed regional visual understanding due to limited spatial awareness of the vision encoder, and the use of coarse-grained training data that lacks detailed, region-specific captions.

Language Modelling

What You See is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs

no code implementations • 4 Jan 2024 • Alex Trevithick, Matthew Chan, Towaki Takikawa, Umar Iqbal, Shalini De Mello, Manmohan Chandraker, Ravi Ramamoorthi, Koki Nagano

3D-aware Generative Adversarial Networks (GANs) have shown remarkable progress in learning to generate multi-view-consistent images and 3D geometries of scenes from collections of 2D images via neural volume rendering.

Neural Rendering Super-Resolution

GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

no code implementations • 18 Dec 2023 • Ye Yuan, Xueting Li, Yangyi Huang, Shalini De Mello, Koki Nagano, Jan Kautz, Umar Iqbal

Gaussian splatting has emerged as a powerful 3D representation that harnesses the advantages of both explicit (mesh) and implicit (NeRF) 3D representations.

A Unified Approach for Text- and Image-guided 4D Scene Generation

no code implementations • 28 Nov 2023 • Yufeng Zheng, Xueting Li, Koki Nagano, Sifei Liu, Karsten Kreis, Otmar Hilliges, Shalini De Mello

Large-scale diffusion generative models are greatly simplifying image, video and 3D asset creation from user-provided text prompts and images.

Scene Generation

3D Reconstruction with Generalizable Neural Fields using Scene Priors

no code implementations • 26 Sep 2023 • Yang Fu, Shalini De Mello, Xueting Li, Amey Kulkarni, Jan Kautz, Xiaolong Wang, Sifei Liu

NFP not only demonstrates SOTA scene reconstruction performance and efficiency, but it also supports single-image novel-view synthesis, which is underexplored in neural fields.

3D Reconstruction 3D Scene Reconstruction +1

Investigation of Architectures and Receptive Fields for Appearance-based Gaze Estimation

1 code implementation • 18 Aug 2023 • Yunhan Wang, Xiangwei Shi, Shalini De Mello, Hyung Jin Chang, Xucong Zhang

With the rapid development of deep learning technology in the past decade, appearance-based gaze estimation has attracted great attention from both computer vision and human-computer interaction research communities.

Contrastive Learning Disentanglement +2

Generalizable One-shot Neural Head Avatar

no code implementations • 14 Jun 2023 • Xueting Li, Shalini De Mello, Sifei Liu, Koki Nagano, Umar Iqbal, Jan Kautz

We present a method that reconstructs and animates a 3D head avatar from a single-view portrait image.

Super-Resolution

Zero-shot Pose Transfer for Unrigged Stylized 3D Characters

1 code implementation • CVPR 2023 • Jiashun Wang, Xueting Li, Sifei Liu, Shalini De Mello, Orazio Gallo, Xiaolong Wang, Jan Kautz

We present a zero-shot approach that requires only the widely available deformed non-stylized avatars in training, and deforms stylized characters of significantly different shapes at inference.

Pose Transfer

Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos

no code implementations • 5 May 2023 • Ekta Prashnani, Koki Nagano, Shalini De Mello, David Luebke, Orazio Gallo

This allows us to link the synthetic video to the identity driving the expressions in the video, regardless of the facial appearance shown.

Generative Novel View Synthesis with 3D-Aware Diffusion Models

no code implementations • ICCV 2023 • Eric R. Chan, Koki Nagano, Matthew A. Chan, Alexander W. Bergman, Jeong Joon Park, Axel Levy, Miika Aittala, Shalini De Mello, Tero Karras, Gordon Wetzstein

We present a diffusion-based model for 3D-aware generative novel view synthesis from as few as a single input image.

Novel View Synthesis

Affordance Diffusion: Synthesizing Hand-Object Interactions

no code implementations • CVPR 2023 • Yufei Ye, Xueting Li, Abhinav Gupta, Shalini De Mello, Stan Birchfield, Jiaming Song, Shubham Tulsiani, Sifei Liu

In contrast, in this work we focus on synthesizing complex interactions (i.e., an articulated hand) with a given object.

Descriptive Image Generation +1

Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

1 code implementation • CVPR 2023 • Jiarui Xu, Sifei Liu, Arash Vahdat, Wonmin Byeon, Xiaolong Wang, Shalini De Mello

Our approach outperforms the previous state of the art by significant margins on both open-vocabulary panoptic and semantic segmentation tasks.

Ranked #2 on Open-World Instance Segmentation on UVO (using extra training data)

Open Vocabulary Panoptic Segmentation Open Vocabulary Semantic Segmentation +4

GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation

2 code implementations • 13 Dec 2022 • Chenhongyi Yang, Jiarui Xu, Shalini De Mello, Elliot J. Crowley, Xiaolong Wang

In each GP Block, features are first grouped together by a fixed number of learnable group tokens; we then perform Group Propagation where global information is exchanged between the grouped features; finally, global information in the updated grouped features is returned back to the image features through a transformer decoder.

Image Classification Instance Segmentation +5

GazeNeRF: 3D-Aware Gaze Redirection with Neural Radiance Fields

1 code implementation • CVPR 2023 • Alessandro Ruzzi, Xiangwei Shi, Xi Wang, Gengyan Li, Shalini De Mello, Hyung Jin Chang, Xucong Zhang, Otmar Hilliges

We propose GazeNeRF, a 3D-aware method for the task of gaze redirection.

gaze redirection

CoordGAN: Self-Supervised Dense Correspondences Emerge from GANs

1 code implementation • CVPR 2022 • Jiteng Mu, Shalini De Mello, Zhiding Yu, Nuno Vasconcelos, Xiaolong Wang, Jan Kautz, Sifei Liu

We represent the correspondence maps of different images as warped coordinate frames transformed from a canonical coordinate frame, i.e., the correspondence map, which describes the structure (e.g., the shape of a face), is controlled via a transformation.

Disentanglement

FreeSOLO: Learning to Segment Objects without Annotations

1 code implementation • CVPR 2022 • Xinlong Wang, Zhiding Yu, Shalini De Mello, Jan Kautz, Anima Anandkumar, Chunhua Shen, Jose M. Alvarez

FreeSOLO further demonstrates superiority as a strong pre-training method, outperforming state-of-the-art self-supervised pre-training methods by +9.8% AP when fine-tuning instance segmentation with only 5% COCO masks.

Instance Segmentation object-detection +4

GroupViT: Semantic Segmentation Emerges from Text Supervision

2 code implementations • CVPR 2022 • Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang

With only text supervision and without any pixel-level annotations, GroupViT learns to group together semantic regions and successfully transfers to the task of semantic segmentation in a zero-shot manner, i.e., without any further fine-tuning.

Ranked #3 on Unsupervised Semantic Segmentation with Language-image Pre-training on PascalVOC-20

Object Detection Scene Understanding +3

Efficient Geometry-aware 3D Generative Adversarial Networks

2 code implementations • CVPR 2022 • Eric R. Chan, Connor Z. Lin, Matthew A. Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas Guibas, Jonathan Tremblay, Sameh Khamis, Tero Karras, Gordon Wetzstein

Unsupervised generation of high-quality multi-view-consistent images and 3D shapes using only collections of single-view 2D photographs has been a long-standing challenge.

Computational Efficiency Neural Rendering

Learning Continuous Environment Fields via Implicit Functions

no code implementations • ICLR 2022 • Xueting Li, Shalini De Mello, Xiaolong Wang, Ming-Hsuan Yang, Jan Kautz, Sifei Liu

We propose a novel scene representation that encodes reaching distance -- the distance from any position in the scene to a goal along a feasible trajectory.

Position Trajectory Prediction

Self-Supervised Object Detection via Generative Image Synthesis

no code implementations • ICCV 2021 • Siva Karthik Mustikovela, Shalini De Mello, Aayush Prakash, Umar Iqbal, Sifei Liu, Thu Nguyen-Phuoc, Carsten Rother, Jan Kautz

We present SSOD, the first end-to-end analysis-by-synthesis framework with controllable GANs for the task of self-supervised object detection.

Image Generation Object +2

Learning Contrastive Representation for Semantic Correspondence

no code implementations • 22 Sep 2021 • Taihong Xiao, Sifei Liu, Shalini De Mello, Zhiding Yu, Jan Kautz, Ming-Hsuan Yang

Dense correspondence across semantically related images has been extensively studied, but still faces two challenges: 1) large variations in appearance, scale and pose exist even for objects from the same category, and 2) labeling pixel-level dense correspondences is labor intensive and infeasible to scale.

Contrastive Learning Semantic correspondence

Weakly-Supervised Physically Unconstrained Gaze Estimation

1 code implementation • CVPR 2021 • Rakshit Kothari, Shalini De Mello, Umar Iqbal, Wonmin Byeon, Seonwook Park, Jan Kautz

A major challenge for physically unconstrained gaze estimation is acquiring training data with 3D gaze annotations for in-the-wild and outdoor scenarios.

Ranked #3 on Gaze Estimation on Gaze360

Domain Generalization Gaze Estimation

Contrastive Syn-to-Real Generalization

2 code implementations • ICLR 2021 • Wuyang Chen, Zhiding Yu, Shalini De Mello, Sifei Liu, Jose M. Alvarez, Zhangyang Wang, Anima Anandkumar

Training on synthetic data can be beneficial for label or data-scarce scenarios.

Domain Generalization Inductive Bias

Learning to Track Instances without Video Annotations

no code implementations • CVPR 2021 • Yang Fu, Sifei Liu, Umar Iqbal, Shalini De Mello, Humphrey Shi, Jan Kautz

Tracking segmentation masks of multiple instances has been intensively studied, but still faces two fundamental challenges: 1) the requirement of large-scale, frame-wise annotation, and 2) the complexity of two-stage approaches.

Instance Segmentation Pose Estimation +1

Online Adaptation for Consistent Mesh Reconstruction in the Wild

no code implementations • NeurIPS 2020 • Xueting Li, Sifei Liu, Shalini De Mello, Kihwan Kim, Xiaolong Wang, Ming-Hsuan Yang, Jan Kautz

This paper presents an algorithm to reconstruct temporally consistent 3D meshes of deformable object instances from videos in the wild.

3D Reconstruction

Self-Learning Transformations for Improving Gaze and Head Redirection

2 code implementations • NeurIPS 2020 • Yufeng Zheng, Seonwook Park, Xucong Zhang, Shalini De Mello, Otmar Hilliges

Furthermore, we show that in the presence of limited amounts of real-world training data, our method allows for improvements in the downstream task of semi-supervised cross-dataset gaze estimation.

Disentanglement Gaze Estimation +1

Self-Supervised Viewpoint Learning From Image Collections

2 code implementations • CVPR 2020 • Siva Karthik Mustikovela, Varun Jampani, Shalini De Mello, Sifei Liu, Umar Iqbal, Carsten Rother, Jan Kautz

Training deep neural networks to estimate the viewpoint of objects requires large labeled training datasets.

Object Viewpoint Estimation

Self-supervised Single-view 3D Reconstruction via Semantic Consistency

1 code implementation • ECCV 2020 • Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Varun Jampani, Ming-Hsuan Yang, Jan Kautz

To the best of our knowledge, we are the first to try and solve the single-view reconstruction problem without a category-specific template mesh or semantic keypoints.

3D Reconstruction Object +1

Content-Consistent Generation of Realistic Eyes with Style

1 code implementation • 8 Nov 2019 • Marcel Bühler, Seonwook Park, Shalini De Mello, Xucong Zhang, Otmar Hilliges

Accurately labeled real-world training data can be scarce, and hence recent works adapt, modify or generate images to boost target datasets.

Semantic Segmentation

Joint-task Self-supervised Learning for Temporal Correspondence

2 code implementations • NeurIPS 2019 • Xueting Li, Sifei Liu, Shalini De Mello, Xiaolong Wang, Jan Kautz, Ming-Hsuan Yang

Our learning process integrates two highly related tasks: tracking large image regions and establishing fine-grained pixel-level associations between consecutive video frames.

Ranked #73 on Semi-Supervised Video Object Segmentation on DAVIS 2017 (val)

Object Tracking Self-Supervised Learning +2

Learning Propagation for Arbitrarily-structured Data

no code implementations • ICCV 2019 • Sifei Liu, Xueting Li, Varun Jampani, Shalini De Mello, Jan Kautz

We experiment with semantic segmentation networks, where we use our propagation module to jointly train on different data -- images, superpixels and point clouds.

Point Cloud Segmentation Segmentation +2

Few-Shot Viewpoint Estimation

no code implementations • 13 May 2019 • Hung-Yu Tseng, Shalini De Mello, Jonathan Tremblay, Sifei Liu, Stan Birchfield, Ming-Hsuan Yang, Jan Kautz

Through extensive experimentation on the ObjectNet3D and Pascal3D+ benchmark datasets, we demonstrate that our framework, which we call MetaView, significantly outperforms fine-tuning the state-of-the-art models with few examples, and that the specific architectural innovations of our method are crucial to achieving good performance.

Meta-Learning Viewpoint Estimation

Few-Shot Adaptive Gaze Estimation

1 code implementation • ICCV 2019 • Seonwook Park, Shalini De Mello, Pavlo Molchanov, Umar Iqbal, Otmar Hilliges, Jan Kautz

Inter-personal anatomical differences limit the accuracy of person-independent gaze estimation networks.

Ranked #1 on Gaze Estimation on MPII Gaze (using extra training data)

Gaze Estimation Meta-Learning

Switchable Temporal Propagation Network

1 code implementation • ECCV 2018 • Sifei Liu, Guangyu Zhong, Shalini De Mello, Jinwei Gu, Varun Jampani, Ming-Hsuan Yang, Jan Kautz

Our approach is based on a temporal propagation network (TPN), which models the transition-related affinity between a pair of frames in a purely data-driven manner.

Video Compression

Light-weight Head Pose Invariant Gaze Tracking

no code implementations • 23 Apr 2018 • Rajeev Ranjan, Shalini De Mello, Jan Kautz

Unconstrained remote gaze tracking using off-the-shelf cameras is a challenging problem.

Gaze Estimation Transfer Learning +1

Learning Affinity via Spatial Propagation Networks

no code implementations • NeurIPS 2017 • Sifei Liu, Shalini De Mello, Jinwei Gu, Guangyu Zhong, Ming-Hsuan Yang, Jan Kautz

Specifically, we develop a three-way connection for the linear propagation model, which (a) formulates a sparse transformation matrix, where all elements can be the output from a deep CNN, but (b) results in a dense affinity matrix that effectively models any task-specific pairwise similarity matrix.

Colorization Face Parsing +4

Learning Affinity via Spatial Propagation Network

no code implementations • 3 Oct 2017 • Sifei Liu, Shalini De Mello, Jinwei Gu, Guangyu Zhong, Ming-Hsuan Yang, Jan Kautz

Colorization Face Parsing +4

Learning to Segment Instances in Videos with Spatial Propagation Network

no code implementations • 14 Sep 2017 • Jingchun Cheng, Sifei Liu, Yi-Hsuan Tsai, Wei-Chih Hung, Shalini De Mello, Jinwei Gu, Jan Kautz, Shengjin Wang, Ming-Hsuan Yang

In addition, we apply a filter on the refined score map that aims to recognize the best connected region using spatial and temporal consistencies in the video.

Object Segmentation +1

Dynamic Facial Analysis: From Bayesian Filtering to Recurrent Neural Network

no code implementations • CVPR 2017 • Jinwei Gu, Xiaodong Yang, Shalini De Mello, Jan Kautz

We are inspired by the fact that the computation performed in an RNN bears resemblance to Bayesian filters, which have been used for tracking in many previous methods for facial analysis from videos.

Ranked #1 on Head Pose Estimation on BIWI (MAE (trained with BIWI data) metric, using extra training data)

Face Alignment Feature Engineering +3
