Search Results for author: Sifei Liu

Found 55 papers, 18 papers with code

HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data

no code implementations • 18 Mar 2024 • Mengqi Zhang, Yang Fu, Zheng Ding, Sifei Liu, Zhuowen Tu, Xiaolong Wang

In this paper, we propose HOIDiffusion for generating realistic and diverse 3D hand-object interaction data.

6D Pose Estimation using RGB Image Generation +1

Paper
Add Code

RegionGPT: Towards Region Understanding Vision Language Model

no code implementations • 4 Mar 2024 • Qiushan Guo, Shalini De Mello, Hongxu Yin, Wonmin Byeon, Ka Chun Cheung, Yizhou Yu, Ping Luo, Sifei Liu

Vision language models (VLMs) have experienced rapid advancements through the integration of large language models (LLMs) with image-text pairs, yet they struggle with detailed regional visual understanding due to limited spatial awareness of the vision encoder, and the use of coarse-grained training data that lacks detailed, region-specific captions.

Language Modelling

Paper
Add Code

RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos

no code implementations • 23 Jan 2024 • Hongchi Xia, Yang Fu, Sifei Liu, Xiaolong Wang

WildRGB-D comprises large-scale category-level RGB-D object videos, which are taken using an iPhone to go around the objects in 360 degrees.

6D Pose Estimation Novel View Synthesis +2

Paper
Add Code

AGG: Amortized Generative 3D Gaussians for Single Image to 3D

no code implementations • 8 Jan 2024 • Dejia Xu, Ye Yuan, Morteza Mardani, Sifei Liu, Jiaming Song, Zhangyang Wang, Arash Vahdat

To overcome these challenges, we introduce an Amortized Generative 3D Gaussian framework (AGG) that instantly produces 3D Gaussians from a single image, eliminating the need for per-instance optimization.

3D Generation 3D Reconstruction +2

Paper
Add Code

COLMAP-Free 3D Gaussian Splatting

no code implementations • 12 Dec 2023 • Yang Fu, Sifei Liu, Amey Kulkarni, Jan Kautz, Alexei A. Efros, Xiaolong Wang

While neural rendering has led to impressive advances in scene reconstruction and novel view synthesis, it relies heavily on accurately pre-computed camera poses.

Neural Rendering Novel View Synthesis +1

Paper
Add Code

A Unified Approach for Text- and Image-guided 4D Scene Generation

no code implementations • 28 Nov 2023 • Yufeng Zheng, Xueting Li, Koki Nagano, Sifei Liu, Karsten Kreis, Otmar Hilliges, Shalini De Mello

Large-scale diffusion generative models are greatly simplifying image, video and 3D asset creation from user-provided text prompts and images.

Scene Generation

Paper
Add Code

3D Reconstruction with Generalizable Neural Fields using Scene Priors

no code implementations • 26 Sep 2023 • Yang Fu, Shalini De Mello, Xueting Li, Amey Kulkarni, Jan Kautz, Xiaolong Wang, Sifei Liu

NFP not only demonstrates SOTA scene reconstruction performance and efficiency, but it also supports single-image novel-view synthesis, which is underexplored in neural fields.

3D Reconstruction 3D Scene Reconstruction +1

Paper
Add Code

Generalizable One-shot Neural Head Avatar

no code implementations • 14 Jun 2023 • Xueting Li, Shalini De Mello, Sifei Liu, Koki Nagano, Umar Iqbal, Jan Kautz

We present a method that reconstructs and animates a 3D head avatar from a single-view portrait image.

Super-Resolution

Paper
Add Code

Zero-shot Pose Transfer for Unrigged Stylized 3D Characters

1 code implementation • CVPR 2023 • Jiashun Wang, Xueting Li, Sifei Liu, Shalini De Mello, Orazio Gallo, Xiaolong Wang, Jan Kautz

We present a zero-shot approach that requires only the widely available deformed non-stylized avatars in training, and deforms stylized characters of significantly different shapes at inference.

Pose Transfer

Paper
Code

TUVF: Learning Generalizable Texture UV Radiance Fields

no code implementations • 4 May 2023 • An-Chieh Cheng, Xueting Li, Sifei Liu, Xiaolong Wang

This allows the texture to be disentangled from the underlying shape and transferable to other shapes that share the same UV space, i. e., from the same category.

3D Shape Modeling Texture Synthesis

Paper
Add Code

Affordance Diffusion: Synthesizing Hand-Object Interactions

no code implementations • CVPR 2023 • Yufei Ye, Xueting Li, Abhinav Gupta, Shalini De Mello, Stan Birchfield, Jiaming Song, Shubham Tulsiani, Sifei Liu

In contrast, in this work we focus on synthesizing complex interactions (ie, an articulated hand) with a given object.

Descriptive Image Generation +1

Paper
Add Code

Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

1 code implementation • CVPR 2023 • Jiarui Xu, Sifei Liu, Arash Vahdat, Wonmin Byeon, Xiaolong Wang, Shalini De Mello

Our approach outperforms the previous state of the art by significant margins on both open-vocabulary panoptic and semantic segmentation tasks.

Ranked #2 on Open-World Instance Segmentation on UVO (using extra training data)

Open Vocabulary Panoptic Segmentation Open Vocabulary Semantic Segmentation +4

798

Paper
Code

Self-Supervised Super-Plane for Neural 3D Reconstruction

1 code implementation • CVPR 2023 • Botao Ye, Sifei Liu, Xueting Li, Ming-Hsuan Yang

In this work, we introduce a self-supervised super-plane constraint by exploring the free geometry cues from the predicted surface, which can further regularize the reconstruction of plane regions without any other ground truth annotations.

3D Reconstruction

Paper
Code

Physics-based Indirect Illumination for Inverse Rendering

no code implementations • 9 Dec 2022 • Youming Deng, Xueting Li, Sifei Liu, Ming-Hsuan Yang

We present a physics-based inverse rendering method that learns the illumination, geometry, and materials of a scene from posed multi-view RGB images.

Efficient Neural Network Inverse Rendering +1

Paper
Add Code

Autoregressive 3D Shape Generation via Canonical Mapping

1 code implementation • 5 Apr 2022 • An-Chieh Cheng, Xueting Li, Sifei Liu, Min Sun, Ming-Hsuan Yang

With the capacity of modeling long-range dependencies in sequential data, transformers have shown remarkable performances in a variety of generative tasks such as image, audio, and text generation.

3D Shape Generation Point Cloud Generation +1

Paper
Code

CoordGAN: Self-Supervised Dense Correspondences Emerge from GANs

1 code implementation • CVPR 2022 • Jiteng Mu, Shalini De Mello, Zhiding Yu, Nuno Vasconcelos, Xiaolong Wang, Jan Kautz, Sifei Liu

We represent the correspondence maps of different images as warped coordinate frames transformed from a canonical coordinate frame, i. e., the correspondence map, which describes the structure (e. g., the shape of a face), is controlled via a transformation.

Disentanglement

Paper
Code

GroupViT: Semantic Segmentation Emerges from Text Supervision

2 code implementations • CVPR 2022 • Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang

With only text supervision and without any pixel-level annotations, GroupViT learns to group together semantic regions and successfully transfers to the task of semantic segmentation in a zero-shot manner, i. e., without any further fine-tuning.

Ranked #3 on Unsupervised Semantic Segmentation with Language-image Pre-training on PascalVOC-20

Object Detection Scene Understanding +3

124,593

Paper
Code

Coupled Segmentation and Edge Learning via Dynamic Graph Propagation

no code implementations • NeurIPS 2021 • Zhiding Yu, Rui Huang, Wonmin Byeon, Sifei Liu, Guilin Liu, Thomas Breuel, Anima Anandkumar, Jan Kautz

It is therefore interesting to study how these two tasks can be coupled to benefit each other.

Edge Detection Image Segmentation +2

Paper
Add Code

Learning Continuous Environment Fields via Implicit Functions

no code implementations • ICLR 2022 • Xueting Li, Shalini De Mello, Xiaolong Wang, Ming-Hsuan Yang, Jan Kautz, Sifei Liu

We propose a novel scene representation that encodes reaching distance -- the distance between any position in the scene to a goal along a feasible trajectory.

Position Trajectory Prediction

Paper
Add Code

Self-Supervised Object Detection via Generative Image Synthesis

no code implementations • ICCV 2021 • Siva Karthik Mustikovela, Shalini De Mello, Aayush Prakash, Umar Iqbal, Sifei Liu, Thu Nguyen-Phuoc, Carsten Rother, Jan Kautz

We present SSOD, the first end-to-end analysis-by synthesis framework with controllable GANs for the task of self-supervised object detection.

Image Generation Object +2

Paper
Add Code

Video Autoencoder: self-supervised disentanglement of static 3D structure and motion

no code implementations • ICCV 2021 • Zihang Lai, Sifei Liu, Alexei A. Efros, Xiaolong Wang

Relying on temporal continuity in videos, our work assumes that the 3D scene structure in nearby video frames remains static.

Disentanglement Novel View Synthesis +2

Paper
Add Code

Learning Contrastive Representation for Semantic Correspondence

no code implementations • 22 Sep 2021 • Taihong Xiao, Sifei Liu, Shalini De Mello, Zhiding Yu, Jan Kautz, Ming-Hsuan Yang

Dense correspondence across semantically related images has been extensively studied, but still faces two challenges: 1) large variations in appearance, scale and pose exist even for objects from the same category, and 2) labeling pixel-level dense correspondences is labor intensive and infeasible to scale.

Contrastive Learning Semantic correspondence

Paper
Add Code

Learning 3D Dense Correspondence via Canonical Point Autoencoder

no code implementations • NeurIPS 2021 • An-Chieh Cheng, Xueting Li, Min Sun, Ming-Hsuan Yang, Sifei Liu

We propose a canonical point autoencoder (CPAE) that predicts dense correspondences between 3D shapes of the same category.

Segmentation

Paper
Add Code

Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time

no code implementations • CVPR 2021 • Shaowei Liu, Hanwen Jiang, Jiarui Xu, Sifei Liu, Xiaolong Wang

Estimating 3D hand and object pose from a single image is an extremely challenging problem: hands and objects are often self-occluded during interactions, and the 3D annotations are scarce as even humans cannot directly label the ground-truths from a single image perfectly.

Ranked #7 on hand-object pose on HO-3D

hand-object pose Object

Paper
Add Code

Contrastive Syn-to-Real Generalization

2 code implementations • ICLR 2021 • Wuyang Chen, Zhiding Yu, Shalini De Mello, Sifei Liu, Jose M. Alvarez, Zhangyang Wang, Anima Anandkumar

Training on synthetic data can be beneficial for label or data-scarce scenarios.

Domain Generalization Inductive Bias

Paper
Code

Learning to Track Instances without Video Annotations

no code implementations • CVPR 2021 • Yang Fu, Sifei Liu, Umar Iqbal, Shalini De Mello, Humphrey Shi, Jan Kautz

Tracking segmentation masks of multiple instances has been intensively studied, but still faces two fundamental challenges: 1) the requirement of large-scale, frame-wise annotation, and 2) the complexity of two-stage approaches.

Instance Segmentation Pose Estimation +1

Paper
Add Code

Video Matting via Consistency-Regularized Graph Neural Networks

no code implementations • ICCV 2021 • Tiantian Wang, Sifei Liu, Yapeng Tian, Kai Li, Ming-Hsuan Yang

In this paper, we propose to enhance the temporal coherence by Consistency-Regularized Graph Neural Networks (CRGNN) with the aid of a synthesized video matting dataset.

Image Matting Optical Flow Estimation +1

Paper
Add Code

Learning Continuous Image Representation with Local Implicit Image Function

2 code implementations • CVPR 2021 • Yinbo Chen, Sifei Liu, Xiaolong Wang

How to represent an image?

Ranked #2 on Image Super-Resolution on DIV2K val - 4x upscaling (SSIM metric)

3D Reconstruction Image Super-Resolution

1,208

Paper
Code

Synthesizing Long-Term 3D Human Motion and Interaction in 3D Scenes

1 code implementation • CVPR 2021 • Jiashun Wang, Huazhe Xu, Jingwei Xu, Sifei Liu, Xiaolong Wang

Synthesizing 3D human motion plays an important role in many graphics applications as well as understanding human activity.

Motion Synthesis

Paper
Code

Online Adaptation for Consistent Mesh Reconstruction in the Wild

no code implementations • NeurIPS 2020 • Xueting Li, Sifei Liu, Shalini De Mello, Kihwan Kim, Xiaolong Wang, Ming-Hsuan Yang, Jan Kautz

This paper presents an algorithm to reconstruct temporally consistent 3D meshes of deformable object instances from videos in the wild.

3D Reconstruction

Paper
Add Code

Hierarchical Contrastive Motion Learning for Video Action Recognition

no code implementations • 20 Jul 2020 • Xitong Yang, Xiaodong Yang, Sifei Liu, Deqing Sun, Larry Davis, Jan Kautz

Thus, the motion features at higher levels are trained to gradually capture semantic dynamics and evolve more discriminative for action recognition.

Action Recognition Contrastive Learning +2

Paper
Add Code

Regularizing Meta-Learning via Gradient Dropout

1 code implementation • 13 Apr 2020 • Hung-Yu Tseng, Yi-Wen Chen, Yi-Hsuan Tsai, Sifei Liu, Yen-Yu Lin, Ming-Hsuan Yang

With the growing attention on learning-to-learn new tasks using only a few examples, meta-learning has been widely used in numerous problems such as few-shot classification, reinforcement learning, and domain generalization.

Domain Generalization Meta-Learning

Paper
Code

Self-Supervised Viewpoint Learning From Image Collections

2 code implementations • CVPR 2020 • Siva Karthik Mustikovela, Varun Jampani, Shalini De Mello, Sifei Liu, Umar Iqbal, Carsten Rother, Jan Kautz

Training deep neural networks to estimate the viewpoint of objects requires large labeled training datasets.

Object Viewpoint Estimation

214

Paper
Code

Self-supervised Single-view 3D Reconstruction via Semantic Consistency

1 code implementation • ECCV 2020 • Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Varun Jampani, Ming-Hsuan Yang, Jan Kautz

To the best of our knowledge, we are the first to try and solve the single-view reconstruction problem without a category-specific template mesh or semantic keypoints.

3D Reconstruction Object +1

226

Paper
Code

Weakly-Supervised Semantic Segmentation by Iterative Affinity Learning

no code implementations • 19 Feb 2020 • Xiang Wang, Sifei Liu, Huimin Ma, Ming-Hsuan Yang

In this paper, we propose an iterative algorithm to learn such pairwise relations, which consists of two branches, a unary segmentation network which learns the label probabilities for each pixel, and a pairwise affinity network which learns affinity matrix and refines the probability map generated from the unary network.

Segmentation Weakly supervised Semantic Segmentation +1

Paper
Add Code

Joint-task Self-supervised Learning for Temporal Correspondence

2 code implementations • NeurIPS 2019 • Xueting Li, Sifei Liu, Shalini De Mello, Xiaolong Wang, Jan Kautz, Ming-Hsuan Yang

Our learning process integrates two highly related tasks: tracking large image regions \emph{and} establishing fine-grained pixel-level associations between consecutive video frames.

Ranked #73 on Semi-Supervised Video Object Segmentation on DAVIS 2017 (val)

Object Tracking Self-Supervised Learning +2

176

Paper
Code

Learning Propagation for Arbitrarily-structured Data

no code implementations • ICCV 2019 • Sifei Liu, Xueting Li, Varun Jampani, Shalini De Mello, Jan Kautz

We experiment with semantic segmentation networks, where we use our propagation module to jointly train on different data -- images, superpixels and point clouds.

Point Cloud Segmentation Segmentation +2

Paper
Add Code

Few-Shot Viewpoint Estimation

no code implementations • 13 May 2019 • Hung-Yu Tseng, Shalini De Mello, Jonathan Tremblay, Sifei Liu, Stan Birchfield, Ming-Hsuan Yang, Jan Kautz

Through extensive experimentation on the ObjectNet3D and Pascal3D+ benchmark datasets, we demonstrate that our framework, which we call MetaView, significantly outperforms fine-tuning the state-of-the-art models with few examples, and that the specific architectural innovations of our method are crucial to achieving good performance.

Meta-Learning Viewpoint Estimation

Paper
Add Code

SCOPS: Self-Supervised Co-Part Segmentation

1 code implementation • CVPR 2019 • Wei-Chih Hung, Varun Jampani, Sifei Liu, Pavlo Molchanov, Ming-Hsuan Yang, Jan Kautz

Parts provide a good intermediate representation of objects that is robust with respect to the camera, pose and appearance variations.

Ranked #4 on Unsupervised Keypoint Estimation on CUB

Object Segmentation +4

218

Paper
Code

Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments

no code implementations • CVPR 2019 • Xueting Li, Sifei Liu, Kihwan Kim, Xiaolong Wang, Ming-Hsuan Yang, Jan Kautz

In order to predict valid affordances and learn possible 3D human poses in indoor scenes, we need to understand the semantic and geometric structure of a scene as well as its potential interactions with a human.

valid

Paper
Add Code

Context-Aware Synthesis and Placement of Object Instances

2 code implementations • NeurIPS 2018 • Donghoon Lee, Sifei Liu, Jinwei Gu, Ming-Yu Liu, Ming-Hsuan Yang, Jan Kautz

Learning to insert an object instance into an image in a semantically coherent manner is a challenging and interesting problem.

Object Scene Parsing

Paper
Code

Rendering Portraitures from Monocular Camera and Beyond

no code implementations • ECCV 2018 • Xiangyu Xu, Deqing Sun, Sifei Liu, Wenqi Ren, Yu-Jin Zhang, Ming-Hsuan Yang, Jian Sun

Specifically, we first exploit Convolutional Neural Networks to estimate the relative depth and portrait segmentation maps from a single input image.

Image Matting Portrait Segmentation +1

Paper
Add Code

Learning Linear Transformations for Fast Arbitrary Style Transfer

1 code implementation • 14 Aug 2018 • Xueting Li, Sifei Liu, Jan Kautz, Ming-Hsuan Yang

Recent arbitrary style transfer methods transfer second order statistics from reference image onto content image via a multiplication between content image features and a transformation matrix, which is computed from features with a pre-determined algorithm.

Domain Adaptation Style Transfer

374

Paper
Code

Learning Dual Convolutional Neural Networks for Low-Level Vision

no code implementations • CVPR 2018 • Jinshan Pan, Sifei Liu, Deqing Sun, Jiawei Zhang, Yang Liu, Jimmy Ren, Zechao Li, Jinhui Tang, Huchuan Lu, Yu-Wing Tai, Ming-Hsuan Yang

These problems usually involve the estimation of two components of the target signals: structures and details.

Rain Removal Super-Resolution

Paper
Add Code

Switchable Temporal Propagation Network

1 code implementation • ECCV 2018 • Sifei Liu, Guangyu Zhong, Shalini De Mello, Jinwei Gu, Varun Jampani, Ming-Hsuan Yang, Jan Kautz

Our approach is based on a temporal propagation network (TPN), which models the transition-related affinity between a pair of frames in a purely data-driven manner.

Video Compression

176

Paper
Code

Learning Video-Story Composition via Recurrent Neural Network

no code implementations • 31 Jan 2018 • Guangyu Zhong, Yi-Hsuan Tsai, Sifei Liu, Zhixun Su, Ming-Hsuan Yang

In this paper, we propose a learning-based method to compose a video-story from a group of video clips that describe an activity or experience.

Paper
Add Code

Learning Affinity via Spatial Propagation Network

no code implementations • 3 Oct 2017 • Sifei Liu, Shalini De Mello, Jinwei Gu, Guangyu Zhong, Ming-Hsuan Yang, Jan Kautz

Specifically, we develop a three-way connection for the linear propagation model, which (a) formulates a sparse transformation matrix, where all elements can be the output from a deep CNN, but (b) results in a dense affinity matrix that effectively models any task-specific pairwise similarity matrix.

Colorization Face Parsing +4

Paper
Add Code

Learning Affinity via Spatial Propagation Networks

no code implementations • NeurIPS 2017 • Sifei Liu, Shalini De Mello, Jinwei Gu, Guangyu Zhong, Ming-Hsuan Yang, Jan Kautz

Colorization Face Parsing +4

Paper
Add Code

Learning to Segment Instances in Videos with Spatial Propagation Network

no code implementations • 14 Sep 2017 • Jingchun Cheng, Sifei Liu, Yi-Hsuan Tsai, Wei-Chih Hung, Shalini De Mello, Jinwei Gu, Jan Kautz, Shengjin Wang, Ming-Hsuan Yang

In addition, we apply a filter on the refined score map that aims to recognize the best connected region using spatial and temporal consistencies in the video.

Object Segmentation +1

Paper
Add Code

Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos

no code implementations • ICCV 2017 • Kihyuk Sohn, Sifei Liu, Guangyu Zhong, Xiang Yu, Ming-Hsuan Yang, Manmohan Chandraker

Despite rapid advances in face recognition, there remains a clear gap between the performance of still image-based face recognition and video-based face recognition, due to the vast difference in visual quality between the domains and the difficulty of curating diverse large-scale video datasets.

Data Augmentation Face Recognition +1

Paper
Add Code

Face Parsing via Recurrent Propagation

no code implementations • 6 Aug 2017 • Sifei Liu, Jianping Shi, Ji Liang, Ming-Hsuan Yang

Face parsing is an important problem in computer vision that finds numerous applications including recognition and editing.

Face Parsing

Paper
Add Code

Generative Face Completion

2 code implementations • CVPR 2017 • Yijun Li, Sifei Liu, Jimei Yang, Ming-Hsuan Yang

In this paper, we propose an effective face completion algorithm using a deep generative model.

Facial Inpainting Semantic Parsing

316

Paper
Code

Deep Cascaded Bi-Network for Face Hallucination

no code implementations • 18 Jul 2016 • Shizhan Zhu, Sifei Liu, Chen Change Loy, Xiaoou Tang

We present a novel framework for hallucinating faces of unconstrained poses and with very low resolution (face size as small as 5pxIOD).

Ranked #5 on Image Super-Resolution on VggFace2 - 8x upscaling

Face Hallucination Hallucination

Paper
Add Code

Multi-Objective Convolutional Learning for Face Labeling

no code implementations • CVPR 2015 • Sifei Liu, Jimei Yang, Chang Huang, Ming-Hsuan Yang

This paper formulates face labeling as a conditional random field with unary and pairwise classifiers.

Paper
Add Code

Structured Face Hallucination

no code implementations • CVPR 2013 • Chih-Yuan Yang, Sifei Liu, Ming-Hsuan Yang

Each face image is represented in terms of facial components, contours and smooth regions.

Face Hallucination Hallucination +1

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.