Search Results for author: Jiahui Zhang

Found 32 papers, 16 papers with code

FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization

no code implementations 11 Mar 2024 Jiahui Zhang, Fangneng Zhan, Muyu Xu, Shijian Lu, Eric Xing

3D Gaussian splatting has achieved very impressive performance in real-time novel view synthesis.

Novel View Synthesis

WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation

1 code implementation 5 Dec 2023 Jiachen Lu, Ze Huang, Zeyu Yang, Jiahui Zhang, Li Zhang

Generating multi-camera street-view videos is critical for augmenting autonomous driving datasets, addressing the urgent demand for extensive and varied data.

Autonomous Driving Scene Generation +1

Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance

no code implementations 16 Oct 2023 Jesse Zhang, Jiahui Zhang, Karl Pertsch, Ziyi Liu, Xiang Ren, Minsuk Chang, Shao-Hua Sun, Joseph J. Lim

Instead, our approach BOSS (BOotStrapping your own Skills) learns to accomplish new tasks by performing "skill bootstrapping," where an agent with a set of primitive skills interacts with the environment to practice new skills without receiving reward feedback for tasks outside of the initial skill set.

Language Modelling Large Language Model

Pose-Free Neural Radiance Fields via Implicit Pose Regularization

no code implementations ICCV 2023 Jiahui Zhang, Fangneng Zhan, Yingchen Yu, Kunhao Liu, Rongliang Wu, Xiaoqin Zhang, Ling Shao, Shijian Lu

However, as the pose estimator is trained with only rendered images, the pose estimation is usually biased or inaccurate for real images due to the domain gap between real images and rendered images, leading to poor robustness for the pose estimation of real images and further local minima in joint optimization.

Novel View Synthesis Pose Estimation

SPRINT: Scalable Policy Pre-Training via Language Instruction Relabeling

no code implementations 20 Jun 2023 Jesse Zhang, Karl Pertsch, Jiahui Zhang, Joseph J. Lim

Pre-training robot policies with a rich set of skills can substantially accelerate the learning of downstream tasks.

Weakly Supervised 3D Open-vocabulary Segmentation

1 code implementation NeurIPS 2023 Kunhao Liu, Fangneng Zhan, Jiahui Zhang, Muyu Xu, Yingchen Yu, Abdulmotaleb El Saddik, Christian Theobalt, Eric Xing, Shijian Lu

Open-vocabulary segmentation of 3D scenes is a fundamental function of human perception and thus a crucial objective in computer vision research.

Segmentation

POCE: Pose-Controllable Expression Editing

no code implementations 18 Apr 2023 Rongliang Wu, Yingchen Yu, Fangneng Zhan, Jiahui Zhang, Shengcai Liao, Shijian Lu

POCE achieves more accessible and realistic pose-controllable expression editing by mapping face images into UV space, where facial expressions and head poses can be disentangled and edited separately.

Audio-Driven Talking Face Generation with Diverse yet Realistic Facial Animations

no code implementations 18 Apr 2023 Rongliang Wu, Yingchen Yu, Fangneng Zhan, Jiahui Zhang, Xiaoqin Zhang, Shijian Lu

To accommodate fair variation of plausible facial animations for the same audio, we design a transformer-based probabilistic mapping network that can model the variational facial animation distribution conditioned upon the input audio and autoregressively convert the audio signals into a facial animation sequence.

Talking Face Generation

StyleRF: Zero-shot 3D Style Transfer of Neural Radiance Fields

1 code implementation CVPR 2023 Kunhao Liu, Fangneng Zhan, YiWen Chen, Jiahui Zhang, Yingchen Yu, Abdulmotaleb El Saddik, Shijian Lu, Eric Xing

In addition, it transforms the grid features according to the reference style which directly leads to high-quality zero-shot style transfer.

Style Transfer

Regularized Vector Quantization for Tokenized Image Synthesis

no code implementations CVPR 2023 Jiahui Zhang, Fangneng Zhan, Christian Theobalt, Shijian Lu

The first is a prior distribution regularization which measures the discrepancy between a prior token distribution and the predicted token distribution to avoid codebook collapse and low codebook utilization.

Image Generation Quantization

Latent Multi-Relation Reasoning for GAN-Prior based Image Super-Resolution

no code implementations 4 Aug 2022 Jiahui Zhang, Fangneng Zhan, Yingchen Yu, Rongliang Wu, Xiaoqin Zhang, Shijian Lu

In addition, stochastic noises fed to the generator are employed for unconditional detail generation, which tends to produce unfaithful details that compromise the fidelity of the generated SR image.

Attribute Code Generation +3

RenderNet: Visual Relocalization Using Virtual Viewpoints in Large-Scale Indoor Environments

no code implementations 26 Jul 2022 Jiahui Zhang, Shitao Tang, Kejie Qiu, Rui Huang, Chuan Fang, Le Cui, Zilong Dong, Siyu Zhu, Ping Tan

Visual relocalization has been a widely discussed problem in 3D vision: given a pre-constructed 3D visual map, the 6 DoF (Degrees-of-Freedom) pose of a query image is estimated.

Image Retrieval Retrieval +1

Auto-regressive Image Synthesis with Integrated Quantization

no code implementations 21 Jul 2022 Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Kaiwen Cui, Changgong Zhang, Shijian Lu

Extensive experiments over multiple conditional image generation tasks show that our method achieves superior diverse image generation performance qualitatively and quantitatively as compared with the state-of-the-art.

Conditional Image Generation Inductive Bias +1

VMRF: View Matching Neural Radiance Fields

no code implementations 6 Jul 2022 Jiahui Zhang, Fangneng Zhan, Rongliang Wu, Yingchen Yu, Wenqing Zhang, Bai Song, Xiaoqin Zhang, Shijian Lu

With the feature transport plan as the guidance, a novel pose calibration technique is designed which rectifies the initially randomized camera poses by predicting relative pose transformations between the pair of rendered and real images.

Novel View Synthesis

Towards Counterfactual Image Manipulation via CLIP

1 code implementation 6 Jul 2022 Yingchen Yu, Fangneng Zhan, Rongliang Wu, Jiahui Zhang, Shijian Lu, Miaomiao Cui, Xuansong Xie, Xian-Sheng Hua, Chunyan Miao

In addition, we design a simple yet effective scheme that explicitly maps CLIP embeddings (of target text) to the latent space and fuses them with latent codes for effective latent code optimization and accurate editing.

counterfactual Image Manipulation

Marginal Contrastive Correspondence for Guided Image Generation

no code implementations CVPR 2022 Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Shijian Lu, Changgong Zhang

We design a Marginal Contrastive Learning Network (MCL-Net) that explores contrastive learning to learn domain-invariant features for realistic exemplar-based image translation.

Contrastive Learning Image Generation +2

Modulated Contrast for Versatile Image Synthesis

1 code implementation CVPR 2022 Fangneng Zhan, Jiahui Zhang, Yingchen Yu, Rongliang Wu, Shijian Lu

Perceiving the similarity between images has been a long-standing and fundamental problem underlying various visual generation tasks.

Contrastive Learning Image Generation

QuadTree Attention for Vision Transformers

1 code implementation ICLR 2022 Shitao Tang, Jiahui Zhang, Siyu Zhu, Ping Tan

Transformers have been successful in many vision tasks, thanks to their capability of capturing long-range dependency.

Object Detection +2

Multimodal Image Synthesis and Editing: The Generative AI Era

2 code implementations 27 Dec 2021 Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Shijian Lu, Lingjie Liu, Adam Kortylewski, Christian Theobalt, Eric Xing

With superb power in modeling the interaction among multimodal information, multimodal image synthesis and editing has become a hot research topic in recent years.

Image Generation

Awakening Latent Grounding from Pretrained Language Models for Semantic Parsing

1 code implementation Findings (ACL) 2021 Qian Liu, Dejian Yang, Jiahui Zhang, Jiaqi Guo, Bin Zhou, Jian-Guang Lou

In recent years, pretrained language models (PLMs) have succeeded on several downstream tasks, showing their power in modeling language.

Semantic Parsing Text-To-SQL

Learning to Match Features with Seeded Graph Matching Network

1 code implementation ICCV 2021 Hongkai Chen, Zixin Luo, Jiahui Zhang, Lei Zhou, Xuyang Bai, Zeyu Hu, Chiew-Lan Tai, Long Quan

2) Seeded Graph Neural Network, which utilizes seed matches to pass messages within/across images and predicts assignment costs.

Graph Matching

Bi-level Feature Alignment for Versatile Image Translation and Manipulation

2 code implementations 7 Jul 2021 Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Kaiwen Cui, Aoran Xiao, Shijian Lu, Chunyan Miao

This paper presents a versatile image translation and manipulation framework that achieves accurate semantic and style guidance in image generation by explicitly building a correspondence.

Image Generation Translation

Blind Image Super-Resolution via Contrastive Representation Learning

no code implementations 1 Jul 2021 Jiahui Zhang, Shijian Lu, Fangneng Zhan, Yingchen Yu

Extensive experiments on synthetic datasets and real images show that the proposed CRL-SR can handle multi-modal and spatially variant degradation effectively under blind settings and it also outperforms state-of-the-art SR methods qualitatively and quantitatively.

Contrastive Learning Image Super-Resolution +1

KFNet: Learning Temporal Camera Relocalization using Kalman Filtering

1 code implementation CVPR 2020 Lei Zhou, Zixin Luo, Tianwei Shen, Jiahui Zhang, Mingmin Zhen, Yao Yao, Tian Fang, Long Quan

Temporal camera relocalization estimates the pose with respect to each video frame in sequence, as opposed to one-shot relocalization which focuses on a still image.

Camera Relocalization

Self-Supervised Learning of Depth and Motion Under Photometric Inconsistency

1 code implementation 19 Sep 2019 Tianwei Shen, Lei Zhou, Zixin Luo, Yao Yao, Shiwei Li, Jiahui Zhang, Tian Fang, Long Quan

The self-supervised learning of depth and pose from monocular sequences provides an attractive solution by using the photometric consistency of nearby frames as it depends much less on the ground-truth data.

Pose Estimation Self-Supervised Learning

Learning Two-View Correspondences and Geometry Using Order-Aware Network

1 code implementation ICCV 2019 Jiahui Zhang, Dawei Sun, Zixin Luo, Anbang Yao, Lei Zhou, Tianwei Shen, Yurong Chen, Long Quan, Hongen Liao

First, to capture the local context of sparse correspondences, the network clusters unordered input correspondences by learning a soft assignment matrix.

Vocal Bursts Valence Prediction

ContextDesc: Local Descriptor Augmentation with Cross-Modality Context

1 code implementation CVPR 2019 Zixin Luo, Tianwei Shen, Lei Zhou, Jiahui Zhang, Yao Yao, Shiwei Li, Tian Fang, Long Quan

Most existing studies on learning local features focus on the patch-based descriptions of individual keypoints while neglecting the spatial relations established from their keypoint locations.

Geometric Matching

Reconstruction and Registration of Large-Scale Medical Scene Using Point Clouds Data from Different Modalities

no code implementations 5 Sep 2018 Ke Wang, Han Song, Jiahui Zhang, Xinran Zhang, Hongen Liao

In this paper, we propose a method that fuses 3D data from different modalities into a large-scale, dense point cloud.
