1 code implementation • LREC 2022 • Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J. Barezi, Peng Xu, Cheuk Tung Yiu, Rita Frieske, Holy Lovenia, Genta Winata, Qifeng Chen, Xiaojuan Ma, Bertram Shi, Pascale Fung
With the rise of deep learning and intelligent vehicles, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities.
2 code implementations • 11 Apr 2024 • Runtao Liu, Ashkan Khakzar, Jindong Gu, Qifeng Chen, Philip Torr, Fabio Pizzati
Hence, we propose Latent Guard, a framework designed to improve safety measures in text-to-image generation.
no code implementations • 8 Apr 2024 • Xiaoyan Cong, Yue Wu, Qifeng Chen, Chenyang Lei
Unlike most previous end-to-end automatic colorization algorithms, our framework allows for iterative and localized modifications of the colorization results because we explicitly model the coloring samples.
no code implementations • 5 Apr 2024 • Kei Ikemura, Yiming Huang, Felix Heide, Zhaoxiang Zhang, Qifeng Chen, Chenyang Lei
Existing depth sensors are imperfect and may provide inaccurate depth values in challenging scenarios, such as in the presence of transparent or reflective objects.
no code implementations • 19 Mar 2024 • Junhao Cai, Yisheng He, Weihao Yuan, Siyu Zhu, Zilong Dong, Liefeng Bo, Qifeng Chen
Derived from OmniObject3D, OO3D-9D is the largest and most diverse dataset in the field of category-level object pose and size estimation.
2 code implementations • 13 Mar 2024 • Yue Ma, Yingqing He, Hongfa Wang, Andong Wang, Chenyang Qi, Chengfei Cai, Xiu Li, Zhifeng Li, Heung-Yeung Shum, Wei Liu, Qifeng Chen
Despite recent advances in image-to-video generation, better controllability and local animation are less explored.
no code implementations • 10 Mar 2024 • Zhili Chen, Kien T. Pham, Maosheng Ye, Zhiqiang Shen, Qifeng Chen
We present a new 3D point-based detector model, named Shift-SSD, for precise 3D object detection in autonomous driving.
no code implementations • 27 Feb 2024 • Yazhou Xing, Yingqing He, Zeyue Tian, Xintao Wang, Qifeng Chen
Thus, instead of training the giant models from scratch, we propose to bridge the existing strong models with a shared latent representation space.
no code implementations • 21 Feb 2024 • Qingyan Bai, Zifan Shi, Yinghao Xu, Hao Ouyang, Qiuyu Wang, Ceyuan Yang, Xuan Wang, Gordon Wetzstein, Yujun Shen, Qifeng Chen
This work presents 3DPE, a practical method that can efficiently edit a face image following given prompts, like reference images or text descriptions, in a 3D-aware manner.
1 code implementation • 16 Feb 2024 • Lanqing Guo, Yingqing He, Haoxin Chen, Menghan Xia, Xiaodong Cun, YuFei Wang, Siyu Huang, Yong Zhang, Xintao Wang, Qifeng Chen, Ying Shan, Bihan Wen
Diffusion models have proven to be highly effective in image and video generation; however, they still face composition challenges when generating images of varying sizes due to single-scale training data.
no code implementations • 16 Feb 2024 • Jun Cen, Chenfei Wu, Xiao Liu, Shengming Yin, Yixuan Pei, Jinglong Yang, Qifeng Chen, Nan Duan, JianGuo Zhang
Large Language Models (LLMs) and Large Multi-modality Models (LMMs) have demonstrated remarkable decision-making capabilities on a variety of tasks.
no code implementations • 13 Jan 2024 • Yuen-Fui Lau, Tianjia Zhang, Zhefan Rao, Qifeng Chen
The latent code extracted from the degraded input image often contains corrupted features, making it difficult to align the semantic information from the input with the high-quality textures from the reference.
no code implementations • 9 Jan 2024 • Junming Chen, Yunfei Liu, Jianan Wang, Ailing Zeng, Yu Li, Qifeng Chen
We propose DiffSHEG, a Diffusion-based approach for Speech-driven Holistic 3D Expression and Gesture generation with arbitrary length.
no code implementations • 18 Dec 2023 • Bingyuan Wang, Hengyu Meng, Zeyu Cai, Lanjiong Li, Yue Ma, Qifeng Chen, Zeyu Wang
Visual storytelling often uses nontypical aspect-ratio images like scroll paintings, comic strips, and panoramas to create an expressive and compelling narrative.
no code implementations • 18 Dec 2023 • Chenyang Qi, Zhengzhong Tu, Keren Ye, Mauricio Delbracio, Peyman Milanfar, Qifeng Chen, Hossein Talebi
Text-driven diffusion models have become increasingly popular for various image editing tasks, including inpainting, stylization, and object replacement.
no code implementations • 12 Dec 2023 • Hongyu Liu, Xuan Wang, Ziyu Wan, Yujun Shen, Yibing Song, Jing Liao, Qifeng Chen
The noisy image, landmarks, and text condition are then fed into the frozen ControlNet twice for noise prediction.
1 code implementation • 11 Dec 2023 • Ka Leong Cheng, Qiuyu Wang, Zifan Shi, Kecheng Zheng, Yinghao Xu, Hao Ouyang, Qifeng Chen, Yujun Shen
Neural radiance fields, which represent a 3D scene as a color field and a density field, have demonstrated great progress in novel view synthesis yet are unfavorable for editing due to the implicitness.
1 code implementation • 5 Dec 2023 • Yue Ma, Xiaodong Cun, Yingqing He, Chenyang Qi, Xintao Wang, Ying Shan, Xiu Li, Qifeng Chen
Despite its simplicity, our method is the first to demonstrate video property editing from a pre-trained text-to-image model.
no code implementations • 2 Dec 2023 • Qiang Wen, Yazhou Xing, Zhefan Rao, Qifeng Chen
Specifically, to tailor the pre-trained latent diffusion model to operate on the RAW domain, we train a set of lightweight taming modules to inject the RAW information into the diffusion denoising process via modulating the intermediate features of UNet.
no code implementations • 29 Nov 2023 • Rameen Abdal, Wang Yifan, Zifan Shi, Yinghao Xu, Ryan Po, Zhengfei Kuang, Qifeng Chen, Dit-yan Yeung, Gordon Wetzstein
Instead of rasterizing the shells directly, we sample 3D Gaussians on the shells whose attributes are encoded in the texture features.
no code implementations • 28 Nov 2023 • Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei
Diffusion models have proven to be powerful generative models in recent years, yet generating visual text remains a challenge for them.
no code implementations • 14 Nov 2023 • Zhili Chen, Maosheng Ye, Shuangjie Xu, Tongyi Cao, Qifeng Chen
Unlike existing end-to-end autonomous driving frameworks, PPAD models the interactions among ego, agents, and the dynamic environment in an autoregressive manner by interleaving the Prediction and Planning processes at every timestep, instead of a single sequential process of prediction followed by planning.
3 code implementations • 30 Oct 2023 • Haoxin Chen, Menghan Xia, Yingqing He, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Jinbo Xing, Yaofang Liu, Qifeng Chen, Xintao Wang, Chao Weng, Ying Shan
The I2V model is designed to produce videos that strictly adhere to the content of the provided reference image, preserving its content, structure, and style.
Ranked #3 on Text-to-Video Generation on EvalCrafter Text-to-Video (ECTV) Dataset (using extra training data)
1 code implementation • 26 Oct 2023 • Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, Erfei Cui, Ziheng Li, Xizhou Zhu, Lewei Lu, Qifeng Chen, Yu Qiao, Jifeng Dai, Wenhai Wang
We present ControlLLM, a novel framework that enables large language models (LLMs) to utilize multi-modal tools for solving complex real-world tasks.
1 code implementation • 11 Oct 2023 • Yingqing He, Shaoshu Yang, Haoxin Chen, Xiaodong Cun, Menghan Xia, Yong Zhang, Xintao Wang, Ran He, Qifeng Chen, Ying Shan
Our work also suggests that a pre-trained diffusion model trained on low-resolution images can be directly used for high-resolution visual generation without further tuning, which may provide insights for future research on ultra-high-resolution image and video synthesis.
no code implementations • 25 Sep 2023 • Jiapeng Zhu, Yujun Shen, Yinghao Xu, Deli Zhao, Qifeng Chen, Bolei Zhou
This work fills in this gap by proposing in-domain GAN inversion, which consists of a domain-guided encoder and a domain-regularized optimizer, to regularize the inverted code in the native latent space of the pre-trained GAN model.
no code implementations • 5 Sep 2023 • Yue Wu, Sicheng Xu, Jianfeng Xiang, Fangyun Wei, Qifeng Chen, Jiaolong Yang, Xin Tong
For the new task, we base our method on the generative radiance manifold representation and equip it with learnable facial and head-shoulder deformations.
no code implementations • 29 Aug 2023 • Yazhou Xing, Amrita Mazumdar, Anjul Patney, Chao Liu, Hongxu Yin, Qifeng Chen, Jan Kautz, Iuri Frosio
We present a learning-based system to reduce these artifacts without resorting to complex acquisition mechanisms like alternating exposures or costly processing that are typical of high dynamic range (HDR) imaging.
1 code implementation • 15 Aug 2023 • Hao Ouyang, Qiuyu Wang, Yuxi Xiao, Qingyan Bai, Juntao Zhang, Kecheng Zheng, Xiaowei Zhou, Qifeng Chen, Yujun Shen
We present the content deformation field CoDeF as a new type of video representation, which consists of a canonical content field aggregating the static contents in the entire video and a temporal deformation field recording the transformations from the canonical image (i.e., rendered from the canonical content field) to each individual frame along the time axis. Given a target video, these two fields are jointly optimized to reconstruct it through a carefully tailored rendering pipeline. We deliberately introduce some regularizations into the optimization process, urging the canonical content field to inherit semantics (e.g., the object shape) from the video. With such a design, CoDeF naturally supports lifting image algorithms for video processing, in the sense that one can apply an image algorithm to the canonical image and effortlessly propagate the outcomes to the entire video with the aid of the temporal deformation field. We experimentally show that CoDeF is able to lift image-to-image translation to video-to-video translation and lift keypoint detection to keypoint tracking without any training. More importantly, thanks to our lifting strategy that deploys the algorithms on only one image, we achieve superior cross-frame consistency in processed videos compared to existing video-to-video translation approaches, and even manage to track non-rigid objects like water and smog. Project page: https://qiuyu96.github.io/CoDeF/.
1 code implementation • 13 Jul 2023 • Yingqing He, Menghan Xia, Haoxin Chen, Xiaodong Cun, Yuan Gong, Jinbo Xing, Yong Zhang, Xintao Wang, Chao Weng, Ying Shan, Qifeng Chen
For the first module, we leverage an off-the-shelf video retrieval system and extract video depths as motion structure.
1 code implementation • 9 Jul 2023 • Jun Cen, Shiwei Zhang, Yixuan Pei, Kun Li, Hang Zheng, Maochun Luo, Yingya Zhang, Qifeng Chen
In this way, RGB images are no longer required during inference, since the 2D knowledge branch provides 2D information according to the 3D LiDAR input.
1 code implementation • 23 May 2023 • Jun Cen, Yizheng Wu, Kewei Wang, Xingyi Li, Jingkang Yang, Yixuan Pei, Lingdong Kong, Ziwei Liu, Qifeng Chen
The Segment Anything Model (SAM) has demonstrated its effectiveness in segmenting any part of 2D RGB images.
no code implementations • NeurIPS 2023 • Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei
Diffusion models have gained increasing attention for their impressive generation abilities but currently struggle with rendering accurate and coherent text.
1 code implementation • 3 Apr 2023 • Yue Ma, Yingqing He, Xiaodong Cun, Xintao Wang, Siran Chen, Ying Shan, Xiu Li, Qifeng Chen
Generating text-editable and pose-controllable character videos is in high demand for creating various digital humans.
1 code implementation • CVPR 2023 • Chenyang Qi, Xin Yang, Ka Leong Cheng, Ying-Cong Chen, Qifeng Chen
Then, an efficient frequency-aware decoder reconstructs a high-fidelity HR image from the LR one in real time.
1 code implementation • CVPR 2023 • Jun Cen, Shiwei Zhang, Xiang Wang, Yixuan Pei, Zhiwu Qing, Yingya Zhang, Qifeng Chen
In this paper, we begin with analyzing the feature representation behavior in the open-set action recognition (OSAR) problem based on the information bottleneck (IB) theory, and propose to enlarge the instance-specific (IS) and class-specific (CS) information contained in the feature for better performance.
no code implementations • 20 Mar 2023 • Zhao-Heng Yin, Binghao Huang, Yuzhe Qin, Qifeng Chen, Xiaolong Wang
Relying on touch-only sensing, we can directly deploy the policy in a real robot hand and rotate novel objects that are not presented in training.
1 code implementation • ICCV 2023 • Chenyang Qi, Xiaodong Cun, Yong Zhang, Chenyang Lei, Xintao Wang, Ying Shan, Qifeng Chen
We also have a better zero-shot shape-aware editing ability based on the text-to-video model.
1 code implementation • CVPR 2023 • Chenyang Lei, Xuanchi Ren, Zhaoxiang Zhang, Qifeng Chen
Prior work usually requires specific guidance such as the flickering frequency, manual annotations, or extra consistent videos to remove the flicker.
1 code implementation • 22 Feb 2023 • Hongyu Liu, Xintong Han, ChengBin Jin, Lihui Qian, Huawei Wei, Zhe Lin, Faqiang Wang, Haoye Dong, Yibing Song, Jia Xu, Qifeng Chen
In this paper, we propose Human MotionFormer, a hierarchical ViT framework that leverages global and local perceptions to capture large and subtle motion matching, respectively.
1 code implementation • 12 Feb 2023 • Qiang Wen, Yue Wu, Qifeng Chen
The waterdrops on windshields during driving can cause severe visual obstructions, which may lead to car accidents.
1 code implementation • 8 Feb 2023 • Jun Cen, Di Luan, Shiwei Zhang, Yixuan Pei, Yingya Zhang, Deli Zhao, Shaojie Shen, Qifeng Chen
Recently, Unified Open-set Recognition (UOSR) has been proposed to reject not only unknown samples but also known but wrongly classified samples, which tends to be more practical in real-world applications.
no code implementations • CVPR 2023 • Zifan Shi, Yujun Shen, Yinghao Xu, Sida Peng, Yiyi Liao, Sheng Guo, Qifeng Chen, Dit-yan Yeung
Existing methods for 3D-aware image synthesis largely depend on the 3D pose distribution pre-estimated on the training set.
no code implementations • ICCV 2023 • Jiapeng Zhu, Ceyuan Yang, Yujun Shen, Zifan Shi, Bo Dai, Deli Zhao, Qifeng Chen
This work presents an easy-to-use regularizer for GAN training, which helps explicitly link some axes of the latent space to a set of pixels in the synthesized image.
1 code implementation • ICCV 2023 • Huimin Wu, Chenyang Lei, Xiao Sun, Peng-Shuai Wang, Qifeng Chen, Kwang-Ting Cheng, Stephen Lin, Zhirong Wu
Self-supervised representation learning follows a paradigm of withholding some part of the data and tasking the network to predict it from the remaining part.
1 code implementation • CVPR 2023 • BoWen Zhang, Chenyang Qi, Pan Zhang, Bo Zhang, HsiangTao Wu, Dong Chen, Qifeng Chen, Yong Wang, Fang Wen
In this work, we propose an ID-preserving talking head generation framework, which advances previous methods in two aspects.
no code implementations • CVPR 2023 • Tengfei Wang, Bo Zhang, Ting Zhang, Shuyang Gu, Jianmin Bao, Tadas Baltrusaitis, Jingjing Shen, Dong Chen, Fang Wen, Qifeng Chen, Baining Guo
This paper presents a 3D generative model that uses diffusion models to automatically generate 3D digital avatars represented as neural radiance fields.
1 code implementation • CVPR 2023 • Jiaxin Xie, Hao Ouyang, Jingtan Piao, Chenyang Lei, Qifeng Chen
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views while preserving specific details of the input image.
1 code implementation • 23 Nov 2022 • Yingqing He, Tianyu Yang, Yong Zhang, Ying Shan, Qifeng Chen
Diffusion models have shown remarkable results recently but require significant computational resources.
Ranked #2 on Video Generation on Taichi
1 code implementation • CVPR 2023 • Hongyu Liu, Yibing Song, Qifeng Chen
In this work, we propose to first obtain the precise latent code in foundation latent space $\mathcal{W}$.
2 code implementations • CVPR 2023 • Renjie Pi, Weizhong Zhang, Yueqi Xie, Jiahui Gao, Xiaoyu Wang, Sunghun Kim, Qifeng Chen
Specifically, we first reserve a short trajectory of global model snapshots on the server.
no code implementations • 10 Nov 2022 • Yueqi Xie, Weizhong Zhang, Renjie Pi, Fangzhao Wu, Qifeng Chen, Xing Xie, Sunghun Kim
Since at each round, the number of tunable parameters optimized on the server side equals the number of participating clients (thus independent of the model size), we are able to train a global model with massive parameters using only a small amount of proxy data (e.g., around one hundred samples).
1 code implementation • 5 Nov 2022 • Chenyang Lei, Xudong Jiang, Qifeng Chen
We propose a simple yet effective reflection-free cue for robust reflection removal from a pair of flash and ambient (no-flash) images.
1 code implementation • 18 Oct 2022 • Zhao-Heng Yin, Weirui Ye, Qifeng Chen, Yang Gao
Inspired by the recent success of EfficientZero in RL, we propose EfficientImitate (EI), a planning-based imitation learning method that can achieve high in-environment sample efficiency and performance simultaneously.
1 code implementation • 14 Oct 2022 • Yiming Zhu, Hongyu Liu, Yibing Song, Ziyang Yuan, Xintong Han, Chun Yuan, Qifeng Chen, Jue Wang
Based on the visual latent space of StyleGAN [21] and the text embedding space of CLIP [34], studies focus on how to map these two latent spaces for text-driven attribute manipulations.
1 code implementation • 12 Oct 2022 • Yue Wu, Yu Deng, Jiaolong Yang, Fangyun Wei, Qifeng Chen, Xin Tong
To achieve meaningful control over facial expressions via deformation, we propose a 3D-level imitative learning scheme between the generator and a parametric 3D face model during adversarial training of the 3D-aware GAN.
1 code implementation • 3 Oct 2022 • Junming Chen, Meirui Jiang, Qi Dou, Qifeng Chen
Our style representation is exceptionally lightweight and can hardly be used to reconstruct the dataset.
no code implementations • 30 Sep 2022 • Zifan Shi, Yinghao Xu, Yujun Shen, Deli Zhao, Qifeng Chen, Dit-yan Yeung
We argue that, considering the two-player game in the formulation of GANs, only making the generator 3D-aware is not enough.
no code implementations • 30 Aug 2022 • Tianjia Zhang, Yuen-Fui Lau, Qifeng Chen
We present a portable multiscopic camera system with a dedicated model for novel view and time synthesis in dynamic scenes.
1 code implementation • 22 Jul 2022 • Ka Leong Cheng, Yueqi Xie, Qifeng Chen
The key is to transform the original noisy images to noise-free bits by eliminating the undesired noise during compression, where the bits are later decompressed as clean images.
1 code implementation • 14 Jul 2022 • Chenyang Qi, Junming Chen, Xin Yang, Qifeng Chen
Recent multi-output inference works propagate the bidirectional temporal feature with a parallel or recurrent framework, which either suffers from performance drops on the temporal edges of clips or cannot achieve online inference.
Ranked #1 on Video Denoising on CRVD
1 code implementation • CVPR 2022 • Yue Wu, Qiang Wen, Qifeng Chen
Extensive experiments on the Cityscapes, KITTI, DAVIS, Middlebury, and Vimeo90K datasets show that our video prediction results are robust in general scenarios, and our approach outperforms other video prediction methods that require a large amount of training data or extra semantic information.
2 code implementations • 25 May 2022 • Tengfei Wang, Ting Zhang, Bo Zhang, Hao Ouyang, Dong Chen, Qifeng Chen, Fang Wen
We propose to use pretraining to boost general image-to-image translation.
Ranked #1 on Sketch-to-Image Translation on COCO-Stuff
1 code implementation • 2 May 2022 • Zhili Chen, Zian Qian, Sukai Wang, Qifeng Chen
We present a novel octree-based multi-level framework for large-scale point cloud compression, which can organize sparse and unstructured point clouds in a memory-efficient way.
1 code implementation • 25 Apr 2022 • Hao Ouyang, Bo Zhang, Pan Zhang, Hao Yang, Jiaolong Yang, Dong Chen, Qifeng Chen, Fang Wen
We propose pose-guided multiplane image (MPI) synthesis which can render an animatable character in real scenes with photorealistic quality.
no code implementations • ICCV 2023 • Maosheng Ye, Jiamiao Xu, Xunnong Xu, Tengfei Wang, Tongyi Cao, Qifeng Chen
Also, to model the multi-modality in motion forecasting, we design a novel self-ensembling scheme to obtain accurate teacher targets to enforce the self-constraints with multi-modality supervision.
Ranked #9 on Motion Forecasting on Argoverse CVPR 2020
1 code implementation • CVPR 2022 • Yisheng He, Yao Wang, Haoqiang Fan, Jian Sun, Qifeng Chen
6D object pose estimation networks are limited in their capability to scale to large numbers of object instances due to the closed-set assumption and their reliance on high-fidelity object CAD models.
no code implementations • 21 Mar 2022 • Yingqing He, Zhiyi Zhang, Jiapeng Zhu, Yujun Shen, Qifeng Chen
To describe such a phenomenon, we propose channel awareness, which quantitatively characterizes how a single channel contributes to the final synthesis.
no code implementations • 6 Mar 2022 • Yisheng He, Haoqiang Fan, Haibin Huang, Qifeng Chen, Jian Sun
Instead, we propose a label-free method that learns to enforce the geometric consistency between the category template mesh and the observed object point cloud in a self-supervised manner.
1 code implementation • 19 Feb 2022 • Jiapeng Zhu, Yujun Shen, Yinghao Xu, Deli Zhao, Qifeng Chen
Despite the rapid advancement of semantic discovery in the latent space of Generative Adversarial Networks (GANs), existing approaches either are limited to finding global attributes or rely on a number of segmentation masks to identify local attributes.
no code implementations • 17 Feb 2022 • Zifan Shi, Yujun Shen, Jiapeng Zhu, Dit-yan Yeung, Qifeng Chen
In this way, the discriminator can take the spatial arrangement into account and advise the generator to learn an appropriate depth condition.
1 code implementation • 27 Jan 2022 • Chenyang Lei, Yazhou Xing, Hao Ouyang, Qifeng Chen
A progressive propagation strategy with pseudo labels is also proposed to enhance DVP's performance on video propagation.
1 code implementation • 11 Jan 2022 • Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J. Barezi, Peng Xu, Cheuk Tung Shadow Yiu, Rita Frieske, Holy Lovenia, Genta Indra Winata, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung
With the rise of deep learning and intelligent vehicles, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities.
1 code implementation • LREC 2022 • Tiezheng Yu, Rita Frieske, Peng Xu, Samuel Cahyawijaya, Cheuk Tung Shadow Yiu, Holy Lovenia, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung
We further conduct experiments with Fairseq S2T Transformer, a state-of-the-art ASR model, on the biggest existing dataset, Common Voice zh-HK, and our proposed MDCC, and the results show the effectiveness of our dataset.
1 code implementation • CVPR 2022 • Chenyang Lei, Chenyang Qi, Jiaxin Xie, Na Fan, Vladlen Koltun, Qifeng Chen
We present a new data-driven approach with physics-based priors to scene-level normal estimation from a single polarization image.
2 code implementations • LREC 2022 • Holy Lovenia, Samuel Cahyawijaya, Genta Indra Winata, Peng Xu, Xu Yan, Zihan Liu, Rita Frieske, Tiezheng Yu, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung
ASCEND (A Spontaneous Chinese-English Dataset) is a high-quality Mandarin Chinese-English code-switching corpus built on spontaneous multi-turn conversational dialogue sources collected in Hong Kong.
no code implementations • 16 Nov 2021 • Maosheng Ye, Rui Wan, Shuangjie Xu, Tongyi Cao, Qifeng Chen
The Sparse Feature Encoder extracts the local context information for each point, and the Sparse Geometry Feature Enhancement enhances the geometric properties of a sparse point cloud via multi-scale sparse projection and attentive multi-scale fusion.
no code implementations • 4 Nov 2021 • Samruddhi Deshmukh, Amartansh Dubey, Dingfei Ma, Qifeng Chen, Ross Murch
Thus, our proposed method is the first inverse scattering-based deep learning framework which can image large scatterers with high permittivity and achieve accurate indoor RF imaging using phaseless Wi-Fi measurements.
1 code implementation • CVPR 2022 • Tengfei Wang, Yong Zhang, Yanbo Fan, Jue Wang, Qifeng Chen
With a low bit-rate latent code, previous works have difficulties in preserving high-fidelity details in reconstructed and edited images.
1 code implementation • ICCV 2021 • Ka Leong Cheng, Yueqi Xie, Qifeng Chen
Reversible image conversion (RIC) aims to build a reversible transformation between specific visual content (e.g., short videos) and an embedding image, where the original content can be restored from the embedding when necessary.
2 code implementations • ICCV 2021 • Tengfei Wang, Jiaxin Xie, Wenxiu Sun, Qiong Yan, Qifeng Chen
We present a novel approach to reference-based super-resolution (RefSR) with the focus on dual-camera super-resolution (DCSR), which utilizes reference images for high-quality and high-fidelity results.
1 code implementation • ICCV 2021 • Xuanchi Ren, Tao Yang, Li Erran Li, Alexandre Alahi, Qifeng Chen
The ability to predict unseen vehicles is critical for safety in autonomous driving.
1 code implementation • ICCV 2021 • Yue Wu, Guotao Meng, Qifeng Chen
We propose a novel approach for embedding novel views in a single JPEG image while preserving the perceptual fidelity of the modified JPEG image and the restored novel views.
no code implementations • 20 Aug 2021 • Chenyang Lei, Yue Wu, Qifeng Chen
We present a novel approach to automatic image colorization by imitating the imagination process of human experts.
no code implementations • ICCV 2021 • Maosheng Ye, Shuangjie Xu, Tongyi Cao, Qifeng Chen
By utilizing these two modules iteratively, features can be propagated between two different representations.
no code implementations • 8 Aug 2021 • Rongrong Gao, Na Fan, Changlin Li, Wentao Liu, Qifeng Chen
We present a novel approach to joint depth and normal estimation for time-of-flight (ToF) sensors.
1 code implementation • 8 Aug 2021 • Yueqi Xie, Ka Leong Cheng, Qifeng Chen
Although deep learning based image compression methods have achieved promising progress in recent years, their performance still cannot match the latest compression standard, Versatile Video Coding (VVC).
1 code implementation • 7 Aug 2021 • Yingqing He, Yazhou Xing, Tianjia Zhang, Qifeng Chen
Qualitative and quantitative experiments on a real-world portrait shadow dataset demonstrate that our approach achieves comparable performance with supervised shadow removal methods.
no code implementations • 7 Aug 2021 • Chenyang Lei, Xuhua Huang, Chenyang Qi, Yankun Zhao, Wenxiu Sun, Qiong Yan, Qifeng Chen
Due to the lack of a large-scale reflection removal dataset with diverse real-world scenes, many existing reflection removal methods are trained on synthetic data plus a small amount of real-world data, which makes it difficult to evaluate the strengths or weaknesses of different reflection removal methods thoroughly.
1 code implementation • 7 Aug 2021 • Zifan Shi, Na Fan, Dit-yan Yeung, Qifeng Chen
Thus, we propose a learning-based model for waterdrop removal with stereo images.
no code implementations • 5 Aug 2021 • Weihao Yuan, Rui Fan, Michael Yu Wang, Qifeng Chen
We design a multiscopic vision system that utilizes a low-cost monocular RGB camera to acquire accurate depth estimation.
1 code implementation • ICCV 2021 • Hao Ouyang, Tengfei Wang, Qifeng Chen
We propose a novel framework for video inpainting by adopting an internal learning strategy.
no code implementations • 24 Jun 2021 • Guotao Meng, Yue Wu, Sijin Li, Qifeng Chen
Existing video super-resolution methods often utilize a few neighboring frames to generate a higher-resolution image for each frame.
1 code implementation • 14 Jun 2021 • Jihyeong Yoo, Qifeng Chen
We train our model on a single image with cascaded multi-scale learning, where each network at each scale is responsible for image reconstruction.
1 code implementation • NeurIPS 2021 • Jiapeng Zhu, Ruili Feng, Yujun Shen, Deli Zhao, ZhengJun Zha, Jingren Zhou, Qifeng Chen
Concretely, given an arbitrary image and a region of interest (e.g., eyes of face images), we manage to relate the latent space to the image region with the Jacobian matrix and then use low-rank factorization to discover steerable latent subspaces.
1 code implementation • CVPR 2021 • Tengfei Wang, Hao Ouyang, Qifeng Chen
Although recent inpainting approaches have demonstrated significant improvements with deep neural networks, they still suffer from artifacts such as blunt structures and abrupt colors when filling in the missing regions.
1 code implementation • CVPR 2021 • Hao Ouyang, Zifan Shi, Chenyang Lei, Ka Lung Law, Qifeng Chen
To facilitate the learning of a simulator model, we collect a dataset of 10,000 raw images of 450 scenes with different exposure settings.
no code implementations • 9 Apr 2021 • Weihao Yuan, Yazhan Zhang, Bingkun Wu, Siyu Zhu, Ping Tan, Michael Yu Wang, Qifeng Chen
Self-supervised learning for depth estimation possesses several advantages over supervised learning.
1 code implementation • CVPR 2021 • Yazhou Xing, Zian Qian, Qifeng Chen
Unprocessed RAW data is a highly valuable image format for image editing and computer vision.
13 code implementations • CVPR 2021 • Duo Li, Jie Hu, Changhu Wang, Xiangtai Li, Qi She, Lei Zhu, Tong Zhang, Qifeng Chen
Convolution has been the core ingredient of modern neural networks, triggering the surge of deep learning in vision.
Ranked #706 on Image Classification on ImageNet
1 code implementation • CVPR 2021 • Chenyang Lei, Qifeng Chen
The flash-only image is equivalent to an image taken in a dark environment with only a flash on.
no code implementations • 6 Mar 2021 • Haoran Song, Di Luan, Wenchao Ding, Michael Yu Wang, Qifeng Chen
Predicting the future trajectories of on-road vehicles is critical for autonomous driving.
no code implementations • CVPR 2021 • Maosheng Ye, Tongyi Cao, Qifeng Chen
We propose the Temporal Point Cloud Networks (TPCN), a novel and flexible framework with joint spatial and temporal learning for trajectory prediction.
Ranked #52 on Motion Forecasting on Argoverse CVPR 2020
3 code implementations • CVPR 2021 • Yisheng He, Haibin Huang, Haoqiang Fan, Qifeng Chen, Jian Sun
Moreover, at the output representation stage, we design a simple but effective 3D keypoint selection algorithm considering the texture and geometry information of objects, which simplifies keypoint localization for precise pose estimation.
Ranked #1 on 6D Pose Estimation on LineMOD
1 code implementation • 10 Feb 2021 • Ching Pui Wan, Qifeng Chen
To the best of our knowledge, our aggregation strategy is the first one that can be adapted to defend against various attacks in a data-driven fashion.
no code implementations • ICCV 2021 • Jingyuan Liu, Mingyi Shi, Qifeng Chen, Hongbo Fu, Chiew-Lan Tai
We present a novel approach for extracting human pose features from human action videos.
1 code implementation • 9 Dec 2020 • Xuanchi Ren, Zian Qian, Qifeng Chen
Our key observation is that some frames in a video with motion blur are much sharper than others, and thus we can transfer the texture information in those sharp frames to blurry frames.
no code implementations • 5 Dec 2020 • Liu Yuezhang, Bo Li, Qifeng Chen
It is well known that artificial neural networks are vulnerable to adversarial examples, and great efforts have been made to improve their robustness.
2 code implementations • NeurIPS 2020 • Chenyang Lei, Yazhou Xing, Qifeng Chen
Extensive quantitative and perceptual experiments show that our approach outperforms state-of-the-art methods on blind video temporal consistency.
1 code implementation • 3 Aug 2020 • Weihao Yuan, Michael Yu Wang, Qifeng Chen
Self-supervised learning for visual object tracking possesses valuable advantages over supervised learning, such as not requiring laborious human annotations or online training.
no code implementations • ECCV 2020 • Ka Leong Cheng, Zhaoyang Yang, Qifeng Chen, Yu-Wing Tai
Continuous sign language recognition (SLR) is a challenging task that requires learning on both spatial and temporal dimensions of signing frame sequences.
1 code implementation • ECCV 2020 • Duo Li, Anbang Yao, Qifeng Chen
Despite their strong modeling capacities, Convolutional Neural Networks (CNNs) are often scale-sensitive.
no code implementations • ECCV 2020 • Duo Li, Qifeng Chen
In this paper, we build upon the weakly-supervised generation mechanism of intermediate attention maps in convolutional neural networks and reveal the effectiveness of attention modules more directly, to fully exploit their potential.
1 code implementation • ECCV 2020 • Duo Li, Anbang Yao, Qifeng Chen
To achieve efficient and flexible image classification at runtime, we employ meta learners to generate convolutional weights of main networks for various input scales and maintain privatized Batch Normalization layers per scale.
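A minimal sketch of the idea, assuming a toy linear meta-learner and made-up shapes: a small network maps a scale descriptor to the convolutional weights used at that scale, so each input resolution gets its own kernel (per-scale Batch Normalization is omitted here).

```python
import numpy as np

rng = np.random.default_rng(0)

class MetaKernel:
    # Toy meta-learner: maps a scalar scale descriptor to a conv kernel.
    # Names and shapes are illustrative assumptions, not the paper's API.
    def __init__(self, k=3, cin=8, cout=8):
        self.w = rng.standard_normal((k * k * cin * cout, 2)) * 0.01
        self.shape = (cout, cin, k, k)

    def __call__(self, scale):
        z = np.array([scale, 1.0])  # scale embedding plus a bias term
        return (self.w @ z).reshape(self.shape)

meta = MetaKernel()
k224, k128 = meta(224 / 224), meta(128 / 224)
print(k224.shape)  # same kernel shape at every scale, different weights
```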
no code implementations • CVPR 2020 • Kai Zhang, Jiaxin Xie, Noah Snavely, Qifeng Chen
Depth sensing is a critical component of autonomous driving technologies, but today's LiDAR- or stereo camera-based solutions have limited range.
1 code implementation • CVPR 2020 • Yue Wu, Rongrong Gao, Jaesik Park, Qifeng Chen
We present an approach to predict future video frames given a sequence of continuous video frames in the past.
1 code implementation • CVPR 2020 • Chenyang Lei, Xuhua Huang, Mengdi Zhang, Qiong Yan, Wenxiu Sun, Qifeng Chen
We present a novel formulation for removing reflections from polarized images in the wild.
1 code implementation • ECCV 2020 • Haoran Song, Wenchao Ding, Yuxuan Chen, Shaojie Shen, Michael Yu Wang, Qifeng Chen
Moreover, our approach enables a novel pipeline that couples prediction and planning by conditioning PiP on multiple candidate trajectories of the ego vehicle, which is highly beneficial for autonomous driving in interactive scenarios.
1 code implementation • CVPR 2020 • Duo Li, Qifeng Chen
While the depth of modern Convolutional Neural Networks (CNNs) surpasses that of the pioneering networks by a significant margin, the traditional training practice of appending supervision only to the final classifier and progressively propagating gradient flow upstream remains the mainstay.
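The alternative to supervising only the final classifier is to add weighted auxiliary losses at intermediate layers; a minimal sketch of such a deeply supervised objective (the 0.3 weight is an assumption, not the paper's setting):

```python
def total_loss(final_loss, aux_losses, weight=0.3):
    # Deeply supervised objective: the final-classifier loss plus a
    # weighted sum of auxiliary losses from intermediate layers.
    return final_loss + weight * sum(aux_losses)

# Toy values: one final loss and two intermediate auxiliary losses.
print(total_loss(1.0, [0.8, 0.6]))  # 1.0 + 0.3 * 1.4 = 1.42
```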
1 code implementation • 22 Jan 2020 • Weihao Yuan, Rui Fan, Michael Yu Wang, Qifeng Chen
We design a multiscopic vision system that utilizes a low-cost monocular RGB camera to acquire accurate depth estimation for robotic applications.
1 code implementation • 30 Dec 2019 • Jiaxin Xie, Chenyang Lei, Zhuwen Li, Li Erran Li, Qifeng Chen
Our flow-to-depth layer is differentiable, and thus we can refine camera poses by maximizing the aggregated confidence in the camera pose refinement module.
2 code implementations • 24 Dec 2019 • Shuhao Fu, Chulin Xie, Bo Li, Qifeng Chen
Federated learning has a variety of applications in multiple domains by utilizing private training data stored on different devices.
1 code implementation • 13 Dec 2019 • Xuanchi Ren, Haoran Li, Zijian Huang, Qifeng Chen
We present a learning-based approach with pose perceptual loss for automatic music video generation.
4 code implementations • CVPR 2019 • Chenyang Lei, Qifeng Chen
We present a fully automatic approach to video colorization with self-regularization and diversity.
1 code implementation • 13 May 2019 • Xuaner Cecilia Zhang, Qifeng Chen, Ren Ng, Vladlen Koltun
We show how to obtain the ground-truth data with optically zoomed images and contribute a dataset, SR-RAW, for real-world computational zoom.
2 code implementations • NeurIPS 2018 • Zhuwen Li, Qifeng Chen, Vladlen Koltun
We present a learning-based approach to computing solutions for certain NP-hard problems.
2 code implementations • Interspeech 2018 • Francois G. Germain, Qifeng Chen, Vladlen Koltun
We present an end-to-end deep learning approach to denoising speech signals by processing the raw waveform directly.
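A toy stand-in for waveform-domain processing: a single 1-D convolution applied directly to the raw signal. The fixed averaging kernel below is purely illustrative; the paper learns its filters end to end.

```python
import numpy as np

def conv1d_denoise(x, kernel):
    # One 1-D convolution over a raw waveform (same-length output).
    return np.convolve(x, kernel, mode="same")

t = np.linspace(0, 1, 400)
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + np.random.default_rng(0).normal(0, 0.3, t.size)
smooth = conv1d_denoise(noisy, np.ones(9) / 9)  # simple averaging kernel
print(np.mean((smooth - clean) ** 2) < np.mean((noisy - clean) ** 2))  # True
```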
3 code implementations • CVPR 2018 • Xuaner Zhang, Ren Ng, Qifeng Chen
Our loss function includes two perceptual losses: a feature loss from a visual perception network, and an adversarial loss that encodes characteristics of images in the transmission layers.
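The two-term objective can be sketched as follows; the feature representations, discriminator score, and weighting `lam` are placeholders, not the paper's exact components.

```python
import numpy as np

def combined_loss(feat_pred, feat_gt, disc_score, lam=0.01):
    # feat_*: features from a (hypothetical) visual perception network;
    # disc_score: discriminator output on the prediction, in (0, 1).
    feature_loss = np.mean(np.abs(feat_pred - feat_gt))
    adv_loss = -np.log(disc_score + 1e-8)  # non-saturating generator loss
    return feature_loss + lam * adv_loss

print(round(combined_loss(np.ones(4), np.zeros(4), 0.5), 4))
```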
1 code implementation • CVPR 2018 • Zhuwen Li, Qifeng Chen, Vladlen Koltun
The first is trained to synthesize a diverse set of plausible segmentations that conform to the user's input.
19 code implementations • CVPR 2018 • Chen Chen, Qifeng Chen, Jia Xu, Vladlen Koltun
Imaging in low light is challenging due to low photon count and low SNR.
1 code implementation • CVPR 2018 • Xiaojuan Qi, Qifeng Chen, Jiaya Jia, Vladlen Koltun
We present a semi-parametric approach to photographic image synthesis from semantic layouts.
2 code implementations • ICCV 2017 • Qifeng Chen, Jia Xu, Vladlen Koltun
Our approach uses a fully-convolutional network that is trained on input-output pairs that demonstrate the operator's action.
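In its smallest possible form, learning an operator from input-output pairs looks like the sketch below: a single shared weight (a one-parameter stand-in for a fully-convolutional network) fit by gradient descent to mimic a 1.5x brightness gain. Purely illustrative; the actual network and operators are far richer.

```python
import numpy as np

# Input-output pairs demonstrating the operator's action.
rng = np.random.default_rng(0)
x = rng.random((64, 8, 8))  # input patches
y = 1.5 * x                 # the operator's output (a brightness gain)

w = 0.0                     # one shared weight: a 1x1 "conv", 1 channel
for _ in range(200):
    grad = np.mean(2 * (w * x - y) * x)  # d/dw of mean squared error
    w -= 0.5 * grad
print(round(w, 3))  # converges to ~1.5, reproducing the operator
```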
no code implementations • ICCV 2017 • Qifeng Chen, Vladlen Koltun
We present an approach to synthesizing photographic images conditioned on semantic layouts.
no code implementations • CVPR 2016 • Rene Ranftl, Vibhav Vineet, Qifeng Chen, Vladlen Koltun
We present an approach to dense depth estimation from a single monocular camera that is moving through a dynamic scene.
no code implementations • CVPR 2016 • Qifeng Chen, Vladlen Koltun
The approach optimizes a classical optical flow objective over the full space of mappings between discrete grids.
no code implementations • ICCV 2015 • Qifeng Chen, Vladlen Koltun
We present an approach to nonrigid registration of 3D surfaces.
no code implementations • 22 Sep 2014 • Cewu Lu, Hao Chen, Qifeng Chen, Hei Law, Yao Xiao, Chi-Keung Tang
We participated in the object detection track of ILSVRC 2014 and placed fourth among the 38 teams.
no code implementations • CVPR 2014 • Qifeng Chen, Vladlen Koltun
We describe a simple and fast algorithm for optimizing Markov random fields over images.
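One simple and fast MRF optimizer is iterated conditional modes (ICM), sketched below on a toy chain with two labels; this is a classic baseline for context and not necessarily the paper's algorithm.

```python
import numpy as np

def icm(unary, pairwise_weight, iters=10):
    # Iterated conditional modes on a chain MRF: greedily pick the label
    # minimizing unary cost plus a Potts smoothness term w.r.t. fixed neighbors.
    n, k = unary.shape
    labels = unary.argmin(axis=1)
    for _ in range(iters):
        for i in range(n):
            costs = unary[i].copy()
            for j in (i - 1, i + 1):
                if 0 <= j < n:
                    costs += pairwise_weight * (np.arange(k) != labels[j])
            labels[i] = costs.argmin()
    return labels

# Middle pixel's unary term weakly prefers label 1; smoothness flips it.
unary = np.array([[0.0, 1.0], [0.6, 0.4], [0.0, 1.0]])
print(icm(unary, pairwise_weight=0.5))  # [0 0 0]
```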