Search Results for author: Songcen Xu

Found 36 papers, 17 papers with code

The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

1 code implementation • 16 Apr 2024 • Bin Ren, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, Wei Zhai, Renjing Pei, Jiaming Guo, Songcen Xu, Yang Cao, ZhengJun Zha, Yan Wang, Yi Liu, Qing Wang, Gang Zhang, Liou Zhang, Shijie Zhao, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Xin Liu, Min Yan, Menghan Zhou, Yiqiang Yan, Yixuan Liu, Wensong Chan, Dehua Tang, Dong Zhou, Li Wang, Lu Tian, Barsoum Emad, Bohan Jia, Junbo Qiao, Yunshuai Zhou, Yun Zhang, Wei Li, Shaohui Lin, Shenglong Zhou, Binbin Chen, Jincheng Liao, Suiyi Zhao, Zhao Zhang, Bo Wang, Yan Luo, Yanyan Wei, Feng Li, Mingshen Wang, Yawei Li, Jinhan Guan, Dehua Hu, Jiawei Yu, Qisheng Xu, Tao Sun, Long Lan, Kele Xu, Xin Lin, Jingtong Yue, Lehan Yang, Shiyi Du, Lu Qi, Chao Ren, Zeyu Han, YuHan Wang, Chaolin Chen, Haobo Li, Mingjun Zheng, Zhongbao Yang, Lianhong Song, Xingzhuo Yan, Minghan Fu, Jingyi Zhang, Baiang Li, Qi Zhu, Xiaogang Xu, Dan Guo, Chunle Guo, Jiadi Chen, Huanhuan Long, Chunjiang Duanmu, Xiaoyan Lei, Jie Liu, Weilin Jia, Weifeng Cao, Wenlong Zhang, Yanyu Mao, Ruilong Guo, Nihao Zhang, Qian Wang, Manoj Pandey, Maksym Chernozhukov, Giang Le, Shuli Cheng, Hongyuan Wang, Ziyan Wei, Qingting Tang, Liejun Wang, Yongming Li, Yanhui Guo, Hao Xu, Akram Khatami-Rizi, Ahmad Mahmoudi-Aznaveh, Chih-Chung Hsu, Chia-Ming Lee, Yi-Shiuan Chou, Amogh Joshi, Nikhil Akalwadi, Sampada Malagi, Palani Yashaswini, Chaitra Desai, Ramesh Ashok Tabib, Ujwala Patil, Uma Mudenagudi

In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking.

Image Super-Resolution

Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model

1 code implementation • 2 Apr 2024 • Xu He, Qiaochu Huang, Zhensong Zhang, Zhiwei Lin, Zhiyong Wu, Sicheng Yang, Minglei Li, Zhiyi Chen, Songcen Xu, Xiaofei Wu

While previous works mostly generate structural human skeletons, resulting in the omission of appearance information, we focus on the direct generation of audio-driven co-speech gesture videos in this work.

Video Generation

Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution

no code implementations • 25 Mar 2024 • Qingping Zheng, Ling Zheng, Yuanfan Guo, Ying Li, Songcen Xu, Jiankang Deng, Hang Xu

Following this, the Reality Guidance Refinement (RGR) process refines artifacts by integrating this mask with realistic latent representations, improving alignment with the original image.

Super-Resolution

LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model

no code implementations • 18 Mar 2024 • Runhui Huang, Kaixin Cai, Jianhua Han, Xiaodan Liang, Renjing Pei, Guansong Lu, Songcen Xu, Wei Zhang, Hang Xu

Specifically, an inter-layer attention module is designed to encourage information exchange and learning between layers, while a text-guided intra-layer attention module incorporates layer-specific prompts to direct the specific-content generation for each layer.

Image Generation, Style Transfer

VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction

no code implementations • 27 Feb 2024 • Jiaqi Lin, Zhihao Li, Xiao Tang, Jianzhuang Liu, Shiyong Liu, Jiayue Liu, Yangdi Lu, Xiaofei Wu, Songcen Xu, Youliang Yan, Wenming Yang

Existing NeRF-based methods for large scene reconstruction often have limitations in visual quality and rendering speed.

PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion

no code implementations • 27 Dec 2023 • Guansong Lu, Yuanfan Guo, Jianhua Han, Minzhe Niu, Yihan Zeng, Songcen Xu, Zeyi Huang, Zhao Zhong, Wei Zhang, Hang Xu

Current large-scale diffusion models represent a giant leap forward in conditional image synthesis, capable of interpreting diverse cues like text, human poses, and edges.

Computational Efficiency, Denoising

DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior

1 code implementation • 11 Dec 2023 • Tianyu Huang, Yihan Zeng, Zhilu Zhang, Wan Xu, Hang Xu, Songcen Xu, Rynson W. H. Lau, WangMeng Zuo

The priors are then used as input conditions to maintain reasonable geometries, and a conditional LoRA and a weighted score are further proposed to optimize detailed textures.

3D Generation, Text to 3D

BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models

1 code implementation • 5 Dec 2023 • Fengyuan Shi, Jiaxi Gu, Hang Xu, Songcen Xu, Wei Zhang, LiMin Wang

Text-to-image foundation models are now widely applied to various downstream image synthesis tasks, such as controllable image generation and image editing, while downstream video synthesis tasks remain less explored for several reasons.

Image Generation, Model Selection

DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance

no code implementations • 5 Dec 2023 • Cong Wang, Jiaxi Gu, Panwen Hu, Songcen Xu, Hang Xu, Xiaodan Liang

In terms of fidelity especially, our model has a strong image retention ability and, to the best of our knowledge, delivers the best results on UCF101 compared with other image-to-video models.

Image to Video Generation

Semantics-aware Motion Retargeting with Vision-Language Models

no code implementations • 4 Dec 2023 • Haodong Zhang, ZhiKe Chen, Haocheng Xu, Lei Hao, Xiaofei Wu, Songcen Xu, Zhensong Zhang, Yue Wang, Rong Xiong

Capturing and preserving motion semantics is essential to motion retargeting between animation characters.

Language Modelling, Motion Retargeting

Learning Unorthogonalized Matrices for Rotation Estimation

no code implementations • 1 Dec 2023 • Kerui Gu, Zhihao Li, Shiyong Liu, Jianzhuang Liu, Songcen Xu, Youliang Yan, Michael Bi Mi, Kenji Kawaguchi, Angela Yao

Estimating 3D rotations is a common procedure for 3D computer vision.

Pose Estimation

Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models

no code implementations • 25 Oct 2023 • Tianyi Lu, Xing Zhang, Jiaxi Gu, Hang Xu, Renjing Pei, Songcen Xu, Zuxuan Wu

In this way, temporal consistency can be maintained by the video LDM while the high fidelity of the image LDM can also be exploited.

Denoising, Video Editing

TextField3D: Towards Enhancing Open-Vocabulary 3D Generation with Noisy Text Fields

no code implementations • 29 Sep 2023 • Tianyu Huang, Yihan Zeng, Bowen Dong, Hang Xu, Songcen Xu, Rynson W. H. Lau, WangMeng Zuo

To this end, an NTFGen module is proposed to model general text latent codes in noisy fields.

3D Generation

UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons

1 code implementation • 13 Sep 2023 • Sicheng Yang, Zilin Wang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Qiaochu Huang, Lei Hao, Songcen Xu, Xiaofei Wu, Changpeng Yang, Zonghong Dai

Automatic co-speech gesture generation has drawn much attention in computer animation.

Gesture Generation

Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation

no code implementations • 7 Sep 2023 • Jiaxi Gu, Shicong Wang, Haoyu Zhao, Tianyi Lu, Xing Zhang, Zuxuan Wu, Songcen Xu, Wei Zhang, Yu-Gang Jiang, Hang Xu

Conditioned on an initial video clip with a small number of frames, additional frames are iteratively generated by reusing the original latent features and following the previous diffusion process.

Action Recognition, Denoising

Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images

no code implementations • 31 Aug 2023 • Qingping Zheng, Yuanfan Guo, Jiankang Deng, Jianhua Han, Ying Li, Songcen Xu, Hang Xu

Stable Diffusion, a generative model used in text-to-image synthesis, frequently encounters resolution-induced composition problems when generating images of varying sizes.

Image Generation

Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images

no code implementations • ICCV 2023 • Cuican Yu, Guansong Lu, Yihan Zeng, Jian Sun, Xiaodan Liang, Huibin Li, Zongben Xu, Songcen Xu, Wei Zhang, Hang Xu

In this paper, we propose a text-guided 3D face generation method, referred to as TG-3DFace, for generating realistic 3D faces using text guidance.

3D Shape Generation, Contrastive Learning

The DiffuseStyleGesture+ entry to the GENEA Challenge 2023

1 code implementation • 26 Aug 2023 • Sicheng Yang, Haiwei Xue, Zhensong Zhang, Minglei Li, Zhiyong Wu, Xiaofei Wu, Songcen Xu, Zonghong Dai

In this paper, we introduce the DiffuseStyleGesture+, our solution for the Generation and Evaluation of Non-verbal Behavior for Embodied Agents (GENEA) Challenge 2023, which aims to foster the development of realistic, automated systems for generating conversational gestures.

Low-Light Image Enhancement with Illumination-Aware Gamma Correction and Complete Image Modelling Network

no code implementations • ICCV 2023 • Yinglong Wang, Zhen Liu, Jianzhuang Liu, Songcen Xu, Shuaicheng Liu

We propose to integrate the effectiveness of gamma correction with the strong modelling capacities of deep networks, which enables the correction factor gamma to be learned in a coarse-to-elaborate manner by adaptively perceiving the deviated illumination.

Low-Light Image Enhancement

AttriCLIP: A Non-Incremental Learner for Incremental Knowledge Learning

no code implementations • CVPR 2023 • Runqi Wang, Xiaoyue Duan, Guoliang Kang, Jianzhuang Liu, Shaohui Lin, Songcen Xu, Jinhu Lv, Baochang Zhang

The text consists of a category name and a fixed number of learnable parameters, which are selected from our designed attribute word bank and serve as attributes.

Attribute, Continual Learning

Few-Shot Learning with Visual Distribution Calibration and Cross-Modal Distribution Alignment

1 code implementation • CVPR 2023 • Runqi Wang, Hao Zheng, Xiaoyue Duan, Jianzhuang Liu, Yuning Lu, Tian Wang, Songcen Xu, Baochang Zhang

However, with only a few training images, there exist two crucial problems: (1) the visual feature distributions are easily distracted by class-irrelevant information in images, and (2) the alignment between the visual and language feature distributions is difficult.

Few-Shot Learning

HiVLP: Hierarchical Interactive Video-Language Pre-Training

no code implementations • ICCV 2023 • Bin Shao, Jianzhuang Liu, Renjing Pei, Songcen Xu, Peng Dai, Juwei Lu, Weimian Li, Youliang Yan

However, compared with image-language pre-training, video-language pre-training (VLP) has lagged far behind due to the lack of large-scale video-text pairs.

Retrieval, Self-Supervised Learning

PIDRo: Parallel Isomeric Attention with Dynamic Routing for Text-Video Retrieval

no code implementations • ICCV 2023 • Peiyan Guan, Renjing Pei, Bin Shao, Jianzhuang Liu, Weimian Li, Jiaxi Gu, Hang Xu, Songcen Xu, Youliang Yan, Edmund Y. Lam

The parallel isomeric attention module is used as the video encoder, which consists of two parallel branches modeling the spatial-temporal information of videos from both patch and frame levels.

Ranked #3 on Video Retrieval on MSR-VTT-1kA

Representation Learning, Retrieval

CLIPPING: Distilling CLIP-Based Models With a Student Base for Video-Language Retrieval

no code implementations • CVPR 2023 • Renjing Pei, Jianzhuang Liu, Weimian Li, Bin Shao, Songcen Xu, Peng Dai, Juwei Lu, Youliang Yan

Pre-training a vision-language model and then fine-tuning it on downstream tasks has become a popular paradigm.

Knowledge Distillation, Language Modelling

Co-Speech Gesture Synthesis by Reinforcement Learning With Contrastive Pre-Trained Rewards

no code implementations • CVPR 2023 • Mingyang Sun, Mengchen Zhao, Yaqing Hou, Minglei Li, Huang Xu, Songcen Xu, Jianye Hao

There is a growing demand for automatically synthesizing co-speech gestures for virtual characters.

Reinforcement Learning (RL)

CLIFF: Carrying Location Information in Full Frames into Human Pose and Shape Estimation

6 code implementations • 1 Aug 2022 • Zhihao Li, Jianzhuang Liu, Zhensong Zhang, Songcen Xu, Youliang Yan

Top-down methods dominate the field of 3D human pose and shape estimation, because they are decoupled from human detection and allow researchers to focus on the core problem.

Ranked #1 on Unsupervised 3D Human Pose Estimation on Human3.6M (PA-MPJPE metric)

3D Human Pose and Shape Estimation, Human Detection

Instance Segmentation in 3D Scenes using Semantic Superpoint Tree Networks

1 code implementation • ICCV 2021 • Zhihao Liang, Zhihao Li, Songcen Xu, Mingkui Tan, Kui Jia

State-of-the-art methods largely rely on a general pipeline that first learns point-wise features discriminative at semantic and instance levels, followed by a separate step of point grouping for proposing object instances.

Ranked #9 on 3D Instance Segmentation on S3DIS

3D Instance Segmentation, Scene Understanding

Multiple instance active learning for object detection

1 code implementation • CVPR 2021 • Tianning Yuan, Fang Wan, Mengying Fu, Jianzhuang Liu, Songcen Xu, Xiangyang Ji, Qixiang Ye

Despite the substantial progress of active learning for image recognition, there is still no instance-level active learning method specifically designed for object detection.

Ranked #1 on Active Object Detection on MS COCO

Active Object Detection, Multiple Instance Learning

Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE

2 code implementations • CVPR 2021 • Jialun Peng, Dong Liu, Songcen Xu, Houqiang Li

We propose a two-stage model for diverse inpainting, where the first stage generates multiple coarse results each of which has a different structure, and the second stage refines each coarse result separately by augmenting texture.

Image Inpainting, Quantization

DualPoseNet: Category-level 6D Object Pose and Size Estimation Using Dual Pose Network with Refined Learning of Pose Consistency

1 code implementation • ICCV 2021 • Jiehong Lin, Zewei Wei, Zhihao Li, Songcen Xu, Kui Jia, Yuanqing Li

DualPoseNet stacks two parallel pose decoders on top of a shared pose encoder, where the implicit decoder predicts object poses with a working mechanism different from that of the explicit one; they thus impose complementary supervision on the training of pose encoder.

Ranked #4 on 6D Pose Estimation using RGBD on REAL275

6D Pose Estimation using RGBD, Object

Quality-Aware Network for Human Parsing

1 code implementation • 10 Mar 2021 • Lu Yang, Qing Song, Zhihui Wang, Zhiwei Liu, Songcen Xu, Zhihao Li

Estimating the quality of the network output is an important issue, and there is currently no effective solution to it in the field of human parsing.

Human Parsing, Instance Segmentation

PcmNet: Position-Sensitive Context Modeling Network for Temporal Action Localization

no code implementations • 9 Mar 2021 • Xin Qin, Hanbin Zhao, Guangchen Lin, Hao Zeng, Songcen Xu, Xi Li

In this paper, we propose a temporal-position-sensitive context modeling approach to incorporate both positional and semantic information for more precise action localization.

Boundary Detection, Position

Renovating Parsing R-CNN for Accurate Multiple Human Parsing

1 code implementation • ECCV 2020 • Lu Yang, Qing Song, Zhihui Wang, Mengjie Hu, Chun Liu, Xueshi Xin, Wenhe Jia, Songcen Xu

Multiple human parsing aims to segment various human parts and associate each part with the corresponding instance simultaneously.

Human Parsing

DiverseDepth: Affine-invariant Depth Prediction Using Diverse Data

2 code implementations • 3 Feb 2020 • Wei Yin, Xinlong Wang, Chunhua Shen, Yifan Liu, Zhi Tian, Songcen Xu, Changming Sun, Dou Renyin

Compared with previous learning objectives, i.e., learning metric depth or relative depth, we propose to learn the affine-invariant depth using our diverse dataset to ensure both generalization and high-quality geometric shapes of scenes.

Depth Estimation, Depth Prediction

Index Network

6 code implementations • 11 Aug 2019 • Hao Lu, Yutong Dai, Chunhua Shen, Songcen Xu

By viewing the indices as a function of the feature map, we introduce the concept of "learning to index", and present a novel index-guided encoder-decoder framework where indices are self-learned adaptively from data and are used to guide the downsampling and upsampling stages, without extra training supervision.

Ranked #2 on Grayscale Image Denoising on Set12 sigma30

Grayscale Image Denoising, Image Denoising

Indices Matter: Learning to Index for Deep Image Matting

1 code implementation • ICCV 2019 • Hao Lu, Yutong Dai, Chunhua Shen, Songcen Xu

We show that existing upsampling operators can be unified with the notion of the index function.

Ranked #4 on Semantic Image Matting on Semantic Image Matting Dataset

Semantic Image Matting
