Search Results for author: Chaoyue Wang

Found 49 papers, 24 papers with code

Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm

no code implementations 18 Mar 2024 Yi Wu, Ziqiang Li, Heliang Zheng, Chaoyue Wang, Bin Li

Drawing on recent advancements in diffusion models for text-to-image generation, identity-preserved personalization has made significant progress in accurately capturing specific identities with just a single reference image.

Text-to-Image Generation

When ControlNet Meets Inexplicit Masks: A Case Study of ControlNet on its Contour-following Ability

no code implementations 1 Mar 2024 Wenjie Xuan, Yufei Xu, Shanshan Zhao, Chaoyue Wang, Juhua Liu, Bo Du, DaCheng Tao

Subsequently, to enhance controllability with inexplicit masks, an advanced Shape-aware ControlNet consisting of a deterioration estimator and a shape-prior modulation block is devised.

Trajectory Consistency Distillation: Improved Latent Consistency Distillation by Semi-Linear Consistency Function with Trajectory Mapping

1 code implementation 29 Feb 2024 Jianbin Zheng, Minghui Hu, Zhongyi Fan, Chaoyue Wang, Changxing Ding, DaCheng Tao, Tat-Jen Cham

Consequently, we introduce Trajectory Consistency Distillation (TCD), which encompasses trajectory consistency function and strategic stochastic sampling.

Image Generation

HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting

1 code implementation 29 Nov 2023 Wenquan Lu, Yufei Xu, Jing Zhang, Chaoyue Wang, DaCheng Tao

Given an image whose generation failed due to malformed hands, we utilize ControlNet modules to re-inject correct hand information.

One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls

no code implementations 27 Nov 2023 Minghui Hu, Jianbin Zheng, Chuanxia Zheng, Chaoyue Wang, DaCheng Tao, Tat-Jen Cham

By integrating a compact network and incorporating an additional simple yet effective step during inference, OMS elevates image fidelity and harmonizes the dichotomy between training and inference, while preserving original model parameters.

Denoising

Decompose Semantic Shifts for Composed Image Retrieval

no code implementations 18 Sep 2023 Xingyu Yang, Daqing Liu, Heng Zhang, Yong Luo, Chaoyue Wang, Jing Zhang

Composed image retrieval is an image retrieval task in which the user provides a reference image as a starting point and a text specifying how to shift from that starting point to the desired target image.

Image Retrieval Retrieval

PartSeg: Few-shot Part Segmentation via Part-aware Prompt Learning

no code implementations 24 Aug 2023 Mengya Han, Heliang Zheng, Chaoyue Wang, Yong Luo, Han Hu, Jing Zhang, Yonggang Wen

In this work, we address the task of few-shot part segmentation, which aims to segment the different parts of an unseen object using very few labeled examples.

Language Modelling Segmentation

Cross-modal & Cross-domain Learning for Unsupervised LiDAR Semantic Segmentation

no code implementations 5 Aug 2023 Yiyang Chen, Shanshan Zhao, Changxing Ding, Liyao Tang, Chaoyue Wang, DaCheng Tao

In recent years, cross-modal domain adaptation has been studied on the paired 2D image and 3D LiDAR data to ease the labeling costs for 3D LiDAR semantic segmentation (3DLSS) in the target domain.

Domain Adaptation LIDAR Semantic Segmentation +1

Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation

no code implementations 1 Jun 2023 Minghui Hu, Jianbin Zheng, Daqing Liu, Chuanxia Zheng, Chaoyue Wang, DaCheng Tao, Tat-Jen Cham

In this work, we propose Cocktail, a pipeline to mix various modalities into one embedding, amalgamated with a generalized ControlNet (gControlNet), a controllable normalisation (ControlNorm), and a spatial guidance sampling method, to actualize multi-modal and spatially-refined control for text-conditional diffusion models.

Conditional Image Generation

Null-text Guidance in Diffusion Models is Secretly a Cartoon-style Creator

no code implementations 11 May 2023 Jing Zhao, Heliang Zheng, Chaoyue Wang, Long Lan, Wanrong Huang, Wenjing Yang

Specifically, we propose two disturbance methods, i.e., Rollback disturbance (Back-D) and Image disturbance (Image-D), to construct misalignment between the noisy images used for predicting null-text guidance and text guidance (subsequently referred to as the null-text noisy image and the text noisy image, respectively) in the sampling process.
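The null-text guidance the snippet perturbs is the unconditional branch of standard classifier-free guidance, where the final noise prediction extrapolates from the null-text prediction toward the text-conditional one. A minimal numerical sketch of that standard combination (illustrative names only, not the authors' implementation):

```python
import numpy as np

def classifier_free_guidance(eps_text, eps_null, scale):
    """Combine the text-conditional and null-text (unconditional) noise
    predictions. scale > 1 strengthens text guidance; scale = 0 keeps
    only the null-text prediction."""
    return eps_null + scale * (eps_text - eps_null)

# Toy noise predictions for a 2x2 "image".
eps_text = np.array([[0.2, -0.1], [0.4, 0.0]])
eps_null = np.array([[0.1, 0.1], [0.1, 0.1]])

guided = classifier_free_guidance(eps_text, eps_null, scale=7.5)
```

Perturbing which noisy image eps_null is predicted from, as the disturbance methods above do, changes only the null-text branch of this combination.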

MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis

no code implementations 10 May 2023 Jianbin Zheng, Daqing Liu, Chaoyue Wang, Minghui Hu, Zuopeng Yang, Changxing Ding, DaCheng Tao

To this end, we propose to generate images conditioned on the compositions of multimodal control signals, where modalities are imperfectly complementary, i.e., composed multimodal conditional image synthesis (CMCIS).

Image Generation

MagicFusion: Boosting Text-to-Image Generation Performance by Fusing Diffusion Models

no code implementations ICCV 2023 Jing Zhao, Heliang Zheng, Chaoyue Wang, Long Lan, Wenjing Yang

The advent of open-source AI communities has produced a cornucopia of powerful text-guided diffusion models that are trained on various datasets.

Text-to-Image Generation

ESceme: Vision-and-Language Navigation with Episodic Scene Memory

1 code implementation 2 Mar 2023 Qi Zheng, Daqing Liu, Chaoyue Wang, Jing Zhang, Dadong Wang, DaCheng Tao

Vision-and-language navigation (VLN) simulates a visual agent that follows natural-language navigation instructions in real-world scenes.

Vision and Language Navigation

Eliminating Contextual Prior Bias for Semantic Image Editing via Dual-Cycle Diffusion

1 code implementation 5 Feb 2023 Zuopeng Yang, Tianshu Chu, Xin Lin, Erdun Gao, Daqing Liu, Jie Yang, Chaoyue Wang

The proposed model incorporates a Bias Elimination Cycle that consists of both a forward path and an inverted path, each featuring a Structural Consistency Cycle to ensure the preservation of image content during the editing process.

Text-to-Image Generation

Diff-Font: Diffusion Model for Robust One-Shot Font Generation

1 code implementation 12 Dec 2022 Haibin He, Xinyuan Chen, Chaoyue Wang, Juhua Liu, Bo Du, DaCheng Tao, Yu Qiao

Specifically, a large stroke-wise dataset is constructed, and a stroke-wise diffusion model is proposed to preserve the structure and completeness of each generated character.

Font Generation

Unified Discrete Diffusion for Simultaneous Vision-Language Generation

1 code implementation 27 Nov 2022 Minghui Hu, Chuanxia Zheng, Heliang Zheng, Tat-Jen Cham, Chaoyue Wang, Zuopeng Yang, DaCheng Tao, Ponnuthurai N. Suganthan

The recently developed discrete diffusion models perform extraordinarily well in the text-to-image task, showing significant promise for handling multi-modality signals.

multimodal generation Text Generation +1

Cross-Modal Contrastive Learning for Robust Reasoning in VQA

1 code implementation 21 Nov 2022 Qi Zheng, Chaoyue Wang, Daqing Liu, Dadong Wang, DaCheng Tao

For each positive pair, we regard the images from different graphs as negative samples and derive a multi-positive version of contrastive learning.
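Multi-positive contrastive learning of this kind can be sketched as an InfoNCE loss averaged over several positives per anchor. A generic sketch under that standard formulation (illustrative names, not the paper's code):

```python
import numpy as np

def multi_positive_info_nce(anchor, positives, negatives, temperature=0.1):
    """InfoNCE with multiple positives: average the per-positive
    -log(exp(s_pos/t) / sum_all exp(s/t)) terms for one anchor."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

    pos_sims = np.array([cos(anchor, p) for p in positives]) / temperature
    neg_sims = np.array([cos(anchor, n) for n in negatives]) / temperature
    all_sims = np.concatenate([pos_sims, neg_sims])
    log_denom = np.log(np.exp(all_sims).sum())
    # -log softmax of each positive, averaged over all positives
    return float(np.mean(log_denom - pos_sims))
```

The loss shrinks as the anchor aligns with its positives and grows as it aligns with negatives, which is the behaviour the robust-reasoning objective above relies on.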

Contrastive Learning Question Answering +1

Leveraging GAN Priors for Few-Shot Part Segmentation

1 code implementation 27 Jul 2022 Mengya Han, Heliang Zheng, Chaoyue Wang, Yong Luo, Han Hu, Bo Du

Overall, this work is an attempt to explore the internal relevance between generation tasks and perception tasks by prompt designing.

Image Generation Segmentation

FakeCLR: Exploring Contrastive Learning for Solving Latent Discontinuity in Data-Efficient GANs

1 code implementation 18 Jul 2022 Ziqiang Li, Chaoyue Wang, Heliang Zheng, Jing Zhang, Bin Li

Since data augmentation strategies have largely alleviated training instability, how to further improve the generative performance of DE-GANs has become a hotspot.

Contrastive Learning Data Augmentation

SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders

1 code implementation 21 Jun 2022 Gang Li, Heliang Zheng, Daqing Liu, Chaoyue Wang, Bing Su, Changwen Zheng

In this paper, we explore a potential visual analogue of words, i.e., semantic parts, and we integrate semantic information into the training process of MAE by proposing a Semantic-Guided Masking strategy.

Language Modelling Masked Language Modeling +1

Bypass Network for Semantics Driven Image Paragraph Captioning

no code implementations 21 Jun 2022 Qi Zheng, Chaoyue Wang, Dadong Wang

Most existing methods model the coherence through the topic transition that dynamically infers a topic vector from preceding sentences.

Image Paragraph Captioning Sentence

Recent Advances for Quantum Neural Networks in Generative Learning

no code implementations 7 Jun 2022 Jinkai Tian, Xiaoyu Sun, Yuxuan Du, Shanshan Zhao, Qing Liu, Kaining Zhang, Wei Yi, Wanrong Huang, Chaoyue Wang, Xingyao Wu, Min-Hsiu Hsieh, Tongliang Liu, Wenjing Yang, DaCheng Tao

Due to the intrinsic probabilistic nature of quantum mechanics, it is reasonable to postulate that quantum generative learning models (QGLMs) may surpass their classical counterparts.

BIG-bench Machine Learning Quantum Machine Learning

Modeling Image Composition for Complex Scene Generation

1 code implementation CVPR 2022 Zuopeng Yang, Daqing Liu, Chaoyue Wang, Jie Yang, DaCheng Tao

Compared to existing CNN-based and Transformer-based generation models, which entangle modeling at the pixel and patch levels and at the object and patch levels, respectively, the proposed focal attention predicts the current patch token by focusing only on the highly related tokens specified by the spatial layout, thereby achieving disambiguation during training.

Layout-to-Image Generation Object +1

Visual Superordinate Abstraction for Robust Concept Learning

no code implementations 28 May 2022 Qi Zheng, Chaoyue Wang, Dadong Wang, DaCheng Tao

Concept learning constructs visual representations that are connected to linguistic semantics, which is fundamental to vision-language tasks.

Attribute Question Answering +1

Neural Maximum A Posteriori Estimation on Unpaired Data for Motion Deblurring

1 code implementation 26 Apr 2022 Youjian Zhang, Chaoyue Wang, DaCheng Tao

The proposed NeurMAP is an orthogonal approach to existing deblurring neural networks, and is the first framework that enables training image deblurring networks on unpaired datasets.

Deblurring Image Deblurring +1

A Comprehensive Survey on Data-Efficient GANs in Image Generation

no code implementations 18 Apr 2022 Ziqiang Li, Beihao Xia, Jing Zhang, Chaoyue Wang, Bin Li

Generative Adversarial Networks (GANs) have achieved remarkable results in image synthesis.

Image Generation

BatchFormerV2: Exploring Sample Relationships for Dense Representation Learning

1 code implementation 4 Apr 2022 Zhi Hou, Baosheng Yu, Chaoyue Wang, Yibing Zhan, DaCheng Tao

Specifically, when applying the proposed module, it employs a two-stream pipeline during training, i.e., either with or without a BatchFormerV2 module, where the BatchFormerV2 stream can be removed for testing.

Image Classification object-detection +3

Self-Augmented Unpaired Image Dehazing via Density and Depth Decomposition

1 code implementation CVPR 2022 Yang Yang, Chaoyue Wang, Risheng Liu, Lin Zhang, Xiaojie Guo, DaCheng Tao

With estimated scene depth, our method is capable of re-rendering hazy images with different thicknesses which further benefits the training of the dehazing network.
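Re-rendering haze of different thicknesses from scene depth typically follows the standard atmospheric scattering model, I = J·t + A·(1 − t) with transmission t = exp(−β·depth). A toy sketch of that well-known model (illustrative only, not the authors' implementation):

```python
import numpy as np

def render_haze(clear, depth, beta, airlight=1.0):
    """Atmospheric scattering model: blend the clear image toward the
    airlight color. Larger beta (haze density) or larger depth reduces
    transmission and thickens the haze."""
    t = np.exp(-beta * depth)          # per-pixel transmission in (0, 1]
    return clear * t + airlight * (1.0 - t)

# Flat clear scene with varying depth; two haze densities.
clear = np.full((2, 2), 0.2)
depth = np.array([[1.0, 2.0], [3.0, 4.0]])
hazy_thin = render_haze(clear, depth, beta=0.1)
hazy_thick = render_haze(clear, depth, beta=1.0)
```

Sweeping beta is what makes re-rendering at "different thicknesses" possible once depth has been estimated.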

Image Dehazing

Visual Semantics Allow for Textual Reasoning Better in Scene Text Recognition

1 code implementation AAAI 2022 Yue He, Chen Chen, Jing Zhang, Juhua Liu, Fengxiang He, Chaoyue Wang, Bo Du

Technically, given the character segmentation maps predicted by a VR model, we construct a subgraph for each instance, where nodes represent the pixels in it and edges are added between nodes based on their spatial similarity.

Ranked #9 on Scene Text Recognition on ICDAR2015 (using extra training data)

Language Modelling Scene Text Recognition

Video Frame Interpolation without Temporal Priors

1 code implementation NeurIPS 2020 Youjian Zhang, Chaoyue Wang, DaCheng Tao

However, in complicated real-world situations, the temporal priors of videos, i.e., frames per second (FPS) and frame exposure time, may vary across different camera sensors.

Optical Flow Estimation Video Frame Interpolation

TAG: Toward Accurate Social Media Content Tagging with a Concept Graph

no code implementations 13 Oct 2021 Jiuding Yang, Weidong Guo, Bang Liu, Yakun Yu, Chaoyue Wang, Jinwen Luo, Linglong Kong, Di Niu, Zhen Wen

Although conceptualization has been widely studied in semantics and knowledge representation, it is still challenging to find the most accurate concept phrases to characterize the main idea of a text snippet on fast-growing social media.

Dependency Parsing Graph Matching +4

MRI-based Alzheimer's disease prediction via distilling the knowledge in multi-modal data

no code implementations 8 Apr 2021 Hao Guan, Chaoyue Wang, DaCheng Tao

In this work, we propose a multi-modal multi-instance distillation scheme, which aims to distill the knowledge learned from multi-modal data to an MRI-based network for MCI conversion prediction.

Disease Prediction

Exposure Trajectory Recovery from Motion Blur

1 code implementation 6 Oct 2020 Youjian Zhang, Chaoyue Wang, Stephen J. Maybank, DaCheng Tao

However, the motion information contained in a blurry image has yet to be fully explored and accurately formulated because: (i) the ground truth of dynamic motion is difficult to obtain; (ii) the temporal ordering is destroyed during the exposure; and (iii) the motion estimation from a blurry image is highly ill-posed.
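Ambiguity (ii), the destruction of temporal ordering, follows directly from the usual blur formation model, in which a blurry image is the average of latent sharp frames along the exposure trajectory: averaging is permutation-invariant, so any ordering of the frames yields the same blur. A toy sketch of that model (illustrative, not the authors' code):

```python
import numpy as np

def synthesize_blur(sharp_frames):
    """Blur formation model: average the latent sharp frames captured
    during the exposure. Any permutation of the frames gives the same
    result, so temporal order cannot be recovered from the blur alone."""
    return np.mean(np.stack(sharp_frames), axis=0)

rng = np.random.default_rng(0)
frames = [rng.random((4, 4)) for _ in range(5)]  # 5 latent sharp frames
blurry = synthesize_blur(frames)
```

Reversing or shuffling the frame list leaves the synthesized blur unchanged, which is exactly why exposure trajectory recovery is ill-posed.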

Deblurring Image Deblurring +1

A Systematic Survey of Regularization and Normalization in GANs

1 code implementation 19 Aug 2020 Ziqiang Li, Muhammad Usman, Rentuo Tao, Pengfei Xia, Chaoyue Wang, Huanhuan Chen, Bin Li

Although a number of regularization and normalization methods have been proposed for GANs, to the best of our knowledge, there exists no comprehensive survey that primarily focuses on the objectives and development of these methods, apart from a few limited-scope studies.

Data Augmentation

GIANT: Scalable Creation of a Web-scale Ontology

1 code implementation 5 Apr 2020 Bang Liu, Weidong Guo, Di Niu, Jinwen Luo, Chaoyue Wang, Zhen Wen, Yu Xu

These services will benefit from a highly structured and web-scale ontology of entities, concepts, events, topics and categories.

News Recommendation

A User-Centered Concept Mining System for Query and Document Understanding at Tencent

no code implementations 21 May 2019 Bang Liu, Weidong Guo, Di Niu, Chaoyue Wang, Shunnan Xu, Jinghong Lin, Kunfeng Lai, Yu Xu

We further present our techniques to tag documents with user-centered concepts and to construct a topic-concept-instance taxonomy, which has helped to improve search as well as news feeds recommendation in Tencent QQ Browser.

document understanding TAG

Multiple Sclerosis Lesion Inpainting Using Non-Local Partial Convolutions

no code implementations 24 Dec 2018 Hao Xiong, Chaoyue Wang, DaCheng Tao, Michael Barnett, Chenyu Wang

However, existing methods inpaint lesions based on texture information derived from local surrounding tissue, often leading to inconsistent inpainting and the generation of artifacts such as intensity discrepancy and blurriness.

Evolutionary Generative Adversarial Networks

3 code implementations 1 Mar 2018 Chaoyue Wang, Chang Xu, Xin Yao, DaCheng Tao

In this paper, we propose a novel GAN framework called evolutionary generative adversarial networks (E-GAN) for stable GAN training and improved generative performance.

Perceptual Adversarial Networks for Image-to-Image Transformation

2 code implementations 28 Jun 2017 Chaoyue Wang, Chang Xu, Chaohui Wang, DaCheng Tao

The proposed PAN consists of two feed-forward convolutional neural networks (CNNs): the image transformation network T and the discriminative network D. By combining the generative adversarial loss and the proposed perceptual adversarial loss, these two networks can be trained alternately to solve image-to-image transformation tasks.

Image Inpainting

Tag Disentangled Generative Adversarial Networks for Object Image Re-rendering

no code implementations International Joint Conference on Artificial Intelligence 2017 Chaoyue Wang, Chaohui Wang, Chang Xu, DaCheng Tao

The whole framework consists of a disentangling network, a generative network, a tag mapping net, and a discriminative network, which are trained jointly on a given set of images that are completely or partially tagged (i.e., supervised/semi-supervised settings).

Object TAG
