Search Results for author: Huaibo Huang

Found 45 papers, 12 papers with code

Hierarchical Face Aging through Disentangled Latent Characteristics

no code implementations • ECCV 2020 • Pei-Pei Li, Huaibo Huang, Yibo Hu, Xiang Wu, Ran He, Zhenan Sun

To explore the age effects on facial images, we propose a Disentangled Adversarial Autoencoder (DAAE) to disentangle the facial images into three independent factors: age, identity and extraneous information.

Age Estimation MORPH

ViTAR: Vision Transformer with Any Resolution

no code implementations • 27 Mar 2024 • Qihang Fan, Quanzeng You, Xiaotian Han, Yongfei Liu, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang

Firstly, we propose a novel module for dynamic resolution adjustment, designed with a single Transformer block, specifically to achieve highly efficient incremental token integration.

Self-Supervised Learning Semantic Segmentation

DiffMAC: Diffusion Manifold Hallucination Correction for High Generalization Blind Face Restoration

no code implementations • 15 Mar 2024 • Nan Gao, Jia Li, Huaibo Huang, Zhi Zeng, Ke Shang, Shuwu Zhang, Ran He

Experimental results demonstrate the superiority of DiffMAC over state-of-the-art methods, with a high degree of generalization in real-world and heterogeneous settings.

Attribute Blind Face Restoration +1

FakeNewsGPT4: Advancing Multimodal Fake News Detection through Knowledge-Augmented LVLMs

no code implementations • 4 Mar 2024 • Xuannan Liu, Peipei Li, Huaibo Huang, Zekun Li, Xing Cui, Jiahao Liang, Lixiong Qin, Weihong Deng, Zhaofeng He

In this paper, we propose FakeNewsGPT4, a novel framework that augments Large Vision-Language Models (LVLMs) with forgery-specific knowledge for manipulation reasoning while inheriting extensive world knowledge as complementary.

Fake News Detection Informativeness +1

InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding

no code implementations • 3 Mar 2024 • Haogeng Liu, Quanzeng You, Xiaotian Han, Yiqi Wang, Bohan Zhai, Yongfei Liu, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang

Multimodal Large Language Models (MLLMs) have experienced significant advancements recently.

Ranked #37 on Visual Question Answering on MM-Vet

Visual Question Answering

Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration

no code implementations • 5 Dec 2023 • Yuang Ai, Huaibo Huang, Xiaoqiang Zhou, Jiexiang Wang, Ran He

Extensive experiments on 16 IR tasks underscore the superiority of MPerceiver in terms of adaptiveness, generalizability and fidelity.

Image Restoration

Portrait Diffusion: Training-free Face Stylization with Chain-of-Painting

1 code implementation • 3 Dec 2023 • Jin Liu, Huaibo Huang, Chao Jin, Ran He

Face stylization refers to the transformation of a face into a specific portrait style.

Image Reconstruction

Exploring Straighter Trajectories of Flow Matching with Diffusion Guidance

no code implementations • 28 Nov 2023 • Siyu Xing, Jie Cao, Huaibo Huang, Xiao-Yu Zhang, Ran He

First, we propose a coupling strategy to straighten trajectories, creating couplings between image and noise samples under diffusion model guidance.

InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser

no code implementations • 25 Nov 2023 • Xing Cui, Zekun Li, Pei Pei Li, Huaibo Huang, Zhaofeng He

We employ DDIM inversion to extract this noise from the reference image and leverage a diffusion model to generate new stylized images from the "style" noise.

Text-to-Image Generation

Video-CSR: Complex Video Digest Creation for Visual-Language Models

no code implementations • 8 Oct 2023 • Tingkai Liu, Yunzhe Tao, Haogeng Liu, Qihang Fan, Ding Zhou, Huaibo Huang, Ran He, Hongxia Yang

We present a novel task and human-annotated dataset for evaluating the ability of visual-language models to generate captions and summaries for real-world video clips, which we call Video-CSR (Captioning, Summarization and Retrieval).

Retrieval Sentence +1

Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling

no code implementations • 8 Oct 2023 • Haogeng Liu, Qihang Fan, Tingkai Liu, Linjie Yang, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang

This paper proposes Video-Teller, a video-language foundation model that leverages multi-modal fusion and fine-grained modality alignment to significantly enhance the video-to-text generation task.

Text Generation Video Summarization

RMT: Retentive Networks Meet Vision Transformers

1 code implementation • 20 Sep 2023 • Qihang Fan, Huaibo Huang, Mingrui Chen, Hongmin Liu, Ran He

To alleviate these issues, we draw inspiration from the recent Retentive Network (RetNet) in the field of NLP, and propose RMT, a strong vision backbone with explicit spatial prior for general purposes.

Instance Segmentation object-detection +2

Learning-to-Rank Meets Language: Boosting Language-Driven Ordering Alignment for Ordinal Classification

2 code implementations • NeurIPS 2023 • Rui Wang, Peipei Li, Huaibo Huang, Chunshui Cao, Ran He, Zhaofeng He

Consequently, we propose a cross-modal ordinal pairwise loss to refine the CLIP feature space, where texts and images maintain both semantic alignment and ordering alignment.

Age Estimation Classification +2

Lightweight Vision Transformer with Bidirectional Interaction

1 code implementation • NeurIPS 2023 • Qihang Fan, Huaibo Huang, Xiaoqiang Zhou, Ran He

This paper proposes a Fully Adaptive Self-Attention (FASA) mechanism for vision transformer to model the local and global information as well as the bidirectional interaction between them in context-aware ways.

Rethinking Local Perception in Lightweight Vision Transformer

1 code implementation • 31 Mar 2023 • Qihang Fan, Huaibo Huang, Jiyang Guan, Ran He

In CloFormer, the combination of AttnConv and vanilla attention, which uses pooling to reduce FLOPs, enables the model to perceive both high-frequency and low-frequency information.

Ranked #566 on Image Classification on ImageNet

Image Classification object-detection +2

Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer

no code implementations • 31 Mar 2023 • Yuang Ai, Xiaoqiang Zhou, Huaibo Huang, Lei Zhang, Ran He

Unsupervised Domain Adaptation (UDA) can effectively address domain gap issues in real-world image Super-Resolution (SR) by accessing both the source and target data.

Image Super-Resolution Source-Free Domain Adaptation +1

Pluralistic Aging Diffusion Autoencoder

no code implementations • ICCV 2023 • Peipei Li, Rui Wang, Huaibo Huang, Ran He, Zhaofeng He

Face aging is an ill-posed problem because multiple plausible aging patterns may correspond to a given input.

Denoising

MSRA-SR: Image Super-resolution Transformer with Multi-scale Shared Representation Acquisition

no code implementations • ICCV 2023 • Xiaoqiang Zhou, Huaibo Huang, Ran He, Zilei Wang, Jie Hu, Tieniu Tan

In particular, self-attention with cross-scale matching and convolution filters with different kernel sizes are designed to exploit the multi-scale features in images.

Image Super-Resolution

Vision Transformer with Super Token Sampling

1 code implementation • CVPR 2023 • Huaibo Huang, Xiaoqiang Zhou, Jie Cao, Ran He, Tieniu Tan

STA decomposes vanilla global attention into multiplications of a sparse association map and a low-dimensional attention, leading to high efficiency in capturing global dependencies.

Semantic Segmentation Superpixels

Parallel Augmentation and Dual Enhancement for Occluded Person Re-identification

1 code implementation • 11 Oct 2022 • Zi Wang, Huaibo Huang, Aihua Zheng, Chenglong Li, Ran He

To alleviate these two issues, we propose a simple yet effective method with Parallel Augmentation and Dual Enhancement (PADE), which is robust on both occluded and non-occluded data and does not require any auxiliary clues.

Person Re-Identification

Artistic Style Discovery With Independent Components

1 code implementation • CVPR 2022 • Xin Xie, Yi Li, Huaibo Huang, Haiyan Fu, Wanwan Wang, Yanqing Guo

Style transfer has been well studied in recent years, with excellent performance achieved.

Style Transfer

Rethinking Image Cropping: Exploring Diverse Compositions From Global Views

no code implementations • CVPR 2022 • Gengyun Jia, Huaibo Huang, Chaoyou Fu, Ran He

In this paper, we regard image cropping as a set prediction problem.

Image Cropping regression +1

Contrastive Attention Network with Dense Field Estimation for Face Completion

no code implementations • 20 Dec 2021 • Xin Ma, Xiaoqiang Zhou, Huaibo Huang, Gengyun Jia, Zhenhua Chai, Xiaolin Wei

This multi-scale architecture is beneficial for the decoder to utilize discriminative representations learned from encoders into images.

Face Recognition Facial Inpainting

Toward Accurate and Reliable Iris Segmentation Using Uncertainty Learning

no code implementations • 20 Oct 2021 • Jianze Wei, Huaibo Huang, Muyi Sun, Yunlong Wang, Min Ren, Ran He, Zhenan Sun

To make further efforts on accurate and reliable iris segmentation, we propose a bilateral self-attention module and design Bilateral Transformer (BiTrans) with hierarchical architecture by exploring spatial and visual relationships.

Iris Recognition Iris Segmentation +1

Causal Representation Learning for Context-Aware Face Transfer

no code implementations • 4 Oct 2021 • Gege Gao, Huaibo Huang, Chaoyou Fu, Ran He

Human face synthesis involves transferring knowledge about the identity and identity-dependent face shape (IDFS) of a human face to target face images where the context (e.g., facial expressions, head poses, and other background factors) may change dramatically.

counterfactual Counterfactual Inference +4

Universal Face Restoration With Memorized Modulation

no code implementations • 3 Oct 2021 • Jia Li, Huaibo Huang, Xiaofei Jia, Ran He

Blind face restoration (BFR) is a challenging problem because of the uncertainty of the degradation patterns.

Blind Face Restoration

Video Forgery Detection Using Multiple Cues on Fusion of EfficientNet and Swin Transformer

no code implementations • 29 Sep 2021 • Chenyu Liu, Jia Li, Junxian Duan, Huaibo Huang

The first is that capturing the general clue of artifacts is difficult.

Face Swapping Optical Flow Estimation

Information Bottleneck Disentanglement for Identity Swapping

1 code implementation • CVPR 2021 • Gege Gao, Huaibo Huang, Chaoyou Fu, Zhaoyang Li, Ran He

In this work, we propose a novel information disentangling and swapping network, called InfoSwap, to extract the most expressive information for identity representation from a pre-trained face recognition model.

Disentanglement Face Recognition +1

Memory Oriented Transfer Learning for Semi-Supervised Image Deraining

no code implementations • CVPR 2021 • Huaibo Huang, Aijing Yu, Ran He

To address this issue, we propose a memory-oriented semi-supervised (MOSS) method which enables the network to explore and exploit the properties of rain streaks from both synthetic and real data.

Rain Removal Transfer Learning

Free-Form Image Inpainting via Contrastive Attention Network

no code implementations • 29 Oct 2020 • Xin Ma, Xiaoqiang Zhou, Huaibo Huang, Zhenhua Chai, Xiaolin Wei, Ran He

It is difficult for encoders to capture such powerful representations under this complex situation.

Image Inpainting

DVG-Face: Dual Variational Generation for Heterogeneous Face Recognition

1 code implementation • 20 Sep 2020 • Chaoyou Fu, Xiang Wu, Yibo Hu, Huaibo Huang, Ran He

As a consequence, massive new diverse paired heterogeneous images with the same identity can be generated from noises.

Contrastive Learning Face Recognition +1

Cosmetic-Aware Makeup Cleanser

no code implementations • 20 Apr 2020 • Yi Li, Huaibo Huang, Junchi Yu, Ran He, Tieniu Tan

Face verification aims at determining whether a pair of face images belongs to the same identity.

Face Parsing Face Verification +1

Informative Sample Mining Network for Multi-Domain Image-to-Image Translation

no code implementations • ECCV 2020 • Jie Cao, Huaibo Huang, Yi Li, Ran He, Zhenan Sun

The performance of multi-domain image-to-image translation has been significantly improved by recent progress in deep generative models.

Image-to-Image Translation Informativeness +1

Exploiting Style and Attention in Real-World Super-Resolution

no code implementations • 21 Dec 2019 • Xin Ma, Yi Li, Huaibo Huang, Mandi Luo, Ran He

Real-world image super-resolution (SR) is a challenging image translation problem.

Image Super-Resolution Mutual Information Estimation

LAMP-HQ: A Large-Scale Multi-Pose High-Quality Database and Benchmark for NIR-VIS Face Recognition

no code implementations • 17 Dec 2019 • Aijing Yu, Haoxue Wu, Huaibo Huang, Zhen Lei, Ran He

A spectral conditional attention module is introduced to reduce the domain gap between NIR and VIS data and then improve the performance of NIR-VIS heterogeneous face recognition on various databases including the LAMP-HQ.

Attribute Face Recognition +1

Dual Variational Generation for Low Shot Heterogeneous Face Recognition

no code implementations • NeurIPS 2019 • Chaoyou Fu, Xiang Wu, Yibo Hu, Huaibo Huang, Ran He

Specifically, we first introduce a dual variational autoencoder to represent a joint distribution of paired heterogeneous images.

Face Recognition Heterogeneous Face Recognition

Biphasic Learning of GANs for High-Resolution Image-to-Image Translation

no code implementations • 14 Apr 2019 • Jie Cao, Huaibo Huang, Yi Li, Jingtuo Liu, Ran He, Zhenan Sun

In this work, we present a novel training framework for GANs, namely biphasic learning, to achieve image-to-image translation in multiple visual domains at $1024^2$ resolution.

Image-to-Image Translation Mutual Information Estimation +2

UVA: A Universal Variational Framework for Continuous Age Analysis

no code implementations • 30 Mar 2019 • Pei-Pei Li, Huaibo Huang, Yibo Hu, Xiang Wu, Ran He, Zhenan Sun

UVA is the first attempt to achieve facial age analysis tasks, including age translation, age generation and age estimation, in a universal framework.

Age Estimation MORPH +1

Dual Variational Generation for Low-Shot Heterogeneous Face Recognition

1 code implementation • 25 Mar 2019 • Chaoyou Fu, Xiang Wu, Yibo Hu, Huaibo Huang, Ran He

Then, in order to ensure the identity consistency of the generated paired heterogeneous images, we impose a distribution alignment in the latent space and a pairwise identity preserving in the image space.

Ranked #1 on Face Verification on CASIA NIR-VIS 2.0

Face Recognition Heterogeneous Face Recognition

A Survey of Deep Facial Attribute Analysis

no code implementations • 26 Dec 2018 • Xin Zheng, Yanqing Guo, Huaibo Huang, Yi Li, Ran He

Deep learning based facial attribute analysis consists of two basic sub-issues: facial attribute estimation (FAE), which recognizes whether facial attributes are present in given images, and facial attribute manipulation (FAM), which synthesizes or removes desired facial attributes.

Attribute

Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning

no code implementations • 17 Dec 2018 • Hao Zhu, Huaibo Huang, Yi Li, Aihua Zheng, Ran He

Talking face generation aims to synthesize a face video with precise lip synchronization as well as a smooth transition of facial motion over the entire video via the given speech clip and facial image.

Talking Face Generation

Disentangled Variational Representation for Heterogeneous Face Recognition

no code implementations • 6 Sep 2018 • Xiang Wu, Huaibo Huang, Vishal M. Patel, Ran He, Zhenan Sun

Visible (VIS) to near-infrared (NIR) face matching is a challenging problem due to the significant discrepancy between the two domains and a lack of sufficient data for training cross-modal matching algorithms.

Ranked #2 on Face Verification on BUAA-VisNir