Search Results for author: Pengfei Zhu

Found 47 papers, 28 papers with code

CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective

no code implementations 22 Apr 2024 Wencheng Zhu, Xin Zhou, Pengfei Zhu, Yu Wang, QinGhua Hu

Note that constraints on intra-sample similarities and inter-sample dissimilarities can be efficiently and effectively reformulated into a contrastive learning framework with newly designed positive and negative pairs.

Contrastive Learning Image Classification +3

AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning

1 code implementation 13 Apr 2024 Yuwei Tang, Zhenyi Lin, Qilong Wang, Pengfei Zhu, QinGhua Hu

To this end, we disassemble three key components involved in computation of logit bias (i.e., logit features, logit predictor, and logit fusion) and empirically analyze the effect on performance of few-shot classification.

Few-Shot Learning

Task-Customized Mixture of Adapters for General Image Fusion

1 code implementation 19 Mar 2024 Pengfei Zhu, Yang Sun, Bing Cao, QinGhua Hu

These adapters are shared across different tasks and constrained by mutual information regularization, ensuring both compatibility with different tasks and complementarity for multi-source images.

Every Node is Different: Dynamically Fusing Self-Supervised Tasks for Attributed Graph Clustering

1 code implementation 12 Jan 2024 Pengfei Zhu, Qian Wang, Yu Wang, Jialu Li, QinGhua Hu

In this paper, we propose to dynamically learn the weights of SSL tasks for different nodes and fuse the embeddings learned from different SSL tasks to boost performance.

Clustering Graph Clustering +1

Exploring Diverse Representations for Open Set Recognition

1 code implementation 12 Jan 2024 Yu Wang, Junxian Mu, Pengfei Zhu, QinGhua Hu

We show that the differences in attention maps can lead to diverse representations so that the fused representations can well handle the open space.

Open Set Learning

Uncovering the human motion pattern: Pattern Memory-based Diffusion Model for Trajectory Prediction

no code implementations 5 Jan 2024 Yuxin Yang, Pengfei Zhu, Mengshi Qi, Huadong Ma

To uncover latent motion patterns in human behavior, we introduce a novel memory-based method, named Motion Pattern Priors Memory Network.

Autonomous Driving Retrieval +1

Dynamic Sub-graph Distillation for Robust Semi-supervised Continual Learning

1 code implementation 27 Dec 2023 Yan Fan, Yu Wang, Pengfei Zhu, QinGhua Hu

In this work, we focus on semi-supervised continual learning (SSCL), where the model progressively learns from partially labeled data with unknown categories.

Continual Learning graph construction +1

Bi-directional Adapter for Multi-modal Tracking

1 code implementation 17 Dec 2023 Bing Cao, Junliang Guo, Pengfei Zhu, QinGhua Hu

To handle this problem, we propose a novel multi-modal visual prompt tracking model based on a universal bi-directional adapter, cross-prompting multiple modalities mutually.

Object Tracking Rgb-T Tracking

Tuning Pre-trained Model via Moment Probing

1 code implementation ICCV 2023 Mingze Gao, Qilong Wang, Zhenyi Lin, Pengfei Zhu, QinGhua Hu, Jingbo Zhou

Distinguished from LP, which builds a linear classification head based on the mean of final features (e.g., word tokens for ViT) or classification tokens, our MP performs a linear classifier on feature distribution, which provides stronger representation ability by exploiting richer statistical information inherent in features.

Image Classification

Cross-Drone Transformer Network for Robust Single Object Tracking

1 code implementation IEEE Transactions on Circuits and Systems for Video Technology 2023 Guanlin Chen, Pengfei Zhu, Bing Cao, Xing Wang, QinGhua Hu

During the tracking process, a cross-drone mapping mechanism is proposed by using the surrounding information of the drone with promising tracking status as reference, assisting drones that lost targets to re-calibrate, which implements real-time cross-drone information interaction.

Object Visual Object Tracking +1

OpenMix+: Revisiting Data Augmentation for Open Set Recognition

1 code implementation IEEE Transactions on Circuits and Systems for Video Technology 2023 Guosong Jiang, Pengfei Zhu, Yu Wang, QinGhua Hu

In this paper, we point out that balancing between structural risk and open space risk is crucial for open set recognition, and re-formalize it as open set structural risk.

Data Augmentation Open Set Learning

ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models

no code implementations 9 Feb 2023 Pengfei Zhu, Chao Pang, Yekun Chai, Lei LI, Shuohuan Wang, Yu Sun, Hao Tian, Hua Wu

In response to this lacuna, this paper introduces a pioneering contribution in the form of a text-to-waveform music generation model, underpinned by the utilization of diffusion models.

Music Generation Text-to-Music Generation

Multi-modal Gated Mixture of Local-to-Global Experts for Dynamic Image Fusion

1 code implementation ICCV 2023 Yiming Sun, Bing Cao, Pengfei Zhu, QinGhua Hu

The MoLE performs specialized learning of multi-modal local features, prompting the fused images to retain the local information in a sample-adaptive manner, while the MoGE focuses on the global information that complements the fused image with overall texture detail and contrast.

Infrared And Visible Image Fusion

HS-Diffusion: Semantic-Mixing Diffusion for Head Swapping

1 code implementation 13 Dec 2022 Qinghe Wang, Lijie Liu, Miao Hua, Pengfei Zhu, WangMeng Zuo, QinGhua Hu, Huchuan Lu, Bing Cao

We blend the semantic layouts of source head and source body, and then inpaint the transition region by the semantic layout generator, achieving a coarse-grained head swapping.

ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech

2 code implementations 7 Nov 2022 Xiaoran Fan, Chao Pang, Tian Yuan, He Bai, Renjie Zheng, Pengfei Zhu, Shuohuan Wang, Junkun Chen, Zeyu Chen, Liang Huang, Yu Sun, Hua Wu

In this paper, we extend the pretraining method for cross-lingual multi-speaker speech synthesis tasks, including cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing.

Representation Learning Speech Synthesis +2

DetFusion: A Detection-driven Infrared and Visible Image Fusion Network

1 code implementation ACMMM 2022 Yiming Sun, Bing Cao, Pengfei Zhu, QinGhua Hu

We cascade the image fusion network with the detection networks of both modalities and use the detection loss of the fused images to provide guidance on task-related information for the optimization of the image fusion network.

Infrared And Visible Image Fusion Object +2

Tell Me the Evidence? Dual Visual-Linguistic Interaction for Answer Grounding

no code implementations 21 Jun 2022 Junwen Pan, Guanlin Chen, Yi Liu, Jiexiang Wang, Cheng Bian, Pengfei Zhu, Zhicheng Zhang

Answer grounding aims to reveal the visual evidence for visual question answering (VQA), which entails highlighting relevant positions in the image when answering questions about images.

Question Answering Visual Grounding +1

Learning Self-Supervised Low-Rank Network for Single-Stage Weakly and Semi-Supervised Semantic Segmentation

1 code implementation 19 Mar 2022 Junwen Pan, Pengfei Zhu, Kaihua Zhang, Bing Cao, Yu Wang, Dingwen Zhang, Junwei Han, QinGhua Hu

Semantic segmentation with limited annotations, such as weakly supervised semantic segmentation (WSSS) and semi-supervised semantic segmentation (SSSS), is a challenging task that has attracted much attention recently.

Pseudo Label Segmentation +3

Label-efficient Hybrid-supervised Learning for Medical Image Segmentation

no code implementations 10 Mar 2022 Junwen Pan, Qi Bi, Yanzhan Yang, Pengfei Zhu, Cheng Bian

Due to the lack of expertise for medical image annotation, the investigation of label-efficient methodology for medical image segmentation has become a hot topic.

Image Segmentation Medical Image Segmentation +2

Learning Dynamic Compact Memory Embedding for Deformable Visual Object Tracking

no code implementations 23 Nov 2021 Pengfei Zhu, Hongtao Yu, Kaihua Zhang, Yu Wang, Shuai Zhao, Lei Wang, Tianzhu Zhang, QinGhua Hu

To address this issue, segmentation-based trackers have been proposed that employ per-pixel matching to improve the tracking performance of deformable objects effectively.

Segmentation Visual Object Tracking +1

Unsupervised Open-Domain Question Answering

no code implementations 31 Aug 2021 Pengfei Zhu, Xiaoguang Li, Jian Li, Hai Zhao

Open-domain Question Answering (ODQA) has achieved significant results in the supervised learning setting.

Machine Reading Comprehension Open-Domain Question Answering

Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark

1 code implementation CVPR 2021 Longyin Wen, Dawei Du, Pengfei Zhu, QinGhua Hu, Qilong Wang, Liefeng Bo, Siwei Lyu

To promote the development of object detection, tracking and counting algorithms in drone-captured videos, we construct a benchmark with a new drone-captured large-scale dataset, named DroneCrowd, formed by 112 video clips with 33,600 HD frames in various scenarios.

object-detection Object Detection +1

SPL-MLL: Selecting Predictable Landmarks for Multi-Label Learning

no code implementations ECCV 2020 Junbing Li, Changqing Zhang, Pengfei Zhu, Baoyuan Wu, Lei Chen, QinGhua Hu

Although significant progress has been achieved, multi-label classification is still challenging due to the complexity of correlations among different labels.

General Classification Multi-Label Classification +1

Multi-Drone based Single Object Tracking with Agent Sharing Network

1 code implementation 16 Mar 2020 Pengfei Zhu, Jiayu Zheng, Dawei Du, Longyin Wen, Yiming Sun, QinGhua Hu

Moreover, an agent sharing network (ASNet) is proposed by self-supervised template sharing and view-aware fusion of the target from multiple drones, which can improve the tracking accuracy significantly compared with single drone tracking.

Object Tracking

Drone-based RGB-Infrared Cross-Modality Vehicle Detection via Uncertainty-Aware Learning

2 code implementations 5 Mar 2020 Yiming Sun, Bing Cao, Pengfei Zhu, QinGhua Hu

To address this dilemma, we further propose an uncertainty-aware cross-modality vehicle detection (UA-CMDet) framework to extract complementary information from cross-modal images, which can significantly improve the detection performance in low light conditions.

Management Object Counting +1

DUMA: Reading Comprehension with Transposition Thinking

3 code implementations 26 Jan 2020 Pengfei Zhu, Hai Zhao, Xiaoguang Li

Multi-choice Machine Reading Comprehension (MRC) requires a model to decide the correct answer from a set of answer options when given a passage and a question.

Language Modelling Machine Reading Comprehension +1

Detection and Tracking Meet Drones Challenge

2 code implementations 16 Jan 2020 Pengfei Zhu, Longyin Wen, Dawei Du, Xiao Bian, Heng Fan, QinGhua Hu, Haibin Ling

We provide a large-scale drone-captured dataset, VisDrone, which includes four tracks, i.e., (1) image object detection, (2) video object detection, (3) single object tracking, and (4) multi-object tracking.

Multi-Object Tracking Object +2

Dual Multi-head Co-attention for Multi-choice Reading Comprehension

no code implementations 1 Jan 2020 Pengfei Zhu, Hai Zhao, Xiaoguang Li

Multi-choice Machine Reading Comprehension (MRC) requires a model to decide the correct answer from a set of answer options when given a passage and a question.

Language Modelling Machine Reading Comprehension +1

Drone-based Joint Density Map Estimation, Localization and Tracking with Space-Time Multi-Scale Attention Network

1 code implementation 4 Dec 2019 Longyin Wen, Dawei Du, Pengfei Zhu, QinGhua Hu, Qilong Wang, Liefeng Bo, Siwei Lyu

This paper proposes a space-time multi-scale attention network (STANet) to solve density map estimation, localization and tracking in dense crowds of video clips captured by drones with arbitrary crowd density, perspective, and flight altitude.

Crowd Counting

ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks

12 code implementations CVPR 2020 Qilong Wang, Banggu Wu, Pengfei Zhu, Peihua Li, WangMeng Zuo, QinGhua Hu

By dissecting the channel attention module in SENet, we empirically show that avoiding dimensionality reduction is important for learning channel attention, and that appropriate cross-channel interaction can preserve performance while significantly decreasing model complexity.

Dimensionality Reduction Image Classification +4

Multi-view Deep Subspace Clustering Networks

2 code implementations 6 Aug 2019 Pengfei Zhu, Xinjie Yao, Yu Wang, Binyuan Hui, Dawei Du, QinGhua Hu

Dnet learns view-specific self-representation matrices, whereas Unet learns a common self-representation matrix for all views.

Clustering Model Selection +1

Progressive Image Deraining Networks: A Better and Simpler Baseline

4 code implementations CVPR 2019 Dongwei Ren, WangMeng Zuo, QinGhua Hu, Pengfei Zhu, Deyu Meng

To handle this issue, this paper provides a better and simpler baseline deraining network by considering network architecture, input and output, and loss functions.

Image Super-Resolution Single Image Deraining +1

Modeling Multi-turn Conversation with Deep Utterance Aggregation

1 code implementation COLING 2018 Zhuosheng Zhang, Jiangtong Li, Pengfei Zhu, Hai Zhao, Gongshen Liu

In this paper, we formulate previous utterances into context using a proposed deep utterance aggregation model to form a fine-grained context representation.

Conversational Response Selection Retrieval

Vision Meets Drones: A Challenge

no code implementations 20 Apr 2018 Pengfei Zhu, Longyin Wen, Xiao Bian, Haibin Ling, QinGhua Hu

In this paper we present a large-scale visual object detection and tracking benchmark, named VisDrone2018, aiming at advancing visual understanding tasks on the drone platform.

Multi-Object Tracking Object +2

On Improving Deep Reinforcement Learning for POMDPs

no code implementations 17 Apr 2018 Pengfei Zhu, Xin Li, Pascal Poupart, Guanghui Miao

Deep Reinforcement Learning (RL) recently emerged as one of the most competitive approaches for learning in sequential decision making problems with fully observable environments, e.g., computer Go.

Atari Games Decision Making +4

Latent Multi-View Subspace Clustering

no code implementations CVPR 2017 Changqing Zhang, QinGhua Hu, Huazhu Fu, Pengfei Zhu, Xiaochun Cao

In this paper, we propose a novel Latent Multi-view Subspace Clustering (LMSC) method, which clusters data points with latent representation and simultaneously explores underlying complementary information from multiple views.

Clustering Multi-view Subspace Clustering

On Improving Deep Reinforcement Learning for POMDPs

1 code implementation 26 Apr 2017 Pengfei Zhu, Xin Li, Pascal Poupart, Guanghui Miao

Deep Reinforcement Learning (RL) recently emerged as one of the most competitive approaches for learning in sequential decision making problems with fully observable environments, e.g., computer Go.

Atari Games Decision Making +4

Image Set based Collaborative Representation for Face Recognition

no code implementations 30 Aug 2013 Pengfei Zhu, WangMeng Zuo, Lei Zhang, Simon C. K. Shiu, David Zhang

One key issue of ISFR is how to effectively and efficiently represent the query face image set by using the gallery face image sets.

Face Recognition General Classification
