Search Results for author: Zheng-Jun Zha

Found 127 papers, 50 papers with code

Multi-perspective Memory Enhanced Network for Identifying Key Nodes in Social Networks

no code implementations 22 Mar 2024 Qiang Zhang, Jiawei Liu, Fanrui Zhang, Xiaoling Zhu, Zheng-Jun Zha

Existing key node identification methods usually consider node influence only from the propagation structure perspective and have insufficient generalization ability to unknown scenarios.

Blocking Graph Attention

Hierarchical Information Enhancement Network for Cascade Prediction in Social Networks

no code implementations 22 Mar 2024 Fanrui Zhang, Jiawei Liu, Qiang Zhang, Xiaoling Zhu, Zheng-Jun Zha

In this work, we propose a novel Hierarchical Information Enhancement Network (HIENet) for cascade prediction.

RelationVLM: Making Large Vision-Language Models Understand Visual Relations

no code implementations 19 Mar 2024 Zhipeng Huang, Zhizheng Zhang, Zheng-Jun Zha, Yan Lu, Baining Guo

The development of Large Vision-Language Models (LVLMs) is striving to catch up with the success of Large Language Models (LLMs), yet it faces more challenges to be resolved.

Language Modelling

Event-based Asynchronous HDR Imaging by Temporal Incident Light Modulation

no code implementations 14 Mar 2024 Yuliang Wu, Ganchao Tan, Jinze Chen, Wei Zhai, Yang Cao, Zheng-Jun Zha

In this paper, we propose AsynHDR, a Pixel-Asynchronous HDR imaging system, based on key insights into the challenges in HDR imaging and the unique event-generating mechanism of Dynamic Vision Sensors (DVS).

SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation

no code implementations 3 Mar 2024 Hongjian Liu, Qingsong Xie, Zhijie Deng, Chen Chen, Shixiang Tang, Fueyang Fu, Zheng-Jun Zha, Haonan Lu

In contrast to vanilla consistency distillation (CD) which distills the ordinary differential equation solvers-based sampling process of a pretrained teacher model into a student, SCott explores the possibility and validates the efficacy of integrating stochastic differential equation (SDE) solvers into CD to fully unleash the potential of the teacher.

Text-to-Image Generation

LEMON: Learning 3D Human-Object Interaction Relation from 2D Images

no code implementations 14 Dec 2023 Yuhang Yang, Wei Zhai, Hongchen Luo, Yang Cao, Zheng-Jun Zha

Existing methods underexploit certain correlations between the interaction counterparts (human and object) and struggle to address the uncertainty in interactions.

Human-Object Interaction Detection Object +1

CCM: Adding Conditional Controls to Text-to-Image Consistency Models

no code implementations 12 Dec 2023 Jie Xiao, Kai Zhu, Han Zhang, Zhiheng Liu, Yujun Shen, Yu Liu, Xueyang Fu, Zheng-Jun Zha

Consistency Models (CMs) have shown promise in creating visual content efficiently and with high quality.

Decoupling Degradation and Content Processing for Adverse Weather Image Restoration

no code implementations 8 Dec 2023 Xi Wang, Xueyang Fu, Peng-Tao Jiang, Jie Huang, Mi Zhou, Bo Li, Zheng-Jun Zha

The former facilitates channel-dependent degradation removal operation, allowing the network to tailor responses to various adverse weather types; the latter, by integrating Fourier's global properties into channel-independent content features, enhances network capacity for consistent global content reconstruction.

Image Restoration

Revisiting Single Image Reflection Removal In the Wild

1 code implementation 29 Nov 2023 Yurui Zhu, Xueyang Fu, Peng-Tao Jiang, Hao Zhang, Qibin Sun, Jinwei Chen, Zheng-Jun Zha, Bo Li

This research focuses on the issue of single-image reflection removal (SIRR) in real-world conditions, examining it from two angles: the collection pipeline of real reflection pairs and the perception of real reflection locations.

Reflection Removal

Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation

2 code implementations 22 Sep 2023 Wei Zhai, Pingyu Wu, Kai Zhu, Yang Cao, Feng Wu, Zheng-Jun Zha

In addition, our method also achieves state-of-the-art weakly supervised semantic segmentation performance on the PASCAL VOC 2012 and MS COCO 2014 datasets.

Object Weakly-Supervised Object Localization +2

BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View

1 code implementation 5 Sep 2023 Yuxiang Yang, Yingqi Deng, Jing Zhang, Jiahao Nie, Zheng-Jun Zha

The spatial information indicating objects' spatial adjacency across consecutive frames is crucial for effective object tracking.

3D Single Object Tracking Autonomous Driving +2

Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained Vision-Language Models

no code implementations ICCV 2023 Kecheng Zheng, Wei Wu, Ruili Feng, Kai Zhu, Jiawei Liu, Deli Zhao, Zheng-Jun Zha, Wei Chen, Yujun Shen

To bring the useful knowledge back into light, we first identify a set of parameters that are important to a given downstream task, then attach a binary mask to each parameter, and finally optimize these masks on the downstream data with the parameters frozen.

Knowledge-Enhanced Hierarchical Information Correlation Learning for Multi-Modal Rumor Detection

no code implementations 28 Jun 2023 Jiawei Liu, Jingyi Xie, Fanrui Zhang, Qiang Zhang, Zheng-Jun Zha

The explosive growth of rumors with text and images on social media platforms has drawn great attention.

DreamTime: An Improved Optimization Strategy for Text-to-3D Content Creation

no code implementations 21 Jun 2023 Yukun Huang, Jianan Wang, Yukai Shi, Xianbiao Qi, Zheng-Jun Zha, Lei Zhang

Text-to-image diffusion models pre-trained on billions of image-text pairs have recently enabled text-to-3D content creation by optimizing a randomly initialized Neural Radiance Fields (NeRF) with score distillation.

Image Generation Text to 3D

Streaming Video Model

1 code implementation CVPR 2023 Yucheng Zhao, Chong Luo, Chuanxin Tang, Dongdong Chen, Noel Codella, Zheng-Jun Zha

We believe that the concept of streaming video model and the implementation of S-ViT are solid steps towards a unified deep learning architecture for video understanding.

Action Recognition Multiple Object Tracking +1

Spatial-Aware Token for Weakly Supervised Object Localization

1 code implementation ICCV 2023 Pingyu Wu, Wei Zhai, Yang Cao, Jiebo Luo, Zheng-Jun Zha

Specifically, a spatial token is first introduced in the input space to aggregate representations for the localization task.

Object Weakly-Supervised Object Localization

Grounding 3D Object Affordance from 2D Interactions in Images

1 code implementation ICCV 2023 Yuhang Yang, Wei Zhai, Hongchen Luo, Yang Cao, Jiebo Luo, Zheng-Jun Zha

Comprehensive experiments on PIAD demonstrate the reliability of the proposed task and the superiority of our method.

Object

Text-Driven Generative Domain Adaptation with Spectral Consistency Regularization

1 code implementation ICCV 2023 Zhenhuan Liu, Liang Li, Jiayu Xiao, Zheng-Jun Zha, Qingming Huang

The experiments demonstrate the effectiveness of our method in preserving the diversity of the source domain and generating high-fidelity target images.

Domain Adaptation

Decoupling-and-Aggregating for Image Exposure Correction

no code implementations CVPR 2023 Yang Wang, Long Peng, Liang Li, Yang Cao, Zheng-Jun Zha

To this end, we inject the addition/difference operation into the convolution process and devise a Contrast Aware (CA) unit and a Detail Aware (DA) unit to facilitate the modeling of statistical and structural regularities.

Edge-Aware Regional Message Passing Controller for Image Forgery Localization

no code implementations CVPR 2023 Dong Li, Jiaying Zhu, Menglu Wang, Jiawei Liu, Xueyang Fu, Zheng-Jun Zha

In the second step, guided by the learnable edges, a region message passing controller is devised to weaken the message passing between the forged and authentic regions.

Binarization graph construction

Learning Cross-Representation Affinity Consistency for Sparsely Supervised Biomedical Instance Segmentation

1 code implementation ICCV 2023 Xiaoyu Liu, Wei Huang, Zhiwei Xiong, Shenglong Zhou, Yueyi Zhang, Xuejin Chen, Zheng-Jun Zha, Feng Wu

Sparse instance-level supervision has recently been explored to address insufficient annotation in biomedical instance segmentation; compared to common semi-supervision, it makes crowded instances easier to annotate and better preserves instance completeness for 3D volumetric datasets. In this paper, we propose a sparsely supervised biomedical instance segmentation framework via cross-representation affinity consistency regularization.

Instance Segmentation Pseudo Label +1

Generalized UAV Object Detection via Frequency Domain Disentanglement

no code implementations CVPR 2023 Kunyu Wang, Xueyang Fu, Yukun Huang, Chengzhi Cao, Gege Shi, Zheng-Jun Zha

This loss enables the network to concentrate on extracting domain-invariant spectrum and domain-specific spectrum, so as to achieve better disentangling results.

Disentanglement Object +2

Neural Dependencies Emerging from Learning Massive Categories

no code implementations CVPR 2023 Ruili Feng, Kecheng Zheng, Kai Zhu, Yujun Shen, Jian Zhao, Yukun Huang, Deli Zhao, Jingren Zhou, Michael Jordan, Zheng-Jun Zha

By investigating the properties of the problem solution, we confirm that neural dependency is guaranteed by a redundant logit covariance matrix, a condition that is easily met given massive categories, and that neural dependency is highly sparse, implying that one category correlates with only a few others.

Image Classification

Entity-enhanced Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding

1 code implementation 18 Jul 2022 Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Zechao Li, Qi Tian, Qingming Huang

Second, most previous weakly supervised REG methods ignore the discriminative location and context of the referent, causing difficulties in distinguishing the target from other same-category objects.

Attribute Referring Expression +2

Rank Diminishing in Deep Neural Networks

no code implementations 13 Jun 2022 Ruili Feng, Kecheng Zheng, Yukun Huang, Deli Zhao, Michael Jordan, Zheng-Jun Zha

By virtue of our numerical tools, we provide the first empirical analysis of the per-layer behavior of network rank in practical settings, i.e., ResNets, deep MLPs, and Transformers on ImageNet.

Label Noise-Resistant Mean Teaching for Weakly Supervised Fake News Detection

no code implementations 10 Jun 2022 Jingyi Xie, Jiawei Liu, Zheng-Jun Zha

LNMT leverages unlabeled news and feedback comments of users to enlarge the amount of training data and facilitates model training by generating refined labels as weak supervision.

Fake News Detection Model Optimization

Automatic Relation-aware Graph Network Proliferation

1 code implementation CVPR 2022 Shaofei Cai, Liang Li, Xinzhe Han, Jiebo Luo, Zheng-Jun Zha, Qingming Huang

However, the currently used graph search space overemphasizes learning node features and neglects mining hierarchical relational information.

Graph Classification Graph Learning +5

Principled Knowledge Extrapolation with GANs

no code implementations 21 May 2022 Ruili Feng, Jie Xiao, Kecheng Zheng, Deli Zhao, Jingren Zhou, Qibin Sun, Zheng-Jun Zha

Humans can extrapolate well, generalize daily knowledge to unseen scenarios, and raise and answer counterfactual questions.

counterfactual

Degradation-agnostic Correspondence from Resolution-asymmetric Stereo

no code implementations CVPR 2022 Xihao Chen, Zhiwei Xiong, Zhen Cheng, Jiayong Peng, Yueyi Zhang, Zheng-Jun Zha

Interestingly, we find that, although a stereo matching network trained with the photometric loss is not optimal, its feature extractor can produce degradation-agnostic and matching-specific features.

Stereo Matching

Unsupervised Coherent Video Cartoonization with Perceptual Motion Consistency

1 code implementation 2 Apr 2022 Zhenhuan Liu, Liang Li, Huajie Jiang, Xin Jin, Dandan Tu, Shuhui Wang, Zheng-Jun Zha

Furthermore, we devise the spatio-temporal correlative map as a style-independent, global-aware regularization on the perceptual motion consistency.

Optical Flow Estimation Style Transfer

FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization

no code implementations 24 Mar 2022 Kecheng Zheng, Yang Cao, Kai Zhu, Ruijing Zhao, Zheng-Jun Zha

However, its generalization performance to heterogeneous tasks is inferior to other architectures (e.g., CNNs and transformers) due to the extensive retention of domain information.

Domain Generalization

ProgressiveMotionSeg: Mutually Reinforced Framework for Event-Based Motion Segmentation

no code implementations 22 Mar 2022 Jinze Chen, Yang Wang, Yang Cao, Feng Wu, Zheng-Jun Zha

A Dynamic Vision Sensor (DVS) can asynchronously output events reflecting the apparent motion of objects with microsecond resolution, and shows great application potential in monitoring and other fields.

Denoising Motion Estimation +1

Location-Free Camouflage Generation Network

1 code implementation 18 Mar 2022 Yangyang Li, Wei Zhai, Yang Cao, Zheng-Jun Zha

However, these methods struggle with 1) efficiently generating camouflage images using foreground and background with arbitrary structure; 2) camouflaging foreground objects to regions with multiple appearances (e.g., the junction of the vegetation and the mountains), which limits their practical application.

Few Shot Generative Model Adaption via Relaxed Spatial Structural Alignment

2 code implementations CVPR 2022 Jiayu Xiao, Liang Li, Chaofei Wang, Zheng-Jun Zha, Qingming Huang

A feasible solution is to start with a GAN well-trained on a large-scale source domain and adapt it to the target domain with a few samples, termed few-shot generative model adaption.

Generative Adversarial Network

Debiased Batch Normalization via Gaussian Process for Generalizable Person Re-Identification

no code implementations 3 Mar 2022 Jiawei Liu, Zhipeng Huang, Liang Li, Kecheng Zheng, Zheng-Jun Zha

In this paper, we propose a novel Debiased Batch Normalization via Gaussian Process approach (GDNorm) for generalizable person re-identification, which models the feature statistic estimation from BN layers as a dynamically self-refining Gaussian process to alleviate the bias toward unseen domains and improve generalization.

Generalizable Person Re-identification Representation Learning

Modality-Adaptive Mixup and Invariant Decomposition for RGB-Infrared Person Re-Identification

no code implementations 3 Mar 2022 Zhipeng Huang, Jiawei Liu, Liang Li, Kecheng Zheng, Zheng-Jun Zha

RGB-infrared person re-identification is an emerging cross-modality re-identification task, which is very challenging due to significant modality discrepancy between RGB and infrared images.

Person Re-Identification

Multi-Grained Spatio-Temporal Features Perceived Network for Event-Based Lip-Reading

no code implementations CVPR 2022 Ganchao Tan, Yang Wang, Han Han, Yang Cao, Feng Wu, Zheng-Jun Zha

To recognize words from the event data, we propose a novel Multi-grained Spatio-Temporal Features Perceived Network (MSTP) to perceive fine-grained spatio-temporal features from microsecond time-resolved event data.

Action Recognition Lip Reading

Bijective Mapping Network for Shadow Removal

2 code implementations CVPR 2022 Yurui Zhu, Jie Huang, Xueyang Fu, Feng Zhao, Qibin Sun, Zheng-Jun Zha

Shadow removal, which aims to restore the background in the shadow regions, is challenging due to its highly ill-posed nature.

Shadow Removal

Calibrated Feature Decomposition for Generalizable Person Re-Identification

1 code implementation 27 Nov 2021 Kecheng Zheng, Jiawei Liu, Wei Wu, Liang Li, Zheng-Jun Zha

The calibrated person representation is subtly decomposed into the identity-relevant feature, domain feature, and the remaining entangled one.

Domain Generalization Generalizable Person Re-identification

Edge-featured Graph Neural Architecture Search

no code implementations 3 Sep 2021 Shaofei Cai, Liang Li, Xinzhe Han, Zheng-Jun Zha, Qingming Huang

Recently, researchers have studied neural architecture search (NAS) to reduce the dependence on human expertise and explore better GNN architectures, but they over-emphasize entity features and ignore latent relation information concealed in the edges.

Neural Architecture Search

A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP

1 code implementation 30 Aug 2021 Yucheng Zhao, Guangting Wang, Chuanxin Tang, Chong Luo, Wenjun Zeng, Zheng-Jun Zha

Convolutional neural networks (CNN) are the dominant deep neural network (DNN) architecture for computer vision.

Multi-Modulation Network for Audio-Visual Event Localization

no code implementations 26 Aug 2021 Hao Wang, Zheng-Jun Zha, Liang Li, Xuejin Chen, Jiebo Luo

We propose a novel Multi-Modulation Network (M2N) to learn the above correlation and leverage it as semantic guidance to modulate the related auditory, visual, and fused features.

audio-visual event localization

Self-Supervised Visual Representations Learning by Contrastive Mask Prediction

no code implementations ICCV 2021 Yucheng Zhao, Guangting Wang, Chong Luo, Wenjun Zeng, Zheng-Jun Zha

In this paper, we propose a novel contrastive mask prediction (CMP) task for visual representation learning and design a mask contrast (MaskCo) framework to implement the idea.

Representation Learning Self-Supervised Learning

Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers

1 code implementation 27 Jul 2021 Wen Wang, Yang Cao, Jing Zhang, Fengxiang He, Zheng-Jun Zha, Yonggang Wen, DaCheng Tao

In DQFA, a novel domain query is used to aggregate and align global context from the token sequence of both domains.

Domain Adaptation Object +2

Disentangle Your Dense Object Detector

2 code implementations 7 Jul 2021 Zehui Chen, Chenhongyi Yang, Qiaofei Li, Feng Zhao, Zheng-Jun Zha, Feng Wu

Extensive experiments on the MS COCO benchmark show that our approach can lead to 2.0 mAP, 2.4 mAP, and 2.2 mAP absolute improvements on RetinaNet, FCOS, and ATSS baselines with negligible extra overhead.

Disentanglement Object +2

Structured Multi-Level Interaction Network for Video Moment Localization via Language Query

no code implementations CVPR 2021 Hao Wang, Zheng-Jun Zha, Liang Li, Dong Liu, Jiebo Luo

In particular, for cross-modal interaction, the sentence-level query interacts with the whole moment while the word-level query interacts with content and boundary, in a coarse-to-fine manner.

Sentence

Light Field Super-Resolution With Zero-Shot Learning

no code implementations CVPR 2021 Zhen Cheng, Zhiwei Xiong, Chang Chen, Dong Liu, Zheng-Jun Zha

To fill this gap, we propose a zero-shot learning framework for light field SR, which learns a mapping to super-resolve the reference view with examples extracted solely from the input low-resolution light field itself.

Super-Resolution Zero-Shot Learning

Image De-Raining via Continual Learning

no code implementations CVPR 2021 Man Zhou, Jie Xiao, Yifan Chang, Xueyang Fu, Aiping Liu, Jinshan Pan, Zheng-Jun Zha

The proposed model is capable of achieving superior performance on both inhomogeneous and incremental datasets, and is promising for highly compact systems to gradually learn myriad regularities of the different types of rain streaks.

Continual Learning

Adaptive Domain-Specific Normalization for Generalizable Person Re-Identification

no code implementations 7 May 2021 Jiawei Liu, Zhipeng Huang, Kecheng Zheng, Dong Liu, Xiaoyan Sun, Zheng-Jun Zha

It describes an unseen target domain as a combination of the known source domains, and explicitly learns a domain-specific representation with the target distribution to improve the model's generalization through a meta-learning pipeline.

Generalizable Person Re-identification Meta-Learning

Memory Enhanced Embedding Learning for Cross-Modal Video-Text Retrieval

no code implementations 29 Mar 2021 Rui Zhao, Kecheng Zheng, Zheng-Jun Zha, Hongtao Xie, Jiebo Luo

The cross-modal memory module is employed to record the instance embeddings of all the datasets for global negative mining.

Retrieval Text Retrieval +1

Rethinking Graph Neural Architecture Search from Message-passing

1 code implementation CVPR 2021 Shaofei Cai, Liang Li, Jincan Deng, Beichen Zhang, Zheng-Jun Zha, Li Su, Qingming Huang

Inspired by the strong search capability of neural architecture search (NAS) in CNNs, this paper proposes Graph Neural Architecture Search (GNAS) with a newly designed search space.

feature selection Neural Architecture Search

Group-aware Label Transfer for Domain Adaptive Person Re-identification

1 code implementation CVPR 2021 Kecheng Zheng, Wu Liu, Lingxiao He, Tao Mei, Jiebo Luo, Zheng-Jun Zha

In this paper, we propose a Group-aware Label Transfer (GLT) algorithm, which enables the online interaction and mutual promotion of pseudo-label prediction and representation learning.

Attribute Clustering +5

Synergy Between Semantic Segmentation and Image Denoising via Alternate Boosting

no code implementations 24 Feb 2021 Shunxin Xu, Ke Sun, Dong Liu, Zhiwei Xiong, Zheng-Jun Zha

We observe that not only does denoising help combat the drop in segmentation accuracy due to noise, but pixel-wise semantic information also boosts the capability of denoising.

Image Denoising Segmentation +1

VAE^2: Preventing Posterior Collapse of Variational Video Predictions in the Wild

no code implementations 28 Jan 2021 Yizhou Zhou, Chong Luo, Xiaoyan Sun, Zheng-Jun Zha, Wenjun Zeng

We believe that VAE$^2$ is also applicable to other stochastic sequence prediction problems where the training data lack stochasticity.

Video Prediction

Improving De-Raining Generalization via Neural Reorganization

no code implementations ICCV 2021 Jie Xiao, Man Zhou, Xueyang Fu, Aiping Liu, Zheng-Jun Zha

Equipped with our NR algorithm, the deep model can be trained on a list of synthetic rainy datasets by overcoming catastrophic forgetting, making it a general-version de-raining network.

Knowledge Distillation

Cross-Patch Graph Convolutional Network for Image Denoising

no code implementations ICCV 2021 Yao Li, Xueyang Fu, Zheng-Jun Zha

However, real noisy images in practice are mostly of high resolution rather than small cropped patches, and the vanilla training strategies ignore the cross-patch contextual dependency in the whole image.

Image Denoising

Learning Dual Priors for JPEG Compression Artifacts Removal

no code implementations ICCV 2021 Xueyang Fu, Xi Wang, Aiping Liu, Junwei Han, Zheng-Jun Zha

Specifically, we design a variational model to formulate the image de-blocking problem and propose two prior terms for the image content and gradient, respectively.

Blocking

Attack-Guided Perceptual Data Generation for Real-World Re-Identification

no code implementations ICCV 2021 Yukun Huang, Xueyang Fu, Zheng-Jun Zha

In unconstrained real-world surveillance scenarios, person re-identification (Re-ID) models usually suffer from different low-level perceptual variations, e.g., cross-resolution and insufficient lighting.

Person Re-Identification Representation Learning

Exploiting Sample Uncertainty for Domain Adaptive Person Re-Identification

1 code implementation 16 Dec 2020 Kecheng Zheng, Cuiling Lan, Wenjun Zeng, Zhizheng Zhang, Zheng-Jun Zha

Based on this finding, we propose to exploit the uncertainty (measured by consistency levels) to evaluate the reliability of the pseudo-label of a sample and incorporate the uncertainty to re-weight its contribution within various ReID losses, including the identity (ID) classification loss per sample, the triplet loss, and the contrastive loss.

Clustering Domain Adaptive Person Re-Identification +3

Learning Semantic-aware Normalization for Generative Adversarial Networks

1 code implementation NeurIPS 2020 Heliang Zheng, Jianlong Fu, Yanhong Zeng, Jiebo Luo, Zheng-Jun Zha

Such a model disentangles latent factors according to the semantics of feature channels by channel-/group-wise fusion of latent codes and feature channels.

Image Inpainting Unconditional Image Generation

Hierarchical Granularity Transfer Learning

no code implementations NeurIPS 2020 Shaobo Min, Hongtao Xie, Hantao Yao, Xuran Deng, Zheng-Jun Zha, Yongdong Zhang

In this paper, we introduce a new task, named Hierarchical Granularity Transfer Learning (HGTL), to recognize sub-level categories with basic-level annotations and semantic descriptions for hierarchical categories.

Transfer Learning

Hierarchical Gumbel Attention Network for Text-based Person Search

no code implementations 10 Oct 2020 Kecheng Zheng, Wu Liu, Jiawei Liu, Zheng-Jun Zha, Tao Mei

This hard selection strategy is able to fuse the strongly relevant multi-modality features to alleviate the problem of matching redundancy.

Image Retrieval Image-to-Text Retrieval +6

Temporal Attribute-Appearance Learning Network for Video-based Person Re-Identification

no code implementations 9 Sep 2020 Jiawei Liu, Xierong Zhu, Zheng-Jun Zha

TALNet simultaneously exploits human attributes and appearance to learn comprehensive and effective pedestrian representations from videos.

Attribute Multi-Task Learning +1

DeepFacePencil: Creating Face Images from Freehand Sketches

1 code implementation 31 Aug 2020 Yuhang Li, Xuejin Chen, Binxin Yang, Zihan Chen, Zhihua Cheng, Zheng-Jun Zha

In this paper, we explore the task of generating photo-realistic face images from hand-drawn sketches.

Image-to-Image Translation Translation

Nighttime Dehazing with a Synthetic Benchmark

1 code implementation 10 Aug 2020 Jing Zhang, Yang Cao, Zheng-Jun Zha, DaCheng Tao

To address this issue, we propose a novel synthetic method called 3R to simulate nighttime hazy images from daytime clear images, which first reconstructs the scene geometry, then simulates the light rays and object reflectance, and finally renders the haze effects.

Learning to Discretely Compose Reasoning Module Networks for Video Captioning

1 code implementation 17 Jul 2020 Ganchao Tan, Daqing Liu, Meng Wang, Zheng-Jun Zha

However, existing visual reasoning methods designed for visual question answering are not appropriate for video captioning, which requires more complex visual reasoning on videos over both space and time, and dynamic module composition along the generation process.

Question Answering Sentence +3

Memory-Augmented Relation Network for Few-Shot Learning

no code implementations 9 May 2020 Jun He, Richang Hong, Xueliang Liu, Mingliang Xu, Zheng-Jun Zha, Meng Wang

Metric-based few-shot learning methods concentrate on learning a transferable feature embedding that generalizes well from seen categories to unseen categories under the supervision of a limited number of labelled instances.

Few-Shot Learning Metric Learning +2

Self-Supervised Tuning for Few-Shot Segmentation

no code implementations 12 Apr 2020 Kai Zhu, Wei Zhai, Zheng-Jun Zha, Yang Cao

Few-shot segmentation aims at assigning a category label to each image pixel with few annotated samples.

Meta-Learning Segmentation

ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene Text Detection

1 code implementation CVPR 2020 Yuxin Wang, Hongtao Xie, Zheng-Jun Zha, Mengting Xing, Zilong Fu, Yongdong Zhang

Then a novel Local Orthogonal Texture-aware Module (LOTM) models the local texture information of proposal features in two orthogonal directions and represents text region with a set of contour points.

Region Proposal Scene Text Detection +1

Real-world Person Re-Identification via Degradation Invariance Learning

no code implementations CVPR 2020 Yukun Huang, Zheng-Jun Zha, Xueyang Fu, Richang Hong, Liang Li

Person re-identification (Re-ID) in real-world scenarios usually suffers from various degradation factors, e.g., low resolution, weak illumination, blurring, and adverse weather.

Image Restoration Person Re-Identification +2

Co-Saliency Spatio-Temporal Interaction Network for Person Re-Identification in Videos

no code implementations 10 Apr 2020 Jiawei Liu, Zheng-Jun Zha, Xierong Zhu, Na Jiang

Person re-identification aims at identifying a certain pedestrian across non-overlapping camera networks.

Person Re-Identification

Stacked Convolutional Deep Encoding Network for Video-Text Retrieval

no code implementations 10 Apr 2020 Rui Zhao, Kecheng Zheng, Zheng-Jun Zha

The dominant existing approaches to the cross-modal video-text retrieval task learn a joint embedding space to measure the cross-modal similarity.

Language Modelling Retrieval +2

State-Relabeling Adversarial Active Learning

1 code implementation CVPR 2020 Beichen Zhang, Liang Li, Shijie Yang, Shuhui Wang, Zheng-Jun Zha, Qingming Huang

In this paper, we propose a state relabeling adversarial active learning model (SRAAL) that leverages both the annotation and the labeled/unlabeled state information to derive the most informative unlabeled samples.

Active Learning

Spatiotemporal Fusion in 3D CNNs: A Probabilistic View

no code implementations CVPR 2020 Yizhou Zhou, Xiaoyan Sun, Chong Luo, Zheng-Jun Zha, Wen-Jun Zeng

Based on the probability space, we further generate new fusion strategies which achieve state-of-the-art performance on four well-known action recognition datasets.

Action Recognition In Videos Temporal Action Localization

Iterative Context-Aware Graph Inference for Visual Dialog

1 code implementation CVPR 2020 Dan Guo, Hui Wang, Hanwang Zhang, Zheng-Jun Zha, Meng Wang

Visual dialog is a challenging task that requires the comprehension of the semantic dependencies among implicit visual and textual contexts.

Graph Attention Graph Embedding +2

Domain-aware Visual Bias Eliminating for Generalized Zero-Shot Learning

1 code implementation CVPR 2020 Shaobo Min, Hantao Yao, Hongtao Xie, Chaoqun Wang, Zheng-Jun Zha, Yongdong Zhang

Recent methods focus on learning a unified semantic-aligned visual representation to transfer knowledge between two domains, while ignoring the effect of semantic-free visual representation in alleviating the biased recognition problem.

Generalized Zero-Shot Learning

Multi-Objective Matrix Normalization for Fine-grained Visual Recognition

1 code implementation 30 Mar 2020 Shaobo Min, Hantao Yao, Hongtao Xie, Zheng-Jun Zha, Yongdong Zhang

In this paper, we propose an efficient Multi-Objective Matrix Normalization (MOMN) method that can simultaneously normalize a bilinear representation in terms of square-root, low-rank, and sparsity.

Fine-Grained Visual Recognition

Object Relational Graph with Teacher-Recommended Learning for Video Captioning

no code implementations CVPR 2020 Ziqi Zhang, Yaya Shi, Chunfeng Yuan, Bing Li, Peijin Wang, Weiming Hu, Zheng-Jun Zha

In this paper, we propose a complete video captioning system including both a novel model and an effective training strategy.

Ranked #9 on Video Captioning on VATEX (using extra training data)

Language Modelling Video Captioning

Convolutional Dictionary Pair Learning Network for Image Representation Learning

no code implementations 17 Dec 2019 Zhao Zhang, Yulin Sun, Yang Wang, Zheng-Jun Zha, Shuicheng Yan, Meng Wang

To address this issue, we propose a novel generalized end-to-end representation learning architecture, dubbed Convolutional Dictionary Pair Learning Network (CDPL-Net) in this paper, which integrates the learning schemes of the CNN and dictionary pair learning into a unified framework.

Dictionary Learning Representation Learning

Deep Self-representative Concept Factorization Network for Representation Learning

no code implementations 13 Dec 2019 Yan Zhang, Zhao Zhang, Zheng Zhang, Mingbo Zhao, Li Zhang, Zheng-Jun Zha, Meng Wang

In this paper, we investigate the unsupervised deep representation learning issue and technically propose a novel framework called Deep Self-representative Concept Factorization Network (DSCF-Net), for clustering deep features.

Clustering Representation Learning

Abstract Reasoning with Distracting Features

1 code implementation NeurIPS 2019 Kecheng Zheng, Zheng-Jun Zha, Wei Wei

Abstraction reasoning is a long-standing challenge in artificial intelligence.

Learning Deep Bilinear Transformation for Fine-grained Image Representation

1 code implementation NeurIPS 2019 Heliang Zheng, Jianlong Fu, Zheng-Jun Zha, Jiebo Luo

However, the computational cost to learn pairwise interactions between deep feature channels is prohibitively expensive, which restricts the use of this powerful transformation in deep neural networks.

Fine-Grained Image Recognition

LinesToFacePhoto: Face Photo Generation from Lines with Conditional Self-Attention Generative Adversarial Network

no code implementations20 Oct 2019 Yuhang Li, Xuejin Chen, Feng Wu, Zheng-Jun Zha

The large-scale discriminator enforces the completeness of global structures and the small-scale discriminator encourages fine details, thereby enhancing the realism of generated face images.

Generative Adversarial Network

Knowledge-guided Pairwise Reconstruction Network for Weakly Supervised Referring Expression Grounding

1 code implementation5 Sep 2019 Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Li Su, Qingming Huang

Weakly supervised referring expression grounding (REG) aims at localizing the referential entity in an image according to a linguistic query, where the mapping between the image region (proposal) and the query is unknown in the training stage.

Object Referring Expression +2

Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding

1 code implementation ICCV 2019 Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Dechao Meng, Qingming Huang

It builds the correspondence between image region proposal and query in an adaptive manner: adaptive grounding and collaborative reconstruction.

Attribute Referring Expression +1

Adaptive Structure-constrained Robust Latent Low-Rank Coding for Image Recovery

no code implementations21 Aug 2019 Zhao Zhang, Lei Wang, Sheng Li, Yang Wang, Zheng Zhang, Zheng-Jun Zha, Meng Wang

Specifically, AS-LRC performs the latent decomposition of given data into a low-rank reconstruction by a block-diagonal codes matrix, a group sparse locality-adaptive salient feature part and a sparse error part.

Representation Learning

Domain-Specific Embedding Network for Zero-Shot Recognition

1 code implementation12 Aug 2019 Shaobo Min, Hantao Yao, Hongtao Xie, Zheng-Jun Zha, Yongdong Zhang

In contrast to previous methods, the DSEN decomposes the domain-shared projection function into one domain-invariant and two domain-specific sub-functions to explore the similarities and differences between two domains.

Zero-Shot Learning

Robust Subspace Discovery by Block-diagonal Adaptive Locality-constrained Representation

no code implementations4 Aug 2019 Zhao Zhang, Jiahuan Ren, Sheng Li, Richang Hong, Zheng-Jun Zha, Meng Wang

Leveraging the Frobenius-norm based latent low-rank representation model, rBDLR jointly learns the coding coefficients and salient features, and improves the results by enhancing the robustness to outliers and errors in given data, preserving local information of salient features adaptively and ensuring the block-diagonal structures of the coefficients.

Representation Learning

Structure-Aware Residual Pyramid Network for Monocular Depth Estimation

1 code implementation13 Jul 2019 Xiaotian Chen, Xuejin Chen, Zheng-Jun Zha

We propose a Residual Pyramid Decoder (RPD) which expresses global scene structure in upper levels to represent layouts, and local structure in lower levels to present shape details.

Depth Prediction Monocular Depth Estimation +1

Inferential Machine Comprehension: Answering Questions by Recursively Deducing the Evidence Chain from Text

no code implementations ACL 2019 Jianxing Yu, Zheng-Jun Zha, Jian Yin

This paper focuses on the topic of inferential machine comprehension, which aims to fully understand the meaning of a given text in order to answer generic questions, especially those that require reasoning skills.

Reading Comprehension

Posterior-Guided Neural Architecture Search

1 code implementation23 Jun 2019 Yizhou Zhou, Xiaoyan Sun, Chong Luo, Zheng-Jun Zha, Wen-Jun Zeng

Accordingly, a hybrid network representation is presented which enables us to leverage the Variational Dropout so that the approximation of the posterior distribution becomes fully gradient-based and highly efficient.

Image Classification Neural Architecture Search

Joint Visual Grounding with Language Scene Graphs

no code implementations9 Jun 2019 Daqing Liu, Hanwang Zhang, Zheng-Jun Zha, Meng Wang, Qianru Sun

In this paper, we alleviate the missing-annotation problem and enable the joint reasoning by leveraging the language scene graph which covers both labeled referent and unlabeled contexts (other objects, attributes, and relationships).

Referring Expression Visual Grounding

Context-Aware Visual Policy Network for Fine-Grained Image Captioning

1 code implementation6 Jun 2019 Zheng-Jun Zha, Daqing Liu, Hanwang Zhang, Yongdong Zhang, Feng Wu

With the maturity of visual detection techniques, we are more ambitious in describing visual content with open-vocabulary, fine-grained and free-form language, i.e., the task of image captioning.

Image Captioning Image Paragraph Captioning +2

One-Shot Texture Retrieval with Global Context Metric

no code implementations16 May 2019 Kai Zhu, Wei Zhai, Zheng-Jun Zha, Yang Cao

In this paper, we tackle one-shot texture retrieval: given an example of a new reference texture, detect and segment all the pixels of the same texture category within an arbitrary image.

Relation Relation Network +2

Multimodal Semantic Attention Network for Video Captioning

no code implementations8 May 2019 Liang Sun, Bing Li, Chunfeng Yuan, Zheng-Jun Zha, Weiming Hu

Inspired by the fact that different modalities in videos carry complementary information, we propose a Multimodal Semantic Attention Network (MSAN), which is a new encoder-decoder framework incorporating multimodal semantic attributes for video captioning.

Attribute General Classification +2

Camera Lens Super-Resolution

1 code implementation CVPR 2019 Chang Chen, Zhiwei Xiong, Xinmei Tian, Zheng-Jun Zha, Feng Wu

Existing methods for single image super-resolution (SR) are typically evaluated with synthetic degradation models such as bicubic or Gaussian downsampling.

Image Super-Resolution

Making History Matter: History-Advantage Sequence Training for Visual Dialog

no code implementations ICCV 2019 Tianhao Yang, Zheng-Jun Zha, Hanwang Zhang

We study the multi-round response generation in visual dialog, where a response is generated according to a visually grounded conversational history.

Answer Generation Response Generation +2

Learning to Assemble Neural Module Tree Networks for Visual Grounding

no code implementations ICCV 2019 Daqing Liu, Hanwang Zhang, Feng Wu, Zheng-Jun Zha

In particular, we develop a novel modular network called Neural Module Tree network (NMTree) that regularizes the visual grounding along the dependency parsing tree of the sentence, where each node is a neural module that calculates visual attention according to its linguistic feature, and the grounding score is accumulated in a bottom-up direction where needed.

Dependency Parsing Natural Language Visual Grounding +5

CA3Net: Contextual-Attentional Attribute-Appearance Network for Person Re-Identification

no code implementations19 Nov 2018 Jiawei Liu, Zheng-Jun Zha, Hongtao Xie, Zhiwei Xiong, Yongdong Zhang

An appearance network is developed to learn appearance features from the full body, horizontal and vertical body parts of pedestrians with spatial dependencies among body parts.

Attribute Multi-Task Learning +1

Towards Human-Level License Plate Recognition

no code implementations ECCV 2018 Jiafan Zhuang, Saihui Hou, Zilei Wang, Zheng-Jun Zha

License plate recognition (LPR) is a fundamental component of various intelligent transport systems, which is always expected to be accurate and efficient enough.

License Plate Recognition Semantic Segmentation

Context-Aware Visual Policy Network for Sequence-Level Image Captioning

1 code implementation16 Aug 2018 Daqing Liu, Zheng-Jun Zha, Hanwang Zhang, Yongdong Zhang, Feng Wu

To fill the gap, we propose a Context-Aware Visual Policy network (CAVP) for sequence-level image captioning.

Image Captioning Reinforcement Learning (RL)

A Two-Stream Mutual Attention Network for Semi-supervised Biomedical Segmentation with Noisy Labels

no code implementations31 Jul 2018 Shaobo Min, Xuejin Chen, Zheng-Jun Zha, Feng Wu, Yongdong Zhang

Learning-based methods suffer from a deficiency of clean annotations, especially in biomedical segmentation.

MiCT: Mixed 3D/2D Convolutional Tube for Human Action Recognition

no code implementations CVPR 2018 Yizhou Zhou, Xiaoyan Sun, Zheng-Jun Zha, Wen-Jun Zeng

Recent attempts use 3D convolutional neural networks (CNNs) to explore spatio-temporal information for human action recognition.

Action Recognition Temporal Action Localization

Frank-Wolfe Network: An Interpretable Deep Structure for Non-Sparse Coding

1 code implementation28 Feb 2018 Dong Liu, Ke Sun, Zhangyang Wang, Runsheng Liu, Zheng-Jun Zha

We propose an interpretable deep structure namely Frank-Wolfe Network (F-W Net), whose architecture is inspired by unrolling and truncating the Frank-Wolfe algorithm for solving an $L_p$-norm constrained problem with $p\geq 1$.

Handwritten Digit Recognition Image Denoising +2

Learning Compact Appearance Representation for Video-based Person Re-Identification

no code implementations21 Feb 2017 Wei Zhang, Shengnan Hu, Kan Liu, Zheng-Jun Zha

This paper presents a novel approach for video-based person re-identification using multiple Convolutional Neural Networks (CNNs).

Video-Based Person Re-Identification

Comparative Deep Learning of Hybrid Representations for Image Recommendations

no code implementations CVPR 2016 Chenyi Lei, Dong Liu, Weiping Li, Zheng-Jun Zha, Houqiang Li

In many image-related tasks, learning expressive and discriminative representations of images is essential, and deep learning has been studied for automating the learning of such representations.
