Search Results for author: Xinfeng Zhang

Found 27 papers, 14 papers with code

Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations

2 code implementations 6 May 2023 Yufeng Huang, Jiji Tang, Zhuo Chen, Rongsheng Zhang, Xinfeng Zhang, WeiJie Chen, Zeng Zhao, Zhou Zhao, Tangjie Lv, Zhipeng Hu, Wen Zhang

In this paper, we present an end-to-end framework Structure-CLIP, which integrates Scene Graph Knowledge (SGK) to enhance multi-modal structured representations.

Image-text matching Text Matching

Scene Matters: Model-based Deep Video Compression

no code implementations ICCV 2023 Lv Tang, Xinfeng Zhang, Gai Zhang, Xiaoqi Ma

Video compression has always been a popular research area, where many traditional and deep video compression methods have been proposed.

Video Compression

Cross Modal Compression: Towards Human-comprehensible Semantic Compression

no code implementations 6 Sep 2022 Jiguo Li, Chuanmin Jia, Xinfeng Zhang, Siwei Ma, Wen Gao

With the recent advances in cross modal translation and generation, in this paper, we propose the cross modal compression (CMC), a semantic compression framework for visual data, to transform the highly redundant visual data (such as image, video, etc.)

Feature Compression Video Compression

STIP: A SpatioTemporal Information-Preserving and Perception-Augmented Model for High-Resolution Video Prediction

1 code implementation 9 Jun 2022 Zheng Chang, Xinfeng Zhang, Shanshe Wang, Siwei Ma, Wen Gao

To solve the information loss problem, the proposed model aims to preserve the spatiotemporal information for videos during the feature extraction and the state transitions, respectively.

Video Prediction

STAU: A SpatioTemporal-Aware Unit for Video Prediction and Beyond

no code implementations 20 Apr 2022 Zheng Chang, Xinfeng Zhang, Shanshe Wang, Siwei Ma, Wen Gao

In this paper, we propose a SpatioTemporal-Aware Unit (STAU) for video prediction and beyond by exploring the significant spatiotemporal correlations in videos.

Action Recognition object-detection +2

MAU: A Motion-Aware Unit for Video Prediction and Beyond

1 code implementation NeurIPS 2021 Zheng Chang, Xinfeng Zhang, Shanshe Wang, Siwei Ma, Yan Ye, Xiang Xinguang, Wen Gao

The attention module aims to learn an attention map based on the correlations between the current spatial state and the historical spatial states.

Action Recognition Video Prediction

Improving Robustness and Accuracy via Relative Information Encoding in 3D Human Pose Estimation

1 code implementation 29 Jul 2021 Wenkang Shan, Haopeng Lu, Shanshe Wang, Xinfeng Zhang, Wen Gao

To alleviate these two problems, we propose a relative information encoding method that yields positional and temporal enhanced representations.

Monocular 3D Human Pose Estimation

Towards Fine-grained Human Pose Transfer with Detail Replenishing Network

no code implementations 26 May 2020 Lingbo Yang, Pan Wang, Chang Liu, Zhanning Gao, Peiran Ren, Xinfeng Zhang, Shanshe Wang, Siwei Ma, Xian-Sheng Hua, Wen Gao

Human pose transfer (HPT) is an emerging research topic with huge potential in fashion design, media production, online advertising and virtual reality.

Pose Transfer Retrieval

Towards Analysis-friendly Face Representation with Scalable Feature and Texture Compression

no code implementations 21 Apr 2020 Shurun Wang, Shiqi Wang, Wenhan Yang, Xinfeng Zhang, Shanshe Wang, Siwei Ma, Wen Gao

In particular, we study the feature and texture compression in a scalable coding framework, where the base layer serves as the deep learning feature and the enhancement layer aims to perfectly reconstruct the texture.

Image Compression

Universal Adversarial Perturbations Generative Network for Speaker Recognition

1 code implementation 7 Apr 2020 Jiguo Li, Xinfeng Zhang, Chuanmin Jia, Jizheng Xu, Li Zhang, Yue Wang, Siwei Ma, Wen Gao

Attacking deep learning based biometric systems has drawn more and more attention with the wide deployment of fingerprint/face/speaker recognition systems, given that neural networks are vulnerable to adversarial examples, which have been intentionally perturbed while remaining almost imperceptible to humans.

Speaker Recognition

Direct Speech-to-image Translation

1 code implementation 7 Apr 2020 Jiguo Li, Xinfeng Zhang, Chuanmin Jia, Jizheng Xu, Li Zhang, Yue Wang, Siwei Ma, Wen Gao

In this paper, we attempt to translate the speech signals into the image signals without the transcription stage.

Multimedia Sound Audio and Speech Processing

Learning to fool the speaker recognition

1 code implementation 7 Apr 2020 Jiguo Li, Xinfeng Zhang, Jizheng Xu, Li Zhang, Yue Wang, Siwei Ma, Wen Gao

Due to the widespread deployment of fingerprint/face/speaker recognition systems, attacking deep learning based biometric systems has drawn more and more attention.

Audio and Speech Processing Cryptography and Security Sound

A Modified Perturbed Sampling Method for Local Interpretable Model-agnostic Explanation

no code implementations 18 Feb 2020 Sheng Shi, Xinfeng Zhang, Wei Fan

Explainability is a gateway between Artificial Intelligence and society as the current popular deep learning models are generally weak in explaining the reasoning process and prediction results.

Image Classification

Explaining the Predictions of Any Image Classifier via Decision Trees

no code implementations 4 Nov 2019 Sheng Shi, Xinfeng Zhang, Wei Fan

Despite outstanding contribution to the significant progress of Artificial Intelligence (AI), deep learning models remain mostly black boxes, which are extremely weak in explainability of the reasoning process and prediction results.

PgNN: Physics-guided Neural Network for Fourier Ptychographic Microscopy

no code implementations 19 Sep 2019 Yongbing Zhang, Yangzhe Liu, Xiu Li, Shaowei Jiang, Krishna Dixit, Xinfeng Zhang, Xiangyang Ji

Since the optimal parameters of the PgNN can be derived by minimizing the difference between the model-generated images and real captured angle-varied images corresponding to the same scene, the proposed PgNN can get rid of the problem of massive training data as in traditional supervised methods.

Image and Video Compression with Neural Networks: A Review

no code implementations 7 Apr 2019 Siwei Ma, Xinfeng Zhang, Chuanmin Jia, Zhenghui Zhao, Shiqi Wang, Shanshe Wang

Deep convolutional neural networks (CNNs), which have driven the resurgence of neural networks in recent years and achieved great success in both artificial intelligence and signal processing, also provide a novel and promising solution for image and video compression.

Video Compression

Scalable Facial Image Compression with Deep Feature Reconstruction

no code implementations 14 Mar 2019 Shurun Wang, Shiqi Wang, Xinfeng Zhang, Shanshe Wang, Siwei Ma, Wen Gao

In this paper, we propose a scalable image compression scheme, including the base layer for feature representation and enhancement layer for texture representation.

Image Compression

Anomaly Detection and Localization in Crowded Scenes by Motion-field Shape Description and Similarity-based Statistical Learning

no code implementations 27 May 2018 Xinfeng Zhang, Su Yang, Xinjian Zhang, Weishan Zhang, Jiulong Zhang

In crowded scenes, detecting and localizing abnormal behaviors is challenging because the high density of people makes object segmentation and tracking extremely difficult.

Anomaly Detection Clustering +1

Spatial-Temporal Residue Network Based In-Loop Filter for Video Coding

no code implementations 25 Sep 2017 Chuanmin Jia, Shiqi Wang, Xinfeng Zhang, Shanshe Wang, Siwei Ma

Deep learning has demonstrated tremendous breakthroughs in the area of image/video processing.

Multimedia
