Search Results for author: Jingkuan Song

Found 98 papers, 49 papers with code

EchoReel: Enhancing Action Generation of Existing Video Diffusion Models

1 code implementation • 18 Mar 2024 • Jianzhi Liu, Junchen Zhu, Lianli Gao, Jingkuan Song

Recent large-scale video datasets have facilitated the generation of diverse open-domain videos by Video Diffusion Models (VDMs).

Action Generation

CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model

1 code implementation • 13 Mar 2024 • Cheng Chen, Junchen Zhu, Xu Luo, HengTao Shen, Lianli Gao, Jingkuan Song

To this end, we introduce MoELoRA to MLLMs, which is effective at retaining the previous instruction alignment.

General Knowledge Instruction Following +2

Training-Free Semantic Video Composition via Pre-trained Diffusion Model

no code implementations • 17 Jan 2024 • Jiaqi Guo, Sitong Su, Junchen Zhu, Lianli Gao, Jingkuan Song

Therefore, we propose a training-free pipeline employing a pre-trained diffusion model imbued with semantic prior knowledge, which can process composite videos with broader semantic disparities.

Context-based Transfer and Efficient Iterative Learning for Unbiased Scene Graph Generation

no code implementations • 29 Dec 2023 • Qishen Chen, Xinyu Lyu, Haonan Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song

Thus, we introduce a plug-and-play method named CITrans, which iteratively trains SGG models with progressively enhanced data.

Graph Generation Unbiased Scene Graph Generation

ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval

1 code implementation • 19 Dec 2023 • Kaipeng Fang, Jingkuan Song, Lianli Gao, Pengpeng Zeng, Zhi-Qi Cheng, Xiyao Li, Heng Tao Shen

Then, in the Context-aware Simulator Learning stage, we train a Content-aware Prompt Simulator under simulated test scenarios to produce the corresponding CaDP.

Few-Shot Learning Retrieval +2

Make-A-Storyboard: A General Framework for Storyboard with Disentangled and Merged Control

no code implementations • 6 Dec 2023 • Sitong Su, Litao Guo, Lianli Gao, Heng Tao Shen, Jingkuan Song

Story Visualization aims to generate images aligned with story prompts, reflecting the coherence of storybooks through visual consistency among characters and scenes. However, current approaches concentrate exclusively on characters and neglect the visual consistency among contextually correlated scenes, resulting in independent character images without inter-image coherence. To tackle this issue, we propose a new presentation form for Story Visualization called Storyboard, inspired by film-making, as illustrated in Fig. 1. Specifically, a Storyboard unfolds a story into visual representations scene by scene.

Story Visualization

F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis

no code implementations • 6 Dec 2023 • Sitong Su, Jianzhi Liu, Lianli Gao, Jingkuan Song

Recently, Text-to-Video (T2V) synthesis has undergone a breakthrough by training transformers or diffusion models on large-scale datasets.

Towards Redundancy-Free Sub-networks in Continual Learning

1 code implementation • 1 Dec 2023 • Cheng Chen, Jingkuan Song, Lianli Gao, Heng Tao Shen

Catastrophic Forgetting (CF) is a prominent issue in continual learning.

Ranked #1 on Continual Learning on CIFAR-100 ResNet-18 - 300 Epochs

Continual Learning

MotionZero: Exploiting Motion Priors for Zero-shot Text-to-Video Generation

no code implementations • 28 Nov 2023 • Sitong Su, Litao Guo, Lianli Gao, HengTao Shen, Jingkuan Song

To tackle the two issues, we propose a prompt-adaptive and disentangled motion control strategy named MotionZero, which uses Large Language Models to derive motion priors from prompts for different objects and accordingly applies motion control of different objects to the corresponding regions in a disentangled manner.

Disentanglement Text-to-Video Generation +2

BatchNorm-based Weakly Supervised Video Anomaly Detection

1 code implementation • 26 Nov 2023 • Yixuan Zhou, Yi Qu, Xing Xu, Fumin Shen, Jingkuan Song, HengTao Shen

In the proposed BN-WVAD, we leverage the Divergence of Feature from Mean vector (DFM) of BatchNorm as a reliable abnormality criterion to discern potential abnormal snippets in abnormal videos.

Ranked #1 on Anomaly Detection In Surveillance Videos on UCF-Crime

Anomaly Detection In Surveillance Videos Video Anomaly Detection

CUCL: Codebook for Unsupervised Continual Learning

1 code implementation • 25 Nov 2023 • Chen Cheng, Jingkuan Song, Xiaosu Zhu, Junchen Zhu, Lianli Gao, HengTao Shen

To address this issue, after analyzing the phenomenon and identifying the lack of diversity as a vital factor, we propose a method named Codebook for Unsupervised Continual Learning (CUCL) which promotes the model to learn discriminative features to complete the class boundary.

Continual Learning Quantization

Continual Referring Expression Comprehension via Dual Modular Memorization

1 code implementation • 25 Nov 2023 • Heng Tao Shen, Cheng Chen, Peng Wang, Lianli Gao, Meng Wang, Jingkuan Song

In this paper, we propose Continual Referring Expression Comprehension (CREC), a new setting for REC, where a model learns from a stream of incoming tasks.

Memorization Referring Expression +1

Class Gradient Projection For Continual Learning

1 code implementation • 25 Nov 2023 • Cheng Chen, Ji Zhang, Jingkuan Song, Lianli Gao

Catastrophic forgetting is one of the most critical challenges in Continual Learning (CL).

Continual Learning Contrastive Learning

Towards a Unified Transformer-based Framework for Scene Graph Generation and Human-object Interaction Detection

no code implementations • 3 Nov 2023 • Tao He, Lianli Gao, Jingkuan Song, Yuan-Fang Li

In light of this, we introduce SG2HOI+, a unified one-step model based on the Transformer architecture.

Graph Generation Human-Object Interaction Detection +3

CHAIN: Exploring Global-Local Spatio-Temporal Information for Improved Self-Supervised Video Hashing

no code implementations • 29 Oct 2023 • Rukai Wei, Yu Liu, Jingkuan Song, Heng Cui, Yanzhao Xie, Ke Zhou

Compressing videos into binary codes can improve retrieval speed and reduce storage overhead.

Contrastive Learning Retrieval +1

X-HRNet: Towards Lightweight Human Pose Estimation with Spatially Unidimensional Self-Attention

1 code implementation • 12 Oct 2023 • Yixuan Zhou, Xuanhan Wang, Xing Xu, Lei Zhao, Jingkuan Song

Inspired by this observation, we introduce a lightweight and powerful alternative, Spatially Unidimensional Self-Attention (SUSA), to the pointwise (1x1) convolution that is the main computational bottleneck in the depthwise separable 3x3 convolution.

Pose Estimation

Less is More: On the Feature Redundancy of Pretrained Models When Transferring to Few-shot Tasks

no code implementations • 5 Oct 2023 • Xu Luo, Difan Zou, Lianli Gao, Zenglin Xu, Jingkuan Song

Transferring a pretrained model to a downstream task can be as easy as conducting linear probing with target data, that is, training a linear classifier upon frozen features extracted from the pretrained model.

Feature Importance

Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval

1 code implementation • NeurIPS 2023 • Hao Li, Jingkuan Song, Lianli Gao, Xiaosu Zhu, Heng Tao Shen

In this paper, we propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from inherent data ambiguity.

Ranked #16 on Video Retrieval on MSVD

Image-text matching Image-to-Text Retrieval +6

DePT: Decoupled Prompt Tuning

1 code implementation • 14 Sep 2023 • Ji Zhang, Shihan Wu, Lianli Gao, Heng Tao Shen, Jingkuan Song

Specifically, through an in-depth analysis of the learned features of the base and new tasks, we observe that the BNT stems from a channel bias issue, i.e., the vast majority of feature channels are occupied by base-specific knowledge, resulting in the collapse of task-shared knowledge important to new tasks.

Zero-shot Generalization

MSFlow: Multi-Scale Flow-based Framework for Unsupervised Anomaly Detection

1 code implementation • 29 Aug 2023 • Yixuan Zhou, Xing Xu, Jingkuan Song, Fumin Shen, Heng Tao Shen

Unsupervised anomaly detection (UAD) attracts a lot of research interest and drives widespread applications, where only anomaly-free samples are available for training.

Ranked #5 on Anomaly Detection on MVTec AD

Unsupervised Anomaly Detection

CIParsing: Unifying Causality Properties into Multiple Human Parsing

no code implementations • 23 Aug 2023 • Xiaojia Chen, Xuanhan Wang, Lianli Gao, Beitao Chen, Jingkuan Song, HenTao Shen

Existing methods of multiple human parsing (MHP) apply statistical models to acquire underlying associations between images and labeled body parts.

Human Parsing

From Global to Local: Multi-scale Out-of-distribution Detection

1 code implementation • 20 Aug 2023 • Ji Zhang, Lianli Gao, Bingguang Hao, Hao Huang, Jingkuan Song, HengTao Shen

Out-of-distribution (OOD) detection aims to detect "unknown" data whose labels have not been seen during the in-distribution (ID) training process.

Out-of-Distribution Detection Out of Distribution (OOD) Detection +1

Informative Scene Graph Generation via Debiasing

no code implementations • 10 Aug 2023 • Lianli Gao, Xinyu Lyu, Yuyu Guo, Yuxuan Hu, Yuan-Fang Li, Lu Xu, Heng Tao Shen, Jingkuan Song

It integrates two components, Semantic Debiasing (SD) and Balanced Predicate Learning (BPL), to address these imbalances.

Blocking Graph Generation +4

Part-Aware Transformer for Generalizable Person Re-identification

1 code implementation • ICCV 2023 • Hao Ni, Yuke Li, Lianli Gao, Heng Tao Shen, Jingkuan Song

Based on the local similarity obtained in CSL, a Part-guided Self-Distillation (PSD) is proposed to further improve the generalization of global features.

Domain Generalization Generalizable Person Re-identification

MobileVidFactory: Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text

no code implementations • 31 Jul 2023 • Junchen Zhu, Huan Yang, Wenjing Wang, Huiguo He, Zixi Tuo, Yongsheng Yu, Wen-Huang Cheng, Lianli Gao, Jingkuan Song, Jianlong Fu, Jiebo Luo

In the basic generation, we take advantage of the pretrained image diffusion model, and adapt it to a high-quality open-domain vertical video generator for mobile devices.

Video Generation

MovieFactory: Automatic Movie Creation from Text using Large Generative Models for Language and Images

no code implementations • 12 Jun 2023 • Junchen Zhu, Huan Yang, Huiguo He, Wenjing Wang, Zixi Tuo, Wen-Huang Cheng, Lianli Gao, Jingkuan Song, Jianlong Fu

To generate videos, we extend the capabilities of a pretrained text-to-image diffusion model through a two-stage process.

Retrieval

CageViT: Convolutional Activation Guided Efficient Vision Transformer

no code implementations • 17 May 2023 • Hao Zheng, Jinbao Wang, XianTong Zhen, Hong Chen, Jingkuan Song, Feng Zheng

Recently, Transformers have emerged as the go-to architecture for both vision and language modeling tasks, but their computational efficiency is limited by the length of the input sequence.

Computational Efficiency Image Classification +1

Prototype-based Embedding Network for Scene Graph Generation

1 code implementation • CVPR 2023 • Chaofan Zheng, Xinyu Lyu, Lianli Gao, Bo Dai, Jingkuan Song

Current Scene Graph Generation (SGG) methods explore contextual information to predict relationships among entity pairs.

Graph Generation Relation +1

DETA: Denoised Task Adaptation for Few-Shot Learning

2 code implementations • ICCV 2023 • Ji Zhang, Lianli Gao, Xu Luo, HengTao Shen, Jingkuan Song

Test-time task adaptation in few-shot learning aims to adapt a pre-trained task-agnostic model to capture task-specific knowledge of the test task, relying only on a few labeled support samples.

Denoising Few-Shot Learning

Boosting Adversarial Attacks by Leveraging Decision Boundary Information

no code implementations • 10 Mar 2023 • Boheng Zeng, Lianli Gao, Qilong Zhang, CHAOQUN LI, Jingkuan Song, ShuaiQi Jing

However, our method still outperforms existing methods when attacking transformers.

A Closer Look at Few-shot Classification Again

2 code implementations • 28 Jan 2023 • Xu Luo, Hao Wu, Ji Zhang, Lianli Gao, Jing Xu, Jingkuan Song

Few-shot classification consists of a training phase where a model is learned on a relatively large dataset and an adaptation phase where the learned model is adapted to previously-unseen tasks with limited labeled samples.

Classification Representation Learning +1

Hyperbolic Hierarchical Contrastive Hashing

no code implementations • 17 Dec 2022 • Rukai Wei, Yu Liu, Jingkuan Song, Yanzhao Xie, Ke Zhou

To exploit the hierarchical semantic structures in hyperbolic space, we designed the hierarchical contrastive learning algorithm, including hierarchical instance-wise and hierarchical prototype-wise contrastive learning.

Contrastive Learning Retrieval

A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval

2 code implementations • NeurIPS 2022 • Hao Li, Jingkuan Song, Lianli Gao, Pengpeng Zeng, Haonan Zhang, Gongfu Li

To verify the effectiveness of our approach, extensive experiments are conducted on MS-COCO, CUB Captions, and Flickr30K, which are commonly used in cross-modal retrieval.

Image-text matching Image-to-Text Retrieval +1

Progressive Tree-Structured Prototype Network for End-to-End Image Captioning

1 code implementation • 17 Nov 2022 • Pengpeng Zeng, Jinkuan Zhu, Jingkuan Song, Lianli Gao

Specifically, we design a novel embedding method called tree-structured prototype, producing a set of hierarchical representative embeddings which capture the hierarchical semantic structure in textual space.

Image Captioning

Learning Dual-Fused Modality-Aware Representations for RGBD Tracking

no code implementations • 6 Nov 2022 • Shang Gao, Jinyu Yang, Zhe Li, Feng Zheng, Aleš Leonardis, Jingkuan Song

However, some existing RGBD trackers use the two modalities separately and thus some particularly useful shared information between them is ignored.

Object Tracking

A Lower Bound of Hash Codes' Performance

1 code implementation • 12 Oct 2022 • Xiaosu Zhu, Jingkuan Song, Yu Lei, Lianli Gao, Heng Tao Shen

By testing on a series of hash-models, we obtain performance improvements across all of them, with up to a $26.5\%$ increase in mean Average Precision and up to a $20.5\%$ increase in accuracy.

Metric Learning Representation Learning

Natural Color Fool: Towards Boosting Black-box Unrestricted Attacks

1 code implementation • 5 Oct 2022 • Shengming Yuan, Qilong Zhang, Lianli Gao, Yaya Cheng, Jingkuan Song

Unrestricted color attacks, which manipulate semantically meaningful color of an image, have shown their stealthiness and success in fooling both human eyes and deep neural networks.

Adversarial Attack

RepParser: End-to-End Multiple Human Parsing with Representative Parts

no code implementations • 27 Aug 2022 • Xiaojia Chen, Xuanhan Wang, Lianli Gao, Jingkuan Song

Different from mainstream methods, RepParser solves the multiple human parsing in a new single-stage manner without resorting to person detection or post-grouping. To this end, RepParser decouples the parsing pipeline into instance-aware kernel generation and part-aware human parsing, which are responsible for instance separation and instance-specific part segmentation, respectively.

Human Detection Human Parsing

Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning

no code implementations • 17 Aug 2022 • Tao He, Lianli Gao, Jingkuan Song, Yuan-Fang Li

In this paper, we introduce open-vocabulary scene graph generation, a novel, realistic and challenging setting in which a model is trained on a set of base object classes but is required to infer relations for unseen target object classes.

Graph Generation Object +1

Prompting for Multi-Modal Tracking

no code implementations • 29 Jul 2022 • Jinyu Yang, Zhe Li, Feng Zheng, Aleš Leonardis, Jingkuan Song

Multi-modal tracking gains attention due to its ability to be more accurate and robust in complex scenarios compared to traditional RGB-based tracking.

Ranked #20 on Rgb-T Tracking on LasHeR

Rgb-T Tracking

Frequency Domain Model Augmentation for Adversarial Attack

2 code implementations • 12 Jul 2022 • Yuyang Long, Qilong Zhang, Boheng Zeng, Lianli Gao, Xianglong Liu, Jian Zhang, Jingkuan Song

Specifically, we apply a spectrum transformation to the input and thus perform the model augmentation in the frequency domain.

Adversarial Attack

Adaptive Fine-Grained Predicates Learning for Scene Graph Generation

no code implementations • 11 Jul 2022 • Xinyu Lyu, Lianli Gao, Pengpeng Zeng, Heng Tao Shen, Jingkuan Song

The performance of current Scene Graph Generation (SGG) models is severely hampered by hard-to-distinguish predicates, e.g., woman-on/standing on/walking on-beach.

Fine-Grained Image Classification Graph Generation +4

Skeleton-based Action Recognition via Adaptive Cross-Form Learning

1 code implementation • 30 Jun 2022 • Xuanhan Wang, Yan Dai, Lianli Gao, Jingkuan Song

Specifically, each GCN model in ACFL not only learns action representation from the single-form skeletons, but also adaptively mimics useful representations derived from other forms of skeletons.

Action Recognition Skeleton Based Action Recognition

Learning To Generate Scene Graph from Head to Tail

no code implementations • 23 Jun 2022 • Chaofan Zheng, Xinyu Lyu, Yuyu Guo, Pengpeng Zeng, Jingkuan Song, Lianli Gao

SCM is proposed to relieve semantic deviation by ensuring the semantic consistency between the generated scene graph and the ground truth in global and local representations.

Graph Generation Scene Graph Generation

KE-RCNN: Unifying Knowledge based Reasoning into Part-level Attribute Parsing

1 code implementation • 21 Jun 2022 • Xuanhan Wang, Jingkuan Song, Xiaojia Chen, Lechao Cheng, Lianli Gao, Heng Tao Shen

In this article, we propose a Knowledge Embedded RCNN (KE-RCNN) to identify attributes by leveraging rich knowledge, including implicit knowledge (e.g., the attribute "above-the-hip" for a shirt requires visual/geometry relations of shirt-hip) and explicit knowledge (e.g., the part of "shorts" cannot have the attribute of "hoodie" or "lining").

Attribute

KTN: Knowledge Transfer Network for Learning Multi-person 2D-3D Correspondences

1 code implementation • 21 Jun 2022 • Xuanhan Wang, Lianli Gao, Yixuan Zhou, Jingkuan Song, Meng Wang

Human densepose estimation, aiming at establishing dense correspondences between 2D pixels of human body and 3D human body template, is a key technique in enabling machines to have an understanding of people in images.

Human Part Segmentation Transfer Learning

Rethinking Spatial Invariance of Convolutional Networks for Object Counting

1 code implementation • CVPR 2022 • Zhi-Qi Cheng, Qi Dai, Hong Li, Jingkuan Song, Xiao Wu, Alexander G. Hauptmann

We evaluate our methods on 4 mainstream object counting networks (i.e., MCNN, CSRNet, SANet, and ResNet-50).

Ranked #1 on Object Counting on TRANCOS

Crowd Counting Object +2

From Pixels to Objects: Cubic Visual Attention for Visual Question Answering

no code implementations • 4 Jun 2022 • Jingkuan Song, Pengpeng Zeng, Lianli Gao, Heng Tao Shen

Existing visual attention models are generally planar, i.e., different channels of the last conv-layer feature map of an image share the same weight.

Object Question Answering +1

Structured Two-stream Attention Network for Video Question Answering

no code implementations • 2 Jun 2022 • Lianli Gao, Pengpeng Zeng, Jingkuan Song, Yuan-Fang Li, Wu Liu, Tao Mei, Heng Tao Shen

To date, visual question answering (VQA) (i.e., image QA and video QA) is still a holy grail in vision and language understanding, especially for video QA.

Question Answering Video Question Answering +2

Support-set based Multi-modal Representation Enhancement for Video Captioning

1 code implementation • 19 May 2022 • Xiaoya Chen, Jingkuan Song, Pengpeng Zeng, Lianli Gao, Heng Tao Shen

Video captioning is a challenging task that necessitates a thorough comprehension of visual scenes.

Video Captioning

Fine-Grained Predicates Learning for Scene Graph Generation

1 code implementation • CVPR 2022 • Xinyu Lyu, Lianli Gao, Yuyu Guo, Zhou Zhao, Hao Huang, Heng Tao Shen, Jingkuan Song

The performance of current Scene Graph Generation models is severely hampered by some hard-to-distinguish predicates, e.g., "woman-on/standing on/walking on-beach" or "woman-near/looking at/in front of-child".

Fine-Grained Image Classification Graph Generation +2

Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression

1 code implementation • CVPR 2022 • Xiaosu Zhu, Jingkuan Song, Lianli Gao, Feng Zheng, Heng Tao Shen

Modeling latent variables with priors and hyperpriors is an essential problem in variational image compression.

Image Compression Quantization

Practical Evaluation of Adversarial Robustness via Adaptive Auto Attack

1 code implementation • CVPR 2022 • Ye Liu, Yaya Cheng, Lianli Gao, Xianglong Liu, Qilong Zhang, Jingkuan Song

Specifically, by observing that adversarial examples to a specific defense model follow some regularities in their starting points, we design an Adaptive Direction Initialization strategy to speed up the evaluation.

Adversarial Robustness

Practical No-box Adversarial Attacks with Training-free Hybrid Image Transformation

no code implementations • 9 Mar 2022 • Qilong Zhang, Chaoning Zhang, CHAOQUN LI, Jingkuan Song, Lianli Gao

In this paper, we move a step forward and show the existence of a training-free adversarial perturbation under the no-box threat model, which can be successfully used to attack different DNNs in real time.

One-shot Scene Graph Generation

1 code implementation • 22 Feb 2022 • Yuyu Guo, Jingkuan Song, Lianli Gao, Heng Tao Shen

Specifically, the Relational Knowledge represents the prior knowledge of relationships between entities extracted from the visual content, e.g., the visual relationships "standing in", "sitting in", and "lying in" may exist between "dog" and "yard", while the Commonsense Knowledge encodes "sense-making" knowledge like "dog can guard yard".

Graph Generation Scene Graph Generation

Relation Regularized Scene Graph Generation

no code implementations • 22 Feb 2022 • Yuyu Guo, Lianli Gao, Jingkuan Song, Peng Wang, Nicu Sebe, Heng Tao Shen, Xuelong Li

Inspired by this observation, in this article, we propose a relation regularized network (R2-Net), which can predict whether there is a relationship between two objects and encode this relation into object feature refinement and better SGG.

Graph Classification Graph Generation +6

Beyond ImageNet Attack: Towards Crafting Adversarial Examples for Black-box Domains

2 code implementations • ICLR 2022 • Qilong Zhang, Xiaodan Li, Yuefeng Chen, Jingkuan Song, Lianli Gao, Yuan He, Hui Xue

Notably, our methods outperform state-of-the-art approaches by up to 7.71% (towards coarse-grained domains) and 25.91% (towards fine-grained domains) on average.

Meta Distribution Alignment for Generalizable Person Re-Identification

1 code implementation • CVPR 2022 • Hao Ni, Jingkuan Song, Xiaopeng Luo, Feng Zheng, Wen Li, Heng Tao Shen

Domain Generalizable (DG) person ReID is a challenging task which trains a model on source domains yet generalizes well on target domains.

Domain Generalization Generalizable Person Re-identification +1

Technical Report: Disentangled Action Parsing Networks for Accurate Part-level Action Parsing

no code implementations • 5 Nov 2021 • Xuanhan Wang, Xiaojia Chen, Lianli Gao, Lechao Chen, Jingkuan Song

Despite dramatic progress in video classification research, a severe problem faced by the community is that the detailed understanding of human actions is ignored.

Action Parsing Action Recognition In Videos +2

Fast Gradient Non-sign Methods

1 code implementation • 25 Oct 2021 • Yaya Cheng, Jingkuan Song, Xiaosu Zhu, Qilong Zhang, Lianli Gao, Heng Tao Shen

Based on the linearity hypothesis, under an $\ell_\infty$ constraint, the $\mathrm{sign}$ operation applied to the gradients is a good choice for generating perturbations.

From General to Specific: Informative Scene Graph Generation via Balance Adjustment

1 code implementation • ICCV 2021 • Yuyu Guo, Lianli Gao, Xuanhan Wang, Yuxuan Hu, Xing Xu, Xu Lu, Heng Tao Shen, Jingkuan Song

The scene graph generation (SGG) task aims to detect visual relationship triplets, i.e., subject, predicate, object, in an image, providing a structural vision layout for scene understanding.

Blocking Graph Generation +2

Semi-supervised Network Embedding with Differentiable Deep Quantisation

no code implementations • 20 Aug 2021 • Tao He, Lianli Gao, Jingkuan Song, Yuan-Fang Li

Learning accurate low-dimensional embeddings for a network is a crucial task as it facilitates many downstream network analytics tasks.

Link Prediction Network Embedding +2

Unsupervised Domain-adaptive Hash for Networks

no code implementations • 20 Aug 2021 • Tao He, Lianli Gao, Jingkuan Song, Yuan-Fang Li

Abundant real-world data can be naturally represented by large-scale networks, which demands efficient and effective learning algorithms.

Link Prediction Node Classification +1

Semantic Compositional Learning for Low-shot Scene Graph Generation

no code implementations • 19 Aug 2021 • Tao He, Lianli Gao, Jingkuan Song, Jianfei Cai, Yuan-Fang Li

Scene graphs provide valuable information to many downstream tasks.

Graph Generation Relation +1

Exploiting Scene Graphs for Human-Object Interaction Detection

1 code implementation • ICCV 2021 • Tao He, Lianli Gao, Jingkuan Song, Yuan-Fang Li

Human-Object Interaction (HOI) detection is a fundamental visual task aiming at localizing and recognizing interactions between humans and objects.

Human-Object Interaction Detection Object

Feature Space Targeted Attacks by Statistic Alignment

1 code implementation • 25 May 2021 • Lianli Gao, Yaya Cheng, Qilong Zhang, Xing Xu, Jingkuan Song

However, the current choice of pixel-wise Euclidean Distance to measure the discrepancy is questionable because it unreasonably imposes a spatial-consistency constraint on the source and target features.

Translation

Revisiting Multi-Codebook Quantization

no code implementations • NeurIPS 2021 • Xiaosu Zhu, Jingkuan Song, Lianli Gao, Xiaoyan Gu, HengTao Shen

However, finding the optimal solution to MCQ has been proved to be NP-hard due to its encoding process, i.e., converting an input vector to a binary code.

Quantization Retrieval

Staircase Sign Method for Boosting Adversarial Attacks

2 code implementations • 20 Apr 2021 • Qilong Zhang, Xiaosu Zhu, Jingkuan Song, Lianli Gao, Heng Tao Shen

Crafting adversarial examples for the transfer-based attack is challenging and remains a research hot spot.

Adversarial Attack

Patch-wise++ Perturbation for Adversarial Targeted Attacks

1 code implementation • 31 Dec 2020 • Lianli Gao, Qilong Zhang, Jingkuan Song, Heng Tao Shen

Specifically, we introduce an amplification factor to the step size in each iteration, and one pixel's overall gradient overflowing the $\epsilon$-constraint is properly assigned to its surrounding regions by a project kernel.

Adversarial Attack

Patch-wise Attack for Fooling Deep Neural Network

4 code implementations • ECCV 2020 • Lianli Gao, Qilong Zhang, Jingkuan Song, Xianglong Liu, Heng Tao Shen

By adding human-imperceptible noise to clean images, the resultant adversarial examples can fool other unknown models.

Adversarial Attack Image Classification

Learning from the Scene and Borrowing from the Rich: Tackling the Long Tail in Scene Graph Generation

no code implementations • 13 Jun 2020 • Tao He, Lianli Gao, Jingkuan Song, Jianfei Cai, Yuan-Fang Li

Despite the huge progress in scene graph generation in recent years, its long-tail distribution in object relationships remains a challenging and pestering issue.

Graph Generation Object +2

Binary Neural Networks: A Survey

2 code implementations • 31 Mar 2020 • Haotong Qin, Ruihao Gong, Xianglong Liu, Xiao Bai, Jingkuan Song, Nicu Sebe

The binary neural network, largely saving the storage and computation, serves as a promising technique for deploying deep models on resource-limited devices.

Binarization Image Classification +4

Forward and Backward Information Retention for Accurate Binary Neural Networks

2 code implementations • CVPR 2020 • Haotong Qin, Ruihao Gong, Xianglong Liu, Mingzhu Shen, Ziran Wei, Fengwei Yu, Jingkuan Song

Our empirical study indicates that the quantization brings information loss in both forward and backward propagation, which is the bottleneck of training accurate binary neural networks.

Binarization Neural Network Compression +1

Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking

2 code implementations • 12 Aug 2019 • Tan Wang, Xing Xu, Yang Yang, Alan Hanjalic, Heng Tao Shen, Jingkuan Song

We propose a novel framework that achieves remarkable matching performance with acceptable model complexity.

Binary Classification General Classification +4

One Network for Multi-Domains: Domain Adaptive Hashing with Intersectant Generative Adversarial Network

1 code implementation • 1 Jul 2019 • Tao He, Yuan-Fang Li, Lianli Gao, Dongxiang Zhang, Jingkuan Song

We evaluate our framework on four public benchmark datasets, all of which show that our method is superior to the other state-of-the-art methods on the tasks of object recognition and image retrieval.

Generative Adversarial Network Image Retrieval +2

Localizing Unseen Activities in Video via Image Query

no code implementations • 28 Jun 2019 • Zhu Zhang, Zhou Zhao, Zhijie Lin, Jingkuan Song, Deng Cai

Thus, we consider a new task to localize unseen activities in videos via image queries, named Image-Based Activity Localization.

Action Localization Video Understanding

Open-Ended Long-Form Video Question Answering via Hierarchical Convolutional Self-Attention Networks

no code implementations • 28 Jun 2019 • Zhu Zhang, Zhou Zhao, Zhijie Lin, Jingkuan Song, Xiaofei He

Concretely, we first develop a hierarchical convolutional self-attention encoder to efficiently model long-form video contents, which builds the hierarchical structure for video sequences and captures question-aware long-range dependencies from video context.

Answer Generation Decoder +2

Beyond Product Quantization: Deep Progressive Quantization for Image Retrieval

1 code implementation • 16 Jun 2019 • Lianli Gao, Xiaosu Zhu, Jingkuan Song, Zhou Zhao, Heng Tao Shen

In this work, we propose a deep progressive quantization (DPQ) model, as an alternative to PQ, for large scale image retrieval.

Image Retrieval Quantization +1

Paper
Code

Deep Recurrent Quantization for Generating Sequential Binary Codes

1 code implementation • 16 Jun 2019 • Jingkuan Song, Xiaosu Zhu, Lianli Gao, Xin-Shun Xu, Wu Liu, Heng Tao Shen

To this end, when the model is trained, a sequence of binary codes can be generated and the code length can be easily controlled by adjusting the number of recurrent iterations.

Image Retrieval Quantization +1

Paper
Code

Hierarchical LSTMs with Adaptive Attention for Visual Captioning

no code implementations • 26 Dec 2018 • Jingkuan Song, Xiangpeng Li, Lianli Gao, Heng Tao Shen

Also, a hierarchical LSTM is designed to simultaneously consider both low-level visual information and high-level language context information to support the caption generation.

Caption Generation Image Captioning +2

Paper
Add Code

NAIS: Neural Attentive Item Similarity Model for Recommendation

3 code implementations • 19 Sep 2018 • Xiangnan He, Zhankui He, Jingkuan Song, Zhenguang Liu, Yu-Gang Jiang, Tat-Seng Chua

As such, the key to an item-based CF method is in the estimation of item similarities.

Collaborative Filtering Recommendation Systems

Paper
Code

Cross-Paced Representation Learning with Partial Curricula for Sketch-based Image Retrieval

no code implementations • 5 Mar 2018 • Dan Xu, Xavier Alameda-Pineda, Jingkuan Song, Elisa Ricci, Nicu Sebe

In this paper we address the problem of learning robust cross-domain representations for sketch-based image retrieval (SBIR).

Representation Learning Retrieval +1

Paper
Add Code

Self-Supervised Video Hashing with Hierarchical Binary Auto-encoder

no code implementations • 7 Feb 2018 • Jingkuan Song, Hanwang Zhang, Xiangpeng Li, Lianli Gao, Meng Wang, Richang Hong

Existing video hash functions are built on three isolated stages: frame pooling, relaxed learning, and binarization, which have not adequately explored the temporal order of video frames in a joint binary optimization model, resulting in severe information loss.

Binarization Decoder +2

Paper
Add Code

Deep Binaries: Encoding Semantic-Rich Cues for Efficient Textual-Visual Cross Retrieval

no code implementations • ICCV 2017 • Yuming Shen, Li Liu, Ling Shao, Jingkuan Song

Cross-modal hashing is usually regarded as an effective technique for large-scale textual-visual cross retrieval, where data from different modalities are mapped into a shared Hamming space for matching.

Cross-Modal Retrieval Descriptive +1

Paper
Add Code

From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video Captioning

no code implementations • 8 Aug 2017 • Jingkuan Song, Yuyu Guo, Lianli Gao, Xuelong Li, Alan Hanjalic, Heng Tao Shen

In this paper, we propose a generative approach, referred to as multi-modal stochastic RNNs (MS-RNN), which models the uncertainty observed in the data using latent stochastic variables.

Decoder Video Captioning

Paper
Add Code

Binary Generative Adversarial Networks for Image Retrieval

1 code implementation • 8 Aug 2017 • Jingkuan Song

By restricting the input noise variable of generative adversarial networks (GAN) to be binary and conditioned on the features of each input image, BGAN can simultaneously learn a binary representation per image, and generate an image plausibly similar to the original one.

Deep Hashing Image Retrieval

Paper
Code

Discrete Multi-modal Hashing with Canonical Views for Robust Mobile Landmark Search

no code implementations • 13 Jul 2017 • Lei Zhu, Zi Huang, Xiaobai Liu, Xiangnan He, Jingkuan Song, Xiaofang Zhou

Finally, compact binary codes are learned on intermediate representation within a tailored discrete binary embedding model which preserves visual relations of images measured with canonical views and removes the involved noises.

Paper
Add Code

Learning in High-Dimensional Multimedia Data: The State of the Art

no code implementations • 10 Jul 2017 • Lianli Gao, Jingkuan Song, Xingyi Liu, Junming Shao, Jiajun Liu, Jie Shao

Given the high dimensionality and the high complexity of multimedia data, it is important to investigate new machine learning algorithms to facilitate multimedia data analysis.

BIG-bench Machine Learning feature selection +3

Paper
Add Code

Deep Discrete Hashing with Self-supervised Pairwise Labels

1 code implementation • 7 Jul 2017 • Jingkuan Song, Tao He, Hangbo Fan, Lianli Gao

2) how to equip the binary representation with the ability of accurate image retrieval and classification in an unsupervised way?

Deep Hashing General Classification +2

Paper
Code

Matrix Tri-Factorization With Manifold Regularizations for Zero-Shot Learning

no code implementations • CVPR 2017 • Xing Xu, Fumin Shen, Yang Yang, Dongxiang Zhang, Heng Tao Shen, Jingkuan Song

By additionally introducing manifold regularizations on visual data and semantic embeddings, the learned projection can effectively capture the geometrical manifold structure residing in both visual and semantic spaces.

Retrieval Transfer Learning +1

Paper
Add Code

Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning

no code implementations • 5 Jun 2017 • Jingkuan Song, Zhao Guo, Lianli Gao, Wu Liu, Dongxiang Zhang, Heng Tao Shen

Specifically, the proposed framework utilizes the temporal attention for selecting specific frames to predict the related words, while the adjusted temporal attention is for deciding whether to depend on the visual information or the language context information.

Caption Generation Decoder +2

Paper
Add Code

Deep Region Hashing for Efficient Large-scale Instance Search from Images

no code implementations • 26 Jan 2017 • Jingkuan Song, Tao He, Lianli Gao, Xing Xu, Heng Tao Shen

Specifically, DRH is an end-to-end deep neural network which consists of object proposal, feature extraction, and hash code generation.

Code Generation Image Retrieval +3

Paper
Add Code

A Survey on Learning to Hash

no code implementations • 1 Jun 2016 • Jingdong Wang, Ting Zhang, Jingkuan Song, Nicu Sebe, Heng Tao Shen

In this paper, we present a comprehensive survey of the learning to hash algorithms, categorize them according to the manners of preserving the similarities into: pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, as well as quantization, and discuss their relations.

Quantization

Paper
Add Code

Localize Me Anywhere, Anytime: A Multi-Task Point-Retrieval Approach

no code implementations • ICCV 2015 • Guoyu Lu, Yan Yan, Li Ren, Jingkuan Song, Nicu Sebe, Chandra Kambhamettu

The main contribution of our paper is that we use a 3D model reconstructed from a short video as the query to realize 3D-to-3D localization under a multi-task point retrieval framework.

Image-Based Localization Multi-Task Learning +1

Paper
Add Code

Learning Deep Representations of Appearance and Motion for Anomalous Event Detection

no code implementations • 6 Oct 2015 • Dan Xu, Elisa Ricci, Yan Yan, Jingkuan Song, Nicu Sebe

We present a novel unsupervised deep learning framework for anomalous event detection in complex video scenes.

Anomaly Detection Denoising +1

Paper
Add Code

Optimal Graph Learning With Partial Tags and Multiple Features for Image and Video Annotation

no code implementations • CVPR 2015 • Lianli Gao, Jingkuan Song, Feiping Nie, Yan Yan, Nicu Sebe, Heng Tao Shen

In multimedia annotation, due to the time constraints and the tediousness of manual tagging, it is quite common to utilize both tagged and untagged data to improve the performance of supervised learning when only limited tagged training data are available.

graph construction Graph Learning

Paper
Add Code

Hashing for Similarity Search: A Survey

no code implementations • 13 Aug 2014 • Jingdong Wang, Heng Tao Shen, Jingkuan Song, Jianqiu Ji

Similarity search (nearest neighbor search) is a problem of pursuing the data items whose distances to a query item are the smallest from a large database.

Paper
Add Code

Optimized Cartesian $K$-Means

no code implementations • 16 May 2014 • Jianfeng Wang, Jingdong Wang, Jingkuan Song, Xin-Shun Xu, Heng Tao Shen, Shipeng Li

In OCKM, multiple sub codewords are used to encode the subvector of a data point in a subspace.

Quantization

Paper
Add Code
