no code implementations • 19 Apr 2024 • Yian Li, Wentao Tian, Yang Jiao, Jingjing Chen, Yu-Gang Jiang
Counterfactual reasoning, as a crucial manifestation of human intelligence, refers to making presuppositions based on established facts and extrapolating potential outcomes.
1 code implementation • 12 Mar 2024 • Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang
This adaptation makes it convenient to develop such LMMs with minimal modifications; however, it overlooks the intrinsic characteristics of diverse visual tasks and hinders the learning of perception capabilities.
no code implementations • 12 Mar 2024 • Guoshan Liu, Yang Jiao, Jingjing Chen, Bin Zhu, Yu-Gang Jiang
These two datasets are used to evaluate the transferability of approaches from the well-curated food image domain to the everyday-life food image domain.
1 code implementation • 5 Mar 2024 • Xue Song, Jiequan Cui, Hanwang Zhang, Jingjing Chen, Richang Hong, Yu-Gang Jiang
Through the lens of the formulation, we find that the crux of TBIE is that existing techniques hardly achieve a good trade-off between editability and fidelity, mainly due to the overfitting of the single-image fine-tuning.
1 code implementation • 25 Dec 2023 • Wentao Tian, Zheng Wang, Yuqian Fu, Jingjing Chen, Lechao Cheng
A comprehensive understanding of videos is inseparable from describing the action with its contextual action-object interactions.
no code implementations • 22 Dec 2023 • Yuehao Yin, Huiyan Qi, Bin Zhu, Jingjing Chen, Yu-Gang Jiang, Chong-Wah Ngo
In the second stage, we construct a multi-round conversation dataset and a reasoning segmentation dataset to fine-tune the model, enabling it to conduct professional dialogues and generate segmentation masks based on complex reasoning in the food domain.
no code implementations • 13 Dec 2023 • Yang Jiao, Zequn Jie, Shaoxiang Chen, Lechao Cheng, Jingjing Chen, Lin Ma, Yu-Gang Jiang
The camera-based bird's-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field.
1 code implementation • 4 Oct 2023 • Zejun Li, Ye Wang, Mengfei Du, Qingwen Liu, Binhao Wu, Jiwen Zhang, Chengxing Zhou, Zhihao Fan, Jie Fu, Jingjing Chen, Xuanjing Huang, Zhongyu Wei
Recent years have witnessed remarkable progress in the development of large vision-language models (LVLMs).
no code implementations • 14 Aug 2023 • Yilun Zhang, Yuqian Fu, Xingjun Ma, Lizhe Qi, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
We are thus motivated to investigate the importance of spatial relations and propose a more accurate few-shot action recognition method that leverages both spatial and temporal information.
2 code implementations • 24 May 2023 • Tianwen Qian, Jingjing Chen, Linhai Zhuo, Yang Jiao, Yu-Gang Jiang
We introduce a novel visual question answering (VQA) task in the context of autonomous driving, aiming to answer natural language questions based on street-view clues.
no code implementations • 12 Dec 2022 • Junke Wang, Zhenxin Li, Chao Zhang, Jingjing Chen, Zuxuan Wu, Larry S. Davis, Yu-Gang Jiang
Online media data, in the form of images and videos, are becoming mainstream communication channels.
no code implementations • 29 Nov 2022 • Huiyan Qi, Lechao Cheng, Jingjing Chen, Yue Yu, Xue Song, Zunlei Feng, Yu-Gang Jiang
Transfer learning aims to improve the performance of target tasks by transferring knowledge acquired in source tasks.
1 code implementation • CVPR 2023 • Zhen Xing, Qi Dai, Han Hu, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
In this paper, we investigate the use of transformer models under the SSL setting for action recognition.
1 code implementation • 11 Oct 2022 • Yuqian Fu, Yu Xie, Yanwei Fu, Jingjing Chen, Yu-Gang Jiang
Concretely, to solve the data imbalance problem between the source data with sufficient examples and the auxiliary target data with limited examples, we build our model under the umbrella of multi-expert learning.
1 code implementation • 11 Oct 2022 • Linhai Zhuo, Yuqian Fu, Jingjing Chen, Yixin Cao, Yu-Gang Jiang
The proposed TGDM framework contains a Mixup-3T network for learning classifiers and a dynamic ratio generation network (DRGN) for learning the optimal mix ratio.
no code implementations • 6 Oct 2022 • Xue Song, Jingjing Chen, Bin Zhu, Yu-Gang Jiang
Specifically, appearance and motion components are provided by the image and caption separately.
no code implementations • 5 Oct 2022 • Tianwen Qian, Ran Cui, Jingjing Chen, Pai Peng, Xiaowei Guo, Yu-Gang Jiang
Considering that the question is typically relevant to only a short temporal range of the video, we propose to first localize the question to a segment in the video and then infer the answer using the located segment only.
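The localization stage of such a locate-then-answer pipeline can be sketched as a sliding-window argmax over per-frame question-relevance scores. This is an illustrative assumption, not the paper's actual model: the scoring module, function name, and window size below are all hypothetical.

```python
from typing import List, Tuple

def locate_segment(frame_scores: List[float], window: int) -> Tuple[int, int]:
    """Return the [start, end) frame window with the highest total
    question-relevance score (sliding-window argmax)."""
    best_start = 0
    cur = best = sum(frame_scores[:window])
    for start in range(1, len(frame_scores) - window + 1):
        # Slide the window by one frame: add the entering score, drop the leaving one.
        cur += frame_scores[start + window - 1] - frame_scores[start - 1]
        if cur > best:
            best_start, best = start, cur
    return best_start, best_start + window

# The answering stage would then reason over frames[start:end] only.
start, end = locate_segment([0.1, 0.1, 0.9, 0.8, 0.1], window=2)  # → (2, 4)
```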
1 code implementation • CVPR 2023 • Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
Our new attack method is proposed based on the observation that highly universal adversarial perturbations tend to be more transferable for targeted attacks.
1 code implementation • CVPR 2023 • Yang Jiao, Zequn Jie, Shaoxiang Chen, Jingjing Chen, Lin Ma, Yu-Gang Jiang
Recent approaches aim at exploring the semantic densities of camera features by lifting points in 2D camera images (referred to as seeds) into 3D space, and then incorporating 2D semantics via cross-modal interaction or fusion techniques.
1 code implementation • CVPR 2022 • Jianggang Zhu, Zheng Wang, Jingjing Chen, Yi-Ping Phoebe Chen, Yu-Gang Jiang
In this paper, we focus on representation learning for imbalanced data.
1 code implementation • 1 Jul 2022 • Jichao Zhang, Jingjing Chen, Hao Tang, Enver Sangineto, Peng Wu, Yan Yan, Nicu Sebe, Wei Wang
Solving this problem using an unsupervised method remains an open problem, especially for high-resolution face images in the wild, which are not easy to annotate with gaze and head pose labels.
no code implementations • 11 Jun 2022 • Zhihao Fan, Zhongyu Wei, Jingjing Chen, Siyuan Wang, Zejun Li, Jiarong Xu, Xuanjing Huang
These two steps are iteratively performed in our framework for continuous learning.
no code implementations • 8 May 2022 • Bin Zhu, Chong-Wah Ngo, Jingjing Chen, Wing-Kwong Chan
To bridge the domain gap, a recipe mixup loss is proposed to enforce the intermediate domain to lie on the shortest geodesic path between the source and target domains in the recipe embedding space.
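Constructing such an intermediate domain can be sketched with standard mixup interpolation between domain embeddings. The function name and the Beta-distributed mixing ratio below are illustrative assumptions, not the paper's exact formulation:

```python
import random
from typing import List, Tuple

def mixup_recipe_embeddings(src: List[float], tgt: List[float],
                            alpha: float = 1.0) -> Tuple[List[float], float]:
    """Build an intermediate-domain embedding as a convex combination of a
    source-domain and a target-domain recipe embedding (standard mixup,
    with lambda drawn from Beta(alpha, alpha))."""
    lam = random.betavariate(alpha, alpha)
    mixed = [lam * s + (1.0 - lam) * t for s, t in zip(src, tgt)]
    return mixed, lam
```

A training loop would feed such mixed embeddings through the same loss as the source samples, pulling the two domains together along the interpolation path.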
1 code implementation • 26 Apr 2022 • Zixuan Su, Hao Zhang, Jingjing Chen, Lei Pang, Chong-Wah Ngo, Yu-Gang Jiang
Neural networks for visual content understanding have recently evolved from convolutional ones (CNNs) to transformers.
1 code implementation • 20 Apr 2022 • Ran Cui, Tianwen Qian, Pai Peng, Elena Daskalaki, Jingjing Chen, Xiaowei Guo, Huyang Sun, Yu-Gang Jiang
Weakly supervised methods rely only on the paired video and query, but their performance is relatively poor.
no code implementations • CVPR 2022 • Junke Wang, Zuxuan Wu, Jingjing Chen, Xintong Han, Abhinav Shrivastava, Ser-Nam Lim, Yu-Gang Jiang
Recent advances in image editing techniques have posed serious challenges to the trustworthiness of multimedia data, which drives the research of image tampering detection.
1 code implementation • 15 Mar 2022 • Yuqian Fu, Yu Xie, Yanwei Fu, Jingjing Chen, Yu-Gang Jiang
The key challenge of CD-FSL lies in the huge data shift between source and target domains, which is typically in the form of totally different visual styles.
Ranked #2 on Cross-Domain Few-Shot on CUB
no code implementations • 10 Mar 2022 • Yang Jiao, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang
Recently, one-stage visual grounders have attracted considerable attention due to their comparable accuracy and significantly higher efficiency compared with two-stage grounders.
1 code implementation • 10 Mar 2022 • Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang
3D dense captioning is a recently proposed task, in which point clouds contain more geometric information than their 2D counterparts.
1 code implementation • 29 Jan 2022 • Zejun Li, Zhihao Fan, Huaixiao Tou, Jingjing Chen, Zhongyu Wei, Xuanjing Huang
In MVPTR, we follow the nested structure of both modalities to introduce concepts as high-level semantics.
no code implementations • CVPR 2022 • Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
This paper investigates the transferability of adversarial perturbation across different modalities, i.e., leveraging adversarial perturbation generated on white-box image models to attack black-box video models.
no code implementations • 10 Dec 2021 • Tianyi Liu, Zuxuan Wu, Wenhan Xiong, Jingjing Chen, Yu-Gang Jiang
Our experiments show that there is a trade-off between understanding tasks and generation tasks while using the same model, and a feasible way to improve both tasks is to use more data.
2 code implementations • 29 Oct 2021 • Ning Han, Jingjing Chen, Chuhao Shi, Yawen Zeng, Guangyi Xiao, Hao Chen
The task of text-video retrieval, which aims to understand the correspondence between language and vision, has gained increasing attention in recent years.
1 code implementation • 29 Oct 2021 • Kai Chen, Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
On both UCF-101 and HMDB-51 datasets, our BSC attack method can achieve about 90% fooling rate when attacking three mainstream video recognition models, while only occluding <8% of the areas in the video.
Adversarial Attack • Adversarial Attack on Video Classification • +2
1 code implementation • 18 Oct 2021 • Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
To this end, we propose to boost the transferability of video adversarial examples for black-box attacks on video recognition models.
1 code implementation • 9 Oct 2021 • Yang Jiao, Zequn Jie, Weixin Luo, Jingjing Chen, Yu-Gang Jiang, Xiaolin Wei, Lin Ma
Referring Image Segmentation (RIS) aims at segmenting the target object in an image referred to by a given natural language expression.
no code implementations • 23 Sep 2021 • Fan Luo, Shaoxiang Chen, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
Given a text description, Temporal Language Grounding (TLG) aims to localize temporal boundaries of the segments that contain the specified semantics in an untrimmed video.
2 code implementations • 9 Sep 2021 • Zhipeng Wei, Jingjing Chen, Micah Goldblum, Zuxuan Wu, Tom Goldstein, Yu-Gang Jiang
We evaluate the transferability of attacks on state-of-the-art ViTs, CNNs and robustly trained CNNs.
1 code implementation • 10 Jun 2021 • Rui Wang, Zuxuan Wu, Zejia Weng, Jingjing Chen, Guo-Jun Qi, Yu-Gang Jiang
Unsupervised domain adaptation (UDA) aims to transfer knowledge learned from a fully-labeled source domain to a different unlabeled target domain.
1 code implementation • ICCV 2021 • Xing Zhang, Zuxuan Wu, Zejia Weng, Huazhu Fu, Jingjing Chen, Yu-Gang Jiang, Larry Davis
In this paper, we introduce VideoLT, a large-scale long-tailed video recognition dataset, as a step toward real-world video recognition.
1 code implementation • 20 Apr 2021 • Junke Wang, Zuxuan Wu, Wenhao Ouyang, Xintong Han, Jingjing Chen, Ser-Nam Lim, Yu-Gang Jiang
The widespread dissemination of Deepfakes demands effective approaches that can detect perceptually convincing forged images.
no code implementations • 20 Apr 2021 • Zejia Weng, Zuxuan Wu, Hengduo Li, Jingjing Chen, Yu-Gang Jiang
Conventional video recognition pipelines typically fuse multimodal features for improved performance.
1 code implementation • 5 Jan 2021 • Bojia Zi, Minghao Chang, Jingjing Chen, Xingjun Ma, Yu-Gang Jiang
WildDeepfake is a small dataset that can be used, in addition to existing datasets, to develop and test the effectiveness of deepfake detectors against real-world deepfakes.
no code implementations • 31 Dec 2020 • Zhi-Qin Zhan, Huazhu Fu, Yan-Yao Yang, Jingjing Chen, Jie Liu, Yu-Gang Jiang
However, there are several issues between the image-based training and video-based inference, including domain differences, lack of positive samples, and temporal smoothness.
no code implementations • 20 Aug 2020 • Liangming Pan, Jingjing Chen, Jianlong Wu, Shaoteng Liu, Chong-Wah Ngo, Min-Yen Kan, Yu-Gang Jiang, Tat-Seng Chua
Understanding a food recipe requires anticipating the implicit causal effects of cooking actions, so that the recipe can be converted into a graph describing its temporal workflow.
1 code implementation • 9 Aug 2020 • Jichao Zhang, Jingjing Chen, Hao Tang, Wei Wang, Yan Yan, Enver Sangineto, Nicu Sebe
In this paper we address the problem of unsupervised gaze correction in the wild, presenting a solution that works without the need for precise annotations of the gaze angle and the head pose.
5 code implementations • Interspeech 2020 • Jingjing Chen, Qirong Mao, Dong Liu
By introducing an improved transformer, elements in speech sequences can interact directly, which enables DPTNet to model speech sequences with direct context-awareness.
Ranked #15 on Speech Separation on WSJ0-2mix
Speech Separation • Audio and Speech Processing • Sound
1 code implementation • 7 Apr 2020 • Jingjing Chen, Jichao Zhang, Enver Sangineto, Jiayuan Fan, Tao Chen, Nicu Sebe
In this paper, we propose to alleviate these problems by means of a novel gaze redirection framework which exploits both a numerical and a pictorial direction guidance, jointly with a coarse-to-fine learning strategy.
1 code implementation • CVPR 2020 • Shihao Zhao, Xingjun Ma, Xiang Zheng, James Bailey, Jingjing Chen, Yu-Gang Jiang
We propose the use of a universal adversarial trigger as the backdoor trigger to attack video recognition models, a situation where backdoor attacks are likely to be challenged by the above 4 strict conditions.
1 code implementation • 21 Nov 2019 • Zhipeng Wei, Jingjing Chen, Xingxing Wei, Linxi Jiang, Tat-Seng Chua, Fengfeng Zhou, Yu-Gang Jiang
To overcome this challenge, we propose a heuristic black-box attack model that generates adversarial perturbations only on the selected frames and regions.
no code implementations • arXiv 2019 • Jichao Zhang, Meng Sun, Jingjing Chen, Hao Tang, Yan Yan, Xueying Qin, Nicu Sebe
Gaze correction aims to redirect the person's gaze into the camera by manipulating the eye region, and it can be considered as a specific image resynthesis problem.
5 code implementations • 29 Mar 2016 • Jingjing Chen, David M. Kipping
By conditioning our model upon a sample spanning dwarf planets to late-type stars, Forecaster can predict the mass (or radius) from the radius (or mass) for objects covering nine orders of magnitude in mass.
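A deterministic broken power-law relation in the spirit of Forecaster can be sketched as follows. The segment slopes and break points below are approximate illustrative values, and the sketch ignores Forecaster's probabilistic treatment entirely, so treat all numbers as assumptions rather than the published fit:

```python
import math

# Approximate power-law segments (upper mass bound in Earth masses, slope S),
# roughly following the Terran / Neptunian / Jovian / Stellar regimes.
# Normalization is fixed by continuity, anchoring R = 1 R_earth at M = 1 M_earth.
SEGMENTS = [
    (2.04, 0.28),      # Terran worlds
    (132.0, 0.59),     # Neptunian worlds
    (26600.0, -0.04),  # Jovian worlds (radius nearly flat with mass)
    (math.inf, 0.88),  # Stellar bodies
]

def forecast_radius(mass_earth: float) -> float:
    """Mean-relation sketch: radius (Earth radii) from mass (Earth masses)."""
    radius, prev_bound = 1.0, 1.0  # anchor at Earth
    for bound, slope in SEGMENTS:
        if mass_earth <= bound:
            return radius * (mass_earth / prev_bound) ** slope
        # Carry the normalization across the break so segments join continuously.
        radius *= (bound / prev_bound) ** slope
        prev_bound = bound
    return radius
```

For example, `forecast_radius(1.0)` returns 1.0 by construction, and masses in the Jovian regime yield nearly constant radii because of the near-zero slope.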
Earth and Planetary Astrophysics • Instrumentation and Methods for Astrophysics