no code implementations • 19 Apr 2024 • Yian Li, Wentao Tian, Yang Jiao, Jingjing Chen, Yu-Gang Jiang
Counterfactual reasoning, as a crucial manifestation of human intelligence, refers to making presuppositions based on established facts and extrapolating potential outcomes.
1 code implementation • 12 Mar 2024 • Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang
This adaptation makes it convenient to develop such LMMs with minimal modifications; however, it overlooks the intrinsic characteristics of diverse visual tasks and hinders the learning of perception capabilities.
no code implementations • 12 Mar 2024 • Guoshan Liu, Yang Jiao, Jingjing Chen, Bin Zhu, Yu-Gang Jiang
These two datasets are used to evaluate the transferability of approaches from the well-curated food image domain to the everyday-life food image domain.
1 code implementation • 5 Mar 2024 • Xue Song, Jiequan Cui, Hanwang Zhang, Jingjing Chen, Richang Hong, Yu-Gang Jiang
Through the lens of the formulation, we find that the crux of TBIE is that existing techniques hardly achieve a good trade-off between editability and fidelity, mainly due to the overfitting of the single-image fine-tuning.
1 code implementation • 25 Dec 2023 • Wentao Tian, Zheng Wang, Yuqian Fu, Jingjing Chen, Lechao Cheng
A comprehensive understanding of videos is inseparable from describing the action with its contextual action-object interactions.
no code implementations • 22 Dec 2023 • Yuehao Yin, Huiyan Qi, Bin Zhu, Jingjing Chen, Yu-Gang Jiang, Chong-Wah Ngo
In the second stage, we construct a multi-round conversation dataset and a reasoning segmentation dataset to fine-tune the model, enabling it to conduct professional dialogues and generate segmentation masks based on complex reasoning in the food domain.
no code implementations • 13 Dec 2023 • Yang Jiao, Zequn Jie, Shaoxiang Chen, Lechao Cheng, Jingjing Chen, Lin Ma, Yu-Gang Jiang
The camera-based bird's-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field.
1 code implementation • 4 Oct 2023 • Zejun Li, Ye Wang, Mengfei Du, Qingwen Liu, Binhao Wu, Jiwen Zhang, Chengxing Zhou, Zhihao Fan, Jie Fu, Jingjing Chen, Xuanjing Huang, Zhongyu Wei
Recent years have witnessed remarkable progress in the development of large vision-language models (LVLMs).
no code implementations • 14 Aug 2023 • Yilun Zhang, Yuqian Fu, Xingjun Ma, Lizhe Qi, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
We are thus motivated to investigate the importance of spatial relations and propose a more accurate few-shot action recognition method that leverages both spatial and temporal information.
2 code implementations • 24 May 2023 • Tianwen Qian, Jingjing Chen, Linhai Zhuo, Yang Jiao, Yu-Gang Jiang
We introduce a novel visual question answering (VQA) task in the context of autonomous driving, aiming to answer natural language questions based on street-view clues.
no code implementations • 12 Dec 2022 • Junke Wang, Zhenxin Li, Chao Zhang, Jingjing Chen, Zuxuan Wu, Larry S. Davis, Yu-Gang Jiang
Online media data, in the form of images and videos, are becoming mainstream communication channels.
no code implementations • 29 Nov 2022 • Huiyan Qi, Lechao Cheng, Jingjing Chen, Yue Yu, Xue Song, Zunlei Feng, Yu-Gang Jiang
Transfer learning aims to improve the performance of target tasks by transferring knowledge acquired in source tasks.
1 code implementation • CVPR 2023 • Zhen Xing, Qi Dai, Han Hu, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
In this paper, we investigate the use of transformer models under the SSL setting for action recognition.
1 code implementation • 11 Oct 2022 • Yuqian Fu, Yu Xie, Yanwei Fu, Jingjing Chen, Yu-Gang Jiang
Concretely, to solve the data imbalance problem between the source data with sufficient examples and the auxiliary target data with limited examples, we build our model under the umbrella of multi-expert learning.
1 code implementation • 11 Oct 2022 • Linhai Zhuo, Yuqian Fu, Jingjing Chen, Yixin Cao, Yu-Gang Jiang
The proposed TGDM framework contains a Mixup-3T network for learning classifiers and a dynamic ratio generation network (DRGN) for learning the optimal mix ratio.
no code implementations • 6 Oct 2022 • Xue Song, Jingjing Chen, Bin Zhu, Yu-Gang Jiang
Specifically, appearance and motion components are provided by the image and caption separately.
no code implementations • 5 Oct 2022 • Tianwen Qian, Ran Cui, Jingjing Chen, Pai Peng, Xiaowei Guo, Yu-Gang Jiang
Considering that the question is typically relevant to only a short temporal range of the video, we propose to first localize the question to a segment in the video and then infer the answer using the located segment only.
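The localization stage of such a locate-then-answer pipeline can be sketched as a sliding-window argmax over per-frame question-relevance scores. This is an illustrative assumption, not the paper's actual model: the scoring module, function name, and window size below are all hypothetical.

```python
from typing import List, Tuple

def locate_segment(frame_scores: List[float], window: int) -> Tuple[int, int]:
    """Return the [start, end) frame window with the highest total
    question-relevance score (sliding-window argmax)."""
    best_start = 0
    cur = best = sum(frame_scores[:window])
    for start in range(1, len(frame_scores) - window + 1):
        # Slide the window by one frame: add the entering score, drop the leaving one.
        cur += frame_scores[start + window - 1] - frame_scores[start - 1]
        if cur > best:
            best_start, best = start, cur
    return best_start, best_start + window

# The answering stage would then reason over frames[start:end] only.
start, end = locate_segment([0.1, 0.1, 0.9, 0.8, 0.1], window=2)  # → (2, 4)
```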
1 code implementation • CVPR 2023 • Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
Our new attack method is proposed based on the observation that highly universal adversarial perturbations tend to be more transferable for targeted attacks.
1 code implementation • CVPR 2023 • Yang Jiao, Zequn Jie, Shaoxiang Chen, Jingjing Chen, Lin Ma, Yu-Gang Jiang
Recent approaches aim at exploring the semantic densities of camera features by lifting points in 2D camera images (referred to as seeds) into 3D space, and then incorporating 2D semantics via cross-modal interaction or fusion techniques.
1 code implementation • CVPR 2022 • Jianggang Zhu, Zheng Wang, Jingjing Chen, Yi-Ping Phoebe Chen, Yu-Gang Jiang
In this paper, we focus on representation learning for imbalanced data.
1 code implementation • 1 Jul 2022 • Jichao Zhang, Jingjing Chen, Hao Tang, Enver Sangineto, Peng Wu, Yan Yan, Nicu Sebe, Wei Wang
Solving this problem using an unsupervised method remains an open problem, especially for high-resolution face images in the wild, which are not easy to annotate with gaze and head pose labels.
no code implementations • 11 Jun 2022 • Zhihao Fan, Zhongyu Wei, Jingjing Chen, Siyuan Wang, Zejun Li, Jiarong Xu, Xuanjing Huang
These two steps are iteratively performed in our framework for continuous learning.
no code implementations • 8 May 2022 • Bin Zhu, Chong-Wah Ngo, Jingjing Chen, Wing-Kwong Chan
To bridge the domain gap, a recipe mixup loss is proposed to enforce the intermediate domain to lie on the shortest geodesic path between the source and target domains in the recipe embedding space.
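Constructing such an intermediate domain can be sketched with standard mixup interpolation between domain embeddings. The function name and the Beta-distributed mixing ratio below are illustrative assumptions, not the paper's exact formulation:

```python
import random
from typing import List, Tuple

def mixup_recipe_embeddings(src: List[float], tgt: List[float],
                            alpha: float = 1.0) -> Tuple[List[float], float]:
    """Build an intermediate-domain embedding as a convex combination of a
    source-domain and a target-domain recipe embedding (standard mixup,
    with lambda drawn from Beta(alpha, alpha))."""
    lam = random.betavariate(alpha, alpha)
    mixed = [lam * s + (1.0 - lam) * t for s, t in zip(src, tgt)]
    return mixed, lam
```

A training loop would feed such mixed embeddings through the same loss as the source samples, pulling the two domains together along the interpolation path.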
1 code implementation • 26 Apr 2022 • Zixuan Su, Hao Zhang, Jingjing Chen, Lei Pang, Chong-Wah Ngo, Yu-Gang Jiang
Neural networks for visual content understanding have recently evolved from convolutional ones (CNNs) to transformers.
1 code implementation • 20 Apr 2022 • Ran Cui, Tianwen Qian, Pai Peng, Elena Daskalaki, Jingjing Chen, Xiaowei Guo, Huyang Sun, Yu-Gang Jiang
Weakly supervised methods rely only on the paired video and query, but their performance is relatively poor.
no code implementations • CVPR 2022 • Junke Wang, Zuxuan Wu, Jingjing Chen, Xintong Han, Abhinav Shrivastava, Ser-Nam Lim, Yu-Gang Jiang
Recent advances in image editing techniques have posed serious challenges to the trustworthiness of multimedia data, which drives the research of image tampering detection.
1 code implementation • 15 Mar 2022 • Yuqian Fu, Yu Xie, Yanwei Fu, Jingjing Chen, Yu-Gang Jiang
The key challenge of CD-FSL lies in the huge data shift between source and target domains, which is typically in the form of totally different visual styles.
Ranked #2 on Cross-Domain Few-Shot on CUB
no code implementations • 10 Mar 2022 • Yang Jiao, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang
Recently, one-stage visual grounders have attracted considerable attention due to their comparable accuracy and significantly higher efficiency compared with two-stage grounders.
1 code implementation • 10 Mar 2022 • Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang
3D dense captioning is a recently proposed task, in which point clouds contain more geometric information than their 2D counterparts.
1 code implementation • 29 Jan 2022 • Zejun Li, Zhihao Fan, Huaixiao Tou, Jingjing Chen, Zhongyu Wei, Xuanjing Huang
In MVPTR, we follow the nested structure of both modalities to introduce concepts as high-level semantics.
no code implementations • CVPR 2022 • Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
This paper investigates the transferability of adversarial perturbation across different modalities, i.e., leveraging adversarial perturbation generated on white-box image models to attack black-box video models.
no code implementations • 10 Dec 2021 • Tianyi Liu, Zuxuan Wu, Wenhan Xiong, Jingjing Chen, Yu-Gang Jiang
Our experiments show that there is a trade-off between understanding tasks and generation tasks while using the same model, and a feasible way to improve both tasks is to use more data.
2 code implementations • 29 Oct 2021 • Ning Han, Jingjing Chen, Chuhao Shi, Yawen Zeng, Guangyi Xiao, Hao Chen
The task of text-video retrieval, which aims to understand the correspondence between language and vision, has gained increasing attention in recent years.
1 code implementation • 29 Oct 2021 • Kai Chen, Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
On both UCF-101 and HMDB-51 datasets, our BSC attack method can achieve about 90% fooling rate when attacking three mainstream video recognition models, while only occluding <8% of the areas in the video.
Adversarial Attack • Adversarial Attack on Video Classification • +2
1 code implementation • 18 Oct 2021 • Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
To this end, we propose to boost the transferability of video adversarial examples for black-box attacks on video recognition models.
1 code implementation • 9 Oct 2021 • Yang Jiao, Zequn Jie, Weixin Luo, Jingjing Chen, Yu-Gang Jiang, Xiaolin Wei, Lin Ma
Referring Image Segmentation (RIS) aims at segmenting the target object in an image referred to by a given natural language expression.
no code implementations • 23 Sep 2021 • Fan Luo, Shaoxiang Chen, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
Given a text description, Temporal Language Grounding (TLG) aims to localize temporal boundaries of the segments that contain the specified semantics in an untrimmed video.
2 code implementations • 9 Sep 2021 • Zhipeng Wei, Jingjing Chen, Micah Goldblum, Zuxuan Wu, Tom Goldstein, Yu-Gang Jiang
We evaluate the transferability of attacks on state-of-the-art ViTs, CNNs and robustly trained CNNs.
1 code implementation • 10 Jun 2021 • Rui Wang, Zuxuan Wu, Zejia Weng, Jingjing Chen, Guo-Jun Qi, Yu-Gang Jiang
Unsupervised domain adaptation (UDA) aims to transfer knowledge learned from a fully-labeled source domain to a different unlabeled target domain.
1 code implementation • ICCV 2021 • Xing Zhang, Zuxuan Wu, Zejia Weng, Huazhu Fu, Jingjing Chen, Yu-Gang Jiang, Larry Davis
In this paper, we introduce VideoLT, a large-scale long-tailed video recognition dataset, as a step toward real-world video recognition.
1 code implementation • 20 Apr 2021 • Junke Wang, Zuxuan Wu, Wenhao Ouyang, Xintong Han, Jingjing Chen, Ser-Nam Lim, Yu-Gang Jiang
The widespread dissemination of Deepfakes demands effective approaches that can detect perceptually convincing forged images.
no code implementations • 20 Apr 2021 • Zejia Weng, Zuxuan Wu, Hengduo Li, Jingjing Chen, Yu-Gang Jiang
Conventional video recognition pipelines typically fuse multimodal features for improved performance.
1 code implementation • 5 Jan 2021 • Bojia Zi, Minghao Chang, Jingjing Chen, Xingjun Ma, Yu-Gang Jiang
WildDeepfake is a small dataset that can be used, in addition to existing datasets, to develop and test the effectiveness of deepfake detectors against real-world deepfakes.
no code implementations • 31 Dec 2020 • Zhi-Qin Zhan, Huazhu Fu, Yan-Yao Yang, Jingjing Chen, Jie Liu, Yu-Gang Jiang
However, there are several issues between the image-based training and video-based inference, including domain differences, lack of positive samples, and temporal smoothness.
no code implementations • 20 Aug 2020 • Liangming Pan, Jingjing Chen, Jianlong Wu, Shaoteng Liu, Chong-Wah Ngo, Min-Yen Kan, Yu-Gang Jiang, Tat-Seng Chua
Understanding a food recipe requires anticipating the implicit causal effects of cooking actions, so that the recipe can be converted into a graph describing its temporal workflow.
1 code implementation • 9 Aug 2020 • Jichao Zhang, Jingjing Chen, Hao Tang, Wei Wang, Yan Yan, Enver Sangineto, Nicu Sebe
In this paper we address the problem of unsupervised gaze correction in the wild, presenting a solution that works without the need for precise annotations of the gaze angle and the head pose.
5 code implementations • Interspeech 2020 • Jingjing Chen, Qirong Mao, Dong Liu
By introducing an improved transformer, elements in speech sequences can interact directly, which enables DPTNet to model speech sequences with direct context-awareness.
Ranked #15 on Speech Separation on WSJ0-2mix
Speech Separation • Audio and Speech Processing • Sound
1 code implementation • 7 Apr 2020 • Jingjing Chen, Jichao Zhang, Enver Sangineto, Jiayuan Fan, Tao Chen, Nicu Sebe
In this paper, we propose to alleviate these problems by means of a novel gaze redirection framework which exploits both a numerical and a pictorial direction guidance, jointly with a coarse-to-fine learning strategy.
1 code implementation • CVPR 2020 • Shihao Zhao, Xingjun Ma, Xiang Zheng, James Bailey, Jingjing Chen, Yu-Gang Jiang
We propose the use of a universal adversarial trigger as the backdoor trigger to attack video recognition models, a situation where backdoor attacks are likely to be challenged by the above 4 strict conditions.
1 code implementation • 21 Nov 2019 • Zhipeng Wei, Jingjing Chen, Xingxing Wei, Linxi Jiang, Tat-Seng Chua, Fengfeng Zhou, Yu-Gang Jiang
To overcome this challenge, we propose a heuristic black-box attack model that generates adversarial perturbations only on the selected frames and regions.
no code implementations • arXiv 2019 • Jichao Zhang, Meng Sun, Jingjing Chen, Hao Tang, Yan Yan, Xueying Qin, Nicu Sebe
Gaze correction aims to redirect the person's gaze into the camera by manipulating the eye region, and it can be considered as a specific image resynthesis problem.
5 code implementations • 29 Mar 2016 • Jingjing Chen, David M. Kipping
By conditioning our model upon a sample spanning dwarf planets to late-type stars, Forecaster can predict the mass (or radius) from the radius (or mass) for objects covering nine orders of magnitude in mass.
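A deterministic broken power-law relation in the spirit of Forecaster can be sketched as follows. The segment slopes and break points below are approximate illustrative values, and the sketch ignores Forecaster's probabilistic treatment entirely, so treat all numbers as assumptions rather than the published fit:

```python
import math

# Approximate power-law segments (upper mass bound in Earth masses, slope S),
# roughly following the Terran / Neptunian / Jovian / Stellar regimes.
# Normalization is fixed by continuity, anchoring R = 1 R_earth at M = 1 M_earth.
SEGMENTS = [
    (2.04, 0.28),      # Terran worlds
    (132.0, 0.59),     # Neptunian worlds
    (26600.0, -0.04),  # Jovian worlds (radius nearly flat with mass)
    (math.inf, 0.88),  # Stellar bodies
]

def forecast_radius(mass_earth: float) -> float:
    """Mean-relation sketch: radius (Earth radii) from mass (Earth masses)."""
    radius, prev_bound = 1.0, 1.0  # anchor at Earth
    for bound, slope in SEGMENTS:
        if mass_earth <= bound:
            return radius * (mass_earth / prev_bound) ** slope
        # Carry the normalization across the break so segments join continuously.
        radius *= (bound / prev_bound) ** slope
        prev_bound = bound
    return radius
```

For example, `forecast_radius(1.0)` returns 1.0 by construction, and masses in the Jovian regime yield nearly constant radii because of the near-zero slope.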
Earth and Planetary Astrophysics • Instrumentation and Methods for Astrophysics