Search Results for author: Kai Chen

Found 229 papers, 110 papers with code

UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement

no code implementations • 22 Apr 2024 • Yaofeng Xie, Lingwei Kong, Kai Chen, Ziqiang Zheng, Xiao Yu, Zhibin Yu, Bing Zheng

Learning-based underwater image enhancement (UIE) methods have made great progress.

Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases

no code implementations • 16 Apr 2024 • Yanze Li, Wenhua Zhang, Kai Chen, Yanxin Liu, Pengxiang Li, Ruiyuan Gao, Lanqing Hong, Meng Tian, Xinhai Zhao, Zhenguo Li, Dit-yan Yeung, Huchuan Lu, Xu Jia

Large Vision-Language Models (LVLMs) have received widespread attention in the autonomous driving domain owing to their remarkable visual reasoning ability to understand images and videos, which significantly advances the development of interpretable end-to-end autonomous driving.

Autonomous Driving Visual Reasoning

Adapting LLaMA Decoder to Vision Transformer

no code implementations • 10 Apr 2024 • Jiahao Wang, Wenqi Shao, Mengzhao Chen, Chengyue Wu, Yong Liu, Kaipeng Zhang, Songyang Zhang, Kai Chen, Ping Luo

We first "LLaMAfy" a standard ViT step-by-step to align with LLaMA's architecture, and find that directly applying a causal mask to the self-attention brings an attention collapse issue, resulting in the failure of network training.

Computational Efficiency Quantization +1

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

2 code implementations • 9 Apr 2024 • Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Songyang Zhang, Haodong Duan, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Zhe Chen, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Kai Chen, Conghui He, Xingcheng Zhang, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang

The Large Vision-Language Model (LVLM) field has seen significant advancements, yet its progression has been hindered by challenges in comprehending fine-grained visual content due to limited resolution.

Ranked #11 on Visual Question Answering on MM-Vet

4k Language Modelling +1

1,622

Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks

1 code implementation • 9 Apr 2024 • Chonghua Wang, Haodong Duan, Songyang Zhang, Dahua Lin, Kai Chen

Recently, the large language model (LLM) community has shown increasing interest in enhancing LLMs' capability to handle extremely long documents.

Answer Selection Long-Context Understanding

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models

no code implementations • 1 Apr 2024 • Rongjie Li, Songyang Zhang, Dahua Lin, Kai Chen, Xuming He

Scene graph generation (SGG) aims to parse a visual scene into an intermediate graph representation for downstream reasoning tasks.

Graph Generation Relation +2

InternLM2 Technical Report

1 code implementation • 26 Mar 2024 • Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang, Penglong Jiao, Zhenjiang Jin, Zhikai Lei, Jiaxing Li, Jingwen Li, Linyang Li, Shuaibin Li, Wei Li, Yining Li, Hongwei Liu, Jiangning Liu, Jiawei Hong, Kaiwen Liu, Kuikun Liu, Xiaoran Liu, Chengqi Lv, Haijun Lv, Kai Lv, Li Ma, Runyuan Ma, Zerun Ma, Wenchang Ning, Linke Ouyang, Jiantao Qiu, Yuan Qu, FuKai Shang, Yunfan Shao, Demin Song, Zifan Song, Zhihao Sui, Peng Sun, Yu Sun, Huanze Tang, Bin Wang, Guoteng Wang, Jiaqi Wang, Jiayu Wang, Rui Wang, Yudong Wang, Ziyi Wang, Xingjian Wei, Qizhen Weng, Fan Wu, Yingtong Xiong, Chao Xu, Ruiliang Xu, Hang Yan, Yirong Yan, Xiaogui Yang, Haochen Ye, Huaiyuan Ying, JIA YU, Jing Yu, Yuhang Zang, Chuyu Zhang, Li Zhang, Pan Zhang, Peng Zhang, Ruijie Zhang, Shuo Zhang, Songyang Zhang, Wenjian Zhang, Wenwei Zhang, Xingcheng Zhang, Xinyue Zhang, Hui Zhao, Qian Zhao, Xiaomeng Zhao, Fengzhe Zhou, Zaida Zhou, Jingming Zhuo, Yicheng Zou, Xipeng Qiu, Yu Qiao, Dahua Lin

The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI).

Ranked #5 on Long-Context Understanding on Ada-LEval (BestAnswer)

4k Long-Context Understanding

5,180

Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text

no code implementations • 25 Mar 2024 • Junshu Tang, Yanhong Zeng, Ke Fan, Xuheng Wang, Bo Dai, Kai Chen, Lizhuang Ma

Creating and animating 3D biped cartoon characters is crucial and valuable in various applications.

Question Answering Texture Synthesis

Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding

1 code implementation • 25 Mar 2024 • Lingdong Kong, Xiang Xu, Jun Cen, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu

Safety-critical 3D scene understanding tasks necessitate not only accurate but also confident predictions from 3D perception models.

Data Augmentation Scene Understanding

DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception

no code implementations • 20 Mar 2024 • Yibo Wang, Ruiyuan Gao, Kai Chen, Kaiqiang Zhou, Yingjie Cai, Lanqing Hong, Zhenguo Li, Lihui Jiang, Dit-yan Yeung, Qiang Xu, Kai Zhang

Furthermore, image syntheses from DetDiffusion can effectively augment training data, significantly enhancing downstream detection performance.

Attribute Data Augmentation +3

Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models

1 code implementation • 19 Mar 2024 • Zehui Chen, Kuikun Liu, Qiuchen Wang, Wenwei Zhang, Jiangning Liu, Dahua Lin, Kai Chen, Feng Zhao

Open-sourced Large Language Models (LLMs) have achieved great success in various NLP tasks; however, they are still far inferior to API-based models when acting as agents.

Hallucination

194

Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation

no code implementations • 14 Mar 2024 • Yunhao Gou, Kai Chen, Zhili Liu, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-yan Yeung, James T. Kwok, Yu Zhang

Multimodal large language models (MLLMs) have shown impressive reasoning abilities but are also more vulnerable to jailbreak attacks than their LLM predecessors.

Optical Character Recognition (OCR)

DevBench: A Comprehensive Benchmark for Software Development

1 code implementation • 13 Mar 2024 • Bowen Li, Wenhan Wu, Ziwei Tang, Lin Shi, John Yang, Jinyang Li, Shunyu Yao, Chen Qian, Binyuan Hui, Qicheng Zhang, Zhiyin Yu, He Du, Ping Yang, Dahua Lin, Chao Peng, Kai Chen

Recent advancements in large language models (LLMs) have significantly enhanced their coding capabilities.

Code Generation

Towards Fair and Efficient Learning-based Congestion Control

no code implementations • 4 Mar 2024 • Xudong Liao, Han Tian, Chaoliang Zeng, Xinchen Wan, Kai Chen

We present Astraea, a new learning-based congestion control that ensures fast convergence to fairness with stability.

Fairness

Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction

no code implementations • 28 Feb 2024 • Tong Liu, Yingjie Zhang, Zhe Zhao, Yinpeng Dong, Guozhu Meng, Kai Chen

We evaluate DRA across various open-source and closed-source models, showcasing state-of-the-art jailbreak success rates and attack efficiency.

Reconstruction Attack

Foot In The Door: Understanding Large Language Model Jailbreaking via Cognitive Psychology

no code implementations • 24 Feb 2024 • Zhenhua Wang, Wei Xie, Baosheng Wang, Enze Wang, Zhiwen Gui, Shuoyoucheng Ma, Kai Chen

Our research provides a psychological explanation of the jailbreak prompts.

Decision Making Language Modelling +1

CriticBench: Evaluating Large Language Models as Critic

1 code implementation • 21 Feb 2024 • Tian Lan, Wenwei Zhang, Chen Xu, Heyan Huang, Dahua Lin, Kai Chen, Xian-Ling Mao

Critique ability is crucial in the scalable oversight and self-improvement of Large Language Models (LLMs).

How Susceptible are Large Language Models to Ideological Manipulation?

1 code implementation • 18 Feb 2024 • Kai Chen, Zihao He, Jun Yan, Taiwei Shi, Kristina Lerman

Large Language Models (LLMs) possess the potential to exert substantial influence on public perceptions and interactions with information.

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning

1 code implementation • 9 Feb 2024 • Huaiyuan Ying, Shuo Zhang, Linyang Li, Zhejian Zhou, Yunfan Shao, Zhaoye Fei, Yichuan Ma, Jiawei Hong, Kuikun Liu, Ziyi Wang, Yudong Wang, Zijian Wu, Shuaibin Li, Fengzhe Zhou, Hongwei Liu, Songyang Zhang, Wenwei Zhang, Hang Yan, Xipeng Qiu, Jiayu Wang, Kai Chen, Dahua Lin

We further explore how to use LEAN to solve math problems and study its performance under the setting of multi-task learning, which shows the possibility of using LEAN as a unified platform for solving and proving in math.

Data Augmentation GSM8K +3

190

Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts

no code implementations • 8 Feb 2024 • Zhili Liu, Kai Chen, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, James T. Kwok

It also obtains new state-of-the-art self-supervised learning results on detection and segmentation.

Self-Supervised Learning

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model

1 code implementation • 29 Jan 2024 • Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Xilin Wei, Songyang Zhang, Haodong Duan, Maosong Cao, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang

We introduce InternLM-XComposer2, a cutting-edge vision-language model excelling in free-form text-image composition and comprehension.

Ranked #16 on Visual Question Answering on MM-Vet

Language Modelling Visual Question Answering

1,622

MEA-Defender: A Robust Watermark against Model Extraction Attack

1 code implementation • 26 Jan 2024 • Peizhuo Lv, Hualong Ma, Kai Chen, Jiachen Zhou, Shengzhi Zhang, Ruigang Liang, Shenchen Zhu, Pan Li, Yingjun Zhang

To protect the Intellectual Property (IP) of the original owners over such DNN models, backdoor-based watermarks have been extensively studied.

Model extraction Self-Supervised Learning

Can AI Assistants Know What They Don't Know?

1 code implementation • 24 Jan 2024 • Qinyuan Cheng, Tianxiang Sun, Xiangyang Liu, Wenwei Zhang, Zhangyue Yin, ShiMin Li, Linyang Li, Zhengfu He, Kai Chen, Xipeng Qiu

To answer this question, we construct a model-specific "I don't know" (Idk) dataset for an assistant, which contains its known and unknown questions, based on existing open-domain question answering datasets.

Math Open-Domain Question Answering +1

Guided Diffusion for Fast Inverse Design of Density-based Mechanical Metamaterials

no code implementations • 24 Jan 2024 • Yanyan Yang, Lili Wang, Xiaoya Zhai, Kai Chen, WenMing Wu, Yunkai Zhao, Ligang Liu, Xiao-Ming Fu

Mechanical metamaterial is a synthetic material that can possess extraordinary physical characteristics, such as abnormal elasticity, stiffness, and stability, by carefully designing its internal structure.

Towards Language-Driven Video Inpainting via Multimodal Large Language Models

no code implementations • 18 Jan 2024 • Jianzong Wu, Xiangtai Li, Chenyang Si, Shangchen Zhou, Jingkang Yang, Jiangning Zhang, Yining Li, Kai Chen, Yunhai Tong, Ziwei Liu, Chen Change Loy

We introduce a new task -- language-driven video inpainting, which uses natural language instructions to guide the inpainting process.

Video Inpainting

RAP-SAM: Towards Real-Time All-Purpose Segment Anything

1 code implementation • 18 Jan 2024 • Shilin Xu, Haobo Yuan, Qingyu Shi, Lu Qi, Jingbo Wang, Yibo Yang, Yining Li, Kai Chen, Yunhai Tong, Bernard Ghanem, Xiangtai Li, Ming-Hsuan Yang

Segment Anything Model (SAM) is one remarkable model that can achieve generalized segmentation.

Interactive Segmentation Panoptic Segmentation +3

187

OMG-Seg: Is One Model Good Enough For All Segmentation?

1 code implementation • 18 Jan 2024 • Xiangtai Li, Haobo Yuan, Wei Li, Henghui Ding, Size Wu, Wenwei Zhang, Yining Li, Kai Chen, Chen Change Loy

In this work, we address various segmentation tasks, each traditionally tackled by distinct or partially unified models.

Interactive Segmentation Panoptic Segmentation +3

681

HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance

1 code implementation • 16 Jan 2024 • Huanjun Kong, Songyang Zhang, Jiaying Li, Min Xiao, Jun Xu, Kai Chen

In this work, we present HuixiangDou, a technical assistant powered by Large Language Models (LLMs).

In-Context Learning

803

STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering

1 code implementation • 8 Jan 2024 • Yueqian Wang, Yuxuan Wang, Kai Chen, Dongyan Zhao

However, most models can only handle simple videos in terms of temporal reasoning, and their performance tends to drop when answering temporal-reasoning questions on long and informative videos.

Question Answering Video Question Answering

Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively

1 code implementation • 5 Jan 2024 • Haobo Yuan, Xiangtai Li, Chong Zhou, Yining Li, Kai Chen, Chen Change Loy

CLIP and the Segment Anything Model (SAM) are remarkable vision foundation models (VFMs).

Image Classification Interactive Segmentation +3

585

Any-point Trajectory Modeling for Policy Learning

no code implementations • 28 Dec 2023 • Chuan Wen, Xingyu Lin, John So, Kai Chen, Qi Dou, Yang Gao, Pieter Abbeel

Learning from demonstration is a powerful method for teaching robots new skills, and having more demonstration data often improves policy learning.

Trajectory Modeling Transfer Learning

LLM Factoscope: Uncovering LLMs' Factual Discernment through Inner States Analysis

no code implementations • 27 Dec 2023 • Jinwen He, Yujia Gong, Kai Chen, Zijin Lin, Chengan Wei, Yue Zhao

In this paper, we introduce the LLM factoscope, a novel Siamese network-based model that leverages the inner states of LLMs for factual detection.

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

1 code implementation • 26 Dec 2023 • Tai Wang, Xiaohan Mao, Chenming Zhu, Runsen Xu, Ruiyuan Lyu, Peisen Li, Xiao Chen, Wenwei Zhang, Kai Chen, Tianfan Xue, Xihui Liu, Cewu Lu, Dahua Lin, Jiangmiao Pang

In the realm of computer vision and robotics, embodied agents are expected to explore their environment and carry out human instructions.

Scene Understanding

300

PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models

1 code implementation • 21 Dec 2023 • Yiming Zhang, Zhening Xing, Yanhong Zeng, Youqing Fang, Kai Chen

Recent advancements in personalized text-to-image (T2I) models have revolutionized content creation, empowering non-experts to generate stunning images with unique styles.

Image Animation

719

T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step

1 code implementation • 21 Dec 2023 • Zehui Chen, Weihua Du, Wenwei Zhang, Kuikun Liu, Jiangning Liu, Miao Zheng, Jingming Zhuo, Songyang Zhang, Dahua Lin, Kai Chen, Feng Zhao

Based on that, we further introduce T-Eval to evaluate the tool utilization capability step by step.

Instruction Following Retrieval

150

Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning

no code implementations • 19 Dec 2023 • Yunhao Gou, Zhili Liu, Kai Chen, Lanqing Hong, Hang Xu, Aoxue Li, Dit-yan Yeung, James T. Kwok, Yu Zhang

Instruction tuning of Large Vision-language Models (LVLMs) has revolutionized the development of versatile models with zero-shot generalization across a wide range of downstream vision-language tasks.

Instruction Following Zero-shot Generalization

DataElixir: Purifying Poisoned Dataset to Mitigate Backdoor Attacks via Diffusion Models

1 code implementation • 18 Dec 2023 • Jiachen Zhou, Peizhuo Lv, Yibing Lan, Guozhu Meng, Kai Chen, Hualong Ma

Dataset sanitization is a widely adopted proactive defense against poisoning-based backdoor attacks, aimed at filtering out and removing poisoned samples from training datasets.

RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation

1 code implementation • 12 Dec 2023 • Peng Lu, Tao Jiang, Yining Li, Xiangtai Li, Kai Chen, Wenming Yang

Real-time multi-person pose estimation presents significant challenges in balancing speed and precision.

Ranked #1 on Multi-Person Pose Estimation on CrowdPose (using extra training data)

Multi-Person Pose Estimation

4,986

Mixed Pseudo Labels for Semi-Supervised Object Detection

1 code implementation • 12 Dec 2023 • Zeming Chen, Wenwei Zhang, Xinjiang Wang, Kai Chen, Zhi Wang

While the pseudo-label method has demonstrated considerable success in semi-supervised object detection tasks, this paper uncovers notable limitations within this approach.

Ranked #1 on Semi-Supervised Object Detection on COCO 100% labeled data (using extra training data)

Object object-detection +3

A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting

1 code implementation • 6 Dec 2023 • Junhao Zhuang, Yanhong Zeng, Wenran Liu, Chun Yuan, Kai Chen

This enables PowerPaint to accomplish various inpainting tasks by utilizing different task prompts, resulting in state-of-the-art performance.

Image Inpainting Object

6,571

TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models

no code implementations • 1 Dec 2023 • Pengxiang Li, Kai Chen, Zhili Liu, Ruiyuan Gao, Lanqing Hong, Guo Zhou, Hua Yao, Dit-yan Yeung, Huchuan Lu, Xu Jia

Despite remarkable achievements in video synthesis, achieving granular control over complex dynamics, such as nuanced movement among multiple interacting objects, still presents a significant hurdle for dynamic world modeling, compounded by the need to manage appearance and disappearance, handle drastic scale changes, and ensure consistency for instances across frames.

Image Classification Multi-Object Tracking +4

Safer-Instruct: Aligning Language Models with Automated Preference Data

1 code implementation • 15 Nov 2023 • Taiwei Shi, Kai Chen, Jieyu Zhao

To verify the effectiveness of Safer-Instruct, we apply the pipeline to construct a safety preference dataset as a case study.

SAMIHS: Adaptation of Segment Anything Model for Intracranial Hemorrhage Segmentation

1 code implementation • 14 Nov 2023 • Yinuo Wang, Kai Chen, Weimin Yuan, Cai Meng, Xiangzhi Bai

Segment Anything Model (SAM), a vision foundation model trained on large-scale annotations, has recently continued raising awareness within medical image segmentation.

Image Segmentation Medical Image Segmentation +2

BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues

1 code implementation • 20 Oct 2023 • Haodong Duan, Jueqi Wei, Chonghua Wang, Hongwei Liu, Yixiao Fang, Songyang Zhang, Dahua Lin, Kai Chen

In contrast, other LLMs struggle to generate multi-turn dialogues of satisfactory quality due to poor instruction-following capability, tendency to generate lengthy utterances, or limited general capability.

Instruction Following

Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis

no code implementations • 16 Oct 2023 • Kai Chen, Chunwei Wang, Kuo Yang, Jianhua Han, Lanqing Hong, Fei Mi, Hang Xu, Zhengying Liu, Wenyong Huang, Zhenguo Li, Dit-yan Yeung, Lifeng Shang, Xin Jiang, Qun Liu

The rapid development of large language models (LLMs) has not only provided numerous opportunities but also presented significant challenges.

Instruction Following

Implicit Concept Removal of Diffusion Models

no code implementations • 9 Oct 2023 • Zhili Liu, Kai Chen, Yifan Zhang, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-yan Yeung, James Kwok

To address this, we utilize the intrinsic geometric characteristics of implicit concepts and present Geom-Erasing, a novel concept removal method based on geometric-driven control.

Evaluating Hallucinations in Chinese Large Language Models

2 code implementations • 5 Oct 2023 • Qinyuan Cheng, Tianxiang Sun, Wenwei Zhang, Siyin Wang, Xiangyang Liu, Mozhi Zhang, Junliang He, Mianqiu Huang, Zhangyue Yin, Kai Chen, Xipeng Qiu

We analyze the primary types of hallucinations in different types of models and their causes.

Hallucination Question Answering

504

MagicDrive: Street View Generation with Diverse 3D Geometry Control

no code implementations • 4 Oct 2023 • Ruiyuan Gao, Kai Chen, Enze Xie, Lanqing Hong, Zhenguo Li, Dit-yan Yeung, Qiang Xu

Recent advancements in diffusion models have significantly enhanced the data synthesis with 2D control.

3D Object Detection Object +1

LawBench: Benchmarking Legal Knowledge of Large Language Models

1 code implementation • 28 Sep 2023 • Zhiwei Fei, Xiaoyu Shen, Dawei Zhu, Fengzhe Zhou, Zhuo Han, Songyang Zhang, Kai Chen, Zongwen Shen, Jidong Ge

We hope this benchmark provides an in-depth understanding of LLMs' domain-specific capabilities and speeds up the development of LLMs in the legal domain.

Benchmarking Memorization +1

162

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition

1 code implementation • 26 Sep 2023 • Pan Zhang, Xiaoyi Dong, Bin Wang, Yuhang Cao, Chao Xu, Linke Ouyang, Zhiyuan Zhao, Haodong Duan, Songyang Zhang, Shuangrui Ding, Wenwei Zhang, Hang Yan, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang

We propose InternLM-XComposer, a vision-language large model that enables advanced image-text comprehension and composition.

Ranked #9 on Visual Question Answering (VQA) on InfiMM-Eval

Image Comprehension Reading Comprehension +1

1,622

Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection

no code implementations • 18 Sep 2023 • Chenming Zhu, Wenwei Zhang, Tai Wang, Xihui Liu, Kai Chen

Instead of leveraging 2D images, we propose Object2Scene, the first approach that leverages large-scale large-vocabulary 3D object datasets to augment existing 3D scene datasets for open-vocabulary 3D object detection.

Ranked #2 on 3D Open-Vocabulary Object Detection on ScanNet on unseen classes

3D Object Detection 3D Open-Vocabulary Object Detection +4

Good-looking but Lacking Faithfulness: Understanding Local Explanation Methods through Trend-based Testing

1 code implementation • 9 Sep 2023 • Jinwen He, Kai Chen, Guozhu Meng, Jiangshan Zhang, Congyi Li

While enjoying the great achievements brought by deep learning (DL), people are also worried about the decisions made by DL models, since the high degree of non-linearity of DL models makes these decisions extremely difficult to understand.

Visual-Kinematics Graph Learning for Procedure-agnostic Instrument Tip Segmentation in Robotic Surgeries

no code implementations • 2 Sep 2023 • Jiaqi Liu, Yonghao Long, Kai Chen, Cheuk Hei Leung, Zerui Wang, Qi Dou

However, this task is very challenging due to the small sizes of surgical instrument tips, and significant variance of surgical scenes across different procedures.

Graph Learning Segmentation

A Survey for Federated Learning Evaluations: Goals and Measures

no code implementations • 23 Aug 2023 • Di Chai, Leye Wang, Liu Yang, Junxue Zhang, Kai Chen, Qiang Yang

Evaluation is a systematic approach to assessing how well a system achieves its intended purpose.

Federated Learning Privacy Preserving

Self-Deception: Reverse Penetrating the Semantic Firewall of Large Language Models

no code implementations • 16 Aug 2023 • Zhenhua Wang, Wei Xie, Kai Chen, Baosheng Wang, Zhiwen Gui, Enze Wang

Inspired by the attack that penetrates traditional firewalls through reverse tunnels, we introduce a "self-deception" attack that can bypass the semantic firewall by inducing LLM to generate prompts that facilitate jailbreak.

Deep Fusion Transformer Network with Weighted Vector-Wise Keypoints Voting for Robust 6D Object Pose Estimation

1 code implementation • ICCV 2023 • Jun Zhou, Kai Chen, Linlin Xu, Qi Dou, Jing Qin

One critical challenge in 6D object pose estimation from a single RGBD image is efficient integration of two different modalities, i.e., color and depth.

6D Pose Estimation using RGB Semantic Similarity +1

Unfolding Once is Enough: A Deployment-Friendly Transformer Unit for Super-Resolution

1 code implementation • 5 Aug 2023 • Yong Liu, Hang Dong, Boyang Liang, Songwei Liu, Qingji Dong, Kai Chen, Fangmin Chen, Lean Fu, Fei Wang

Since the high resolution of intermediate features in SISR models increases memory and computational requirements, efficient SISR transformers are more favored.

Image Super-Resolution

Transferable Graph Structure Learning for Graph-based Traffic Forecasting Across Cities

1 code implementation • KDD 2023 • Yilun Jin, Kai Chen, Qiang Yang

To address the problem, we propose TransGTR, a transferable structure learning framework for traffic forecasting that jointly learns and transfers the graph structures and forecasting models across cities.

Graph structure learning Knowledge Distillation +1

Learning Referring Video Object Segmentation from Weak Annotation

no code implementations • 4 Aug 2023 • Wangbo Zhao, Kepan Nan, Songyang Zhang, Kai Chen, Dahua Lin, Yang You

Based on this scheme, we develop a novel RVOS method that exploits weak annotations effectively.

Contrastive Learning Object +5

Relational Contrastive Learning for Scene Text Recognition

1 code implementation • 1 Aug 2023 • Jinglei Zhang, Tiancheng Lin, Yi Xu, Kai Chen, Rui Zhang

We argue that such prior contextual information can be interpreted as the relations of textual primitives due to the heterogeneous text and background, which can provide effective self-supervised labels for representation learning.

Contrastive Learning Representation Learning +1

Improving Pixel-based MIM by Reducing Wasted Modeling Capability

1 code implementation • ICCV 2023 • YuAn Liu, Songyang Zhang, Jiacheng Chen, Zhaohui Yu, Kai Chen, Dahua Lin

There has been significant progress in Masked Image Modeling (MIM).

Semantic Segmentation

3,156

Value-Informed Skill Chaining for Policy Learning of Long-Horizon Tasks with Surgical Robot

1 code implementation • 31 Jul 2023 • Tao Huang, Kai Chen, Wang Wei, Jianan Li, Yonghao Long, Qi Dou

Based on this value function, a chaining policy is learned to instruct subtask policies to terminate at the state with the highest value so that all subsequent policies are more likely to be connected for accomplishing the task.

reinforcement-learning

MMBench: Is Your Multi-modal Model an All-around Player?

2 code implementations • 12 Jul 2023 • YuAn Liu, Haodong Duan, Yuanhan Zhang, Bo Li, Songyang Zhang, Wangbo Zhao, Yike Yuan, Jiaqi Wang, Conghui He, Ziwei Liu, Kai Chen, Dahua Lin

In response to these challenges, we propose MMBench, a novel multi-modality benchmark.

Ranked #1 on Visual Question Answering on MMBench

Visual Question Answering

2,494

ConFL: Constraint-guided Fuzzing for Machine Learning Framework

no code implementations • 11 Jul 2023 • Zhao Liu, Quanchen Zou, Tian Yu, Xuan Wang, Guozhu Meng, Kai Chen, Deyue Zhang

Guided by the constraints, ConFL is able to generate valid inputs that can pass the verification and explore deeper paths of kernel codes.

Decision Making valid

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

2 code implementations • 7 Jul 2023 • Shilong Zhang, Peize Sun, Shoufa Chen, Min Xiao, Wenqi Shao, Wenwei Zhang, Yu Liu, Kai Chen, Ping Luo

Before sending to LLM, the reference is replaced by RoI features and interleaved with language embeddings as a sequence.

Ranked #1 on Visual Question Answering (VQA) on VCR (Q-AR) test

Attribute Common Sense Reasoning +4

453

Segment Any Point Cloud Sequences by Distilling Vision Foundation Models

2 code implementations • NeurIPS 2023 • Youquan Liu, Lingdong Kong, Jun Cen, Runnan Chen, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu

Recent advancements in vision foundation models (VFMs) have opened up new possibilities for versatile and efficient visual perception.

Representation Learning Transfer Learning

495

GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation

no code implementations • 7 Jun 2023 • Kai Chen, Enze Xie, Zhe Chen, Yibo Wang, Lanqing Hong, Zhenguo Li, Dit-yan Yeung

Diffusion models have attracted significant attention due to their remarkable ability to create content and generate data for tasks like image classification.

Image Classification Layout-to-Image Generation +2

Improving Handwritten OCR with Training Samples Generated by Glyph Conditional Denoising Diffusion Probabilistic Model

no code implementations • 31 May 2023 • Haisong Ding, Bozhi Luan, Dongnan Gui, Kai Chen, Qiang Huo

This model conditions on a printed glyph image and creates mappings between printed characters and handwritten images, thus enabling the generation of photo-realistic handwritten samples with diverse styles and unseen text contents.

Denoising Optical Character Recognition (OCR)

GlyphControl: Glyph Conditional Control for Visual Text Generation

1 code implementation • NeurIPS 2023 • Yukang Yang, Dongnan Gui, Yuhui Yuan, Weicong Liang, Haisong Ding, Han Hu, Kai Chen

We evaluate the effectiveness of our approach by measuring OCR-based metrics, CLIP score, and FID of the generated visual text.

Optical Character Recognition (OCR) Text Generation

173

A Meta-learning Framework for Tuning Parameters of Protection Mechanisms in Trustworthy Federated Learning

no code implementations • 28 May 2023 • Xiaojin Zhang, Yan Kang, Lixin Fan, Kai Chen, Qiang Yang

Motivated by this requirement, we propose a framework that (1) formulates TFL as a problem of finding a protection mechanism to optimize the tradeoff between privacy leakage, utility loss, and efficiency reduction and (2) formally defines bounded measurements of the three factors.

Federated Learning Meta-Learning

RankCSE: Unsupervised Sentence Representations Learning via Learning to Rank

1 code implementation • 26 May 2023 • Jiduan Liu, Jiahao Liu, Qifan Wang, Jingang Wang, Wei Wu, Yunsen Xian, Dongyan Zhao, Kai Chen, Rui Yan

In this paper, we propose a novel approach, RankCSE, for unsupervised sentence representation learning, which incorporates ranking consistency and ranking distillation with contrastive learning into a unified framework.

Contrastive Learning Learning-To-Rank +4

Zero-shot Generation of Training Data with Denoising Diffusion Probabilistic Model for Handwritten Chinese Character Recognition

no code implementations • 25 May 2023 • Dongnan Gui, Kai Chen, Haisong Ding, Qiang Huo

Training from handwritten samples of a small character set, the DDPM is capable of mapping printed strokes to handwritten ones, which makes it possible to generate photo-realistic and diverse style handwritten samples of unseen character categories.

Denoising

Paper
Add Code

Theoretically Principled Federated Learning for Balancing Privacy and Utility

no code implementations • 24 May 2023 • Xiaojin Zhang, Wenjie Li, Kai Chen, Shutao Xia, Qiang Yang

We propose a general learning framework for the protection mechanisms that protects privacy via distorting model parameters, which facilitates the trade-off between privacy and utility.

Federated Learning

Paper
Add Code

TG-VQA: Ternary Game of Video Question Answering

no code implementations • 17 May 2023 • Hao Li, Peng Jin, Zesen Cheng, Songyang Zhang, Kai Chen, Zhennan Wang, Chang Liu, Jie Chen

Video question answering aims at answering a question about the video content by reasoning the alignment semantics within them.

Contrastive Learning Question Answering +2

Paper
Add Code

MultiModal-GPT: A Vision and Language Model for Dialogue with Humans

1 code implementation • 8 May 2023 • Tao Gong, Chengqi Lyu, Shilong Zhang, Yudong Wang, Miao Zheng, Qian Zhao, Kuikun Liu, Wenwei Zhang, Ping Luo, Kai Chen

To further enhance the ability to chat with humans of the MultiModal-GPT, we utilize language-only instruction-following data to train the MultiModal-GPT jointly.

Instruction Following Language Modelling

1,402

Paper
Code

Towards Achieving Near-optimal Utility for Privacy-Preserving Federated Learning via Data Generation and Parameter Distortion

no code implementations • 7 May 2023 • Xiaojin Zhang, Kai Chen, Qiang Yang

The nature of the widely-adopted protection mechanisms, including the Randomization Mechanism and the Compression Mechanism, is to protect privacy via distorting model parameters.

Federated Learning Privacy Preserving

Paper
Add Code

Optimizing Privacy, Utility and Efficiency in Constrained Multi-Objective Federated Learning

no code implementations • 29 Apr 2023 • Yan Kang, Hanlin Gu, Xingxing Tang, Yuanqin He, Yuzhu Zhang, Jinnan He, Yuxing Han, Lixin Fan, Kai Chen, Qiang Yang

Different from existing CMOFL works focusing on utility, efficiency, fairness, and robustness, we consider optimizing privacy leakage along with utility loss and training cost, the three primary objectives of a TFL system.

Fairness Federated Learning

Paper
Add Code

Transformer-Based Visual Segmentation: A Survey

2 code implementations • 19 Apr 2023 • Xiangtai Li, Henghui Ding, Haobo Yuan, Wenwei Zhang, Jiangmiao Pang, Guangliang Cheng, Kai Chen, Ziwei Liu, Chen Change Loy

Recently, transformers, a type of neural network based on self-attention originally designed for natural language processing, have considerably surpassed previous convolutional or recurrent approaches in various vision processing tasks.

Autonomous Driving Point Cloud Segmentation +1

574

Paper
Code

RoboBEV: Towards Robust Bird's Eye View Perception under Corruptions

1 code implementation • 13 Apr 2023 • Shaoyuan Xie, Lingdong Kong, Wenwei Zhang, Jiawei Ren, Liang Pan, Kai Chen, Ziwei Liu

Our experiments further demonstrate that pre-training and depth-free BEV transformation has the potential to enhance out-of-distribution robustness.

Robust Camera Only 3D Object Detection

284

Paper
Code

RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer

2 code implementations • 12 Apr 2023 • Jiahao Wang, Songyang Zhang, Yong liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin

Extensive experiments and ablative analysis also demonstrate that the inductive bias of network architecture can be incorporated into a simple network structure with an appropriate optimization strategy.

Inductive Bias

3,156

Paper
Code

A Game-theoretic Framework for Privacy-preserving Federated Learning

no code implementations • 11 Apr 2023 • Xiaojin Zhang, Lixin Fan, Siwei Wang, Wenjie Li, Kai Chen, Qiang Yang

To address this, we propose the first game-theoretic framework that considers both FL defenders and attackers in terms of their respective payoffs, which include computational costs, FL model utilities, and privacy leakage risks.

Federated Learning Privacy Preserving

Paper
Add Code

Probably Approximately Correct Federated Learning

no code implementations • 10 Apr 2023 • Xiaojin Zhang, Anbu Huang, Lixin Fan, Kai Chen, Qiang Yang

However, existing multi-objective optimization frameworks are very time-consuming and do not guarantee the existence of the Pareto frontier; this motivates us to transform the multi-objective problem into a single-objective problem, which is more efficient and easier to solve.

Federated Learning PAC learning

Paper
Add Code

A Survey on Vertical Federated Learning: From a Layered Perspective

no code implementations • 4 Apr 2023 • Liu Yang, Di Chai, Junxue Zhang, Yilun Jin, Leye Wang, Hao liu, Han Tian, Qian Xu, Kai Chen

From the hardware layer to the vertical federated system layer, researchers contribute to various aspects of VFL.

Privacy Preserving Vertical Federated Learning

Paper
Add Code

Mixed Autoencoder for Self-supervised Visual Representation Learning

1 code implementation • CVPR 2023 • Kai Chen, Zhili Liu, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-yan Yeung

Specifically, our MixedAE outperforms MAE by +0.3% accuracy, +1.7 mIoU and +0.9 AP on ImageNet-1K, ADE20K and COCO respectively with a standard ViT-Base.

Contrastive Learning Data Augmentation +1

Paper
Code

Robo3D: Towards Robust and Reliable 3D Perception against Corruptions

1 code implementation • ICCV 2023 • Lingdong Kong, Youquan Liu, Xin Li, Runnan Chen, Wenwei Zhang, Jiawei Ren, Liang Pan, Kai Chen, Ziwei Liu

The robustness of 3D perception systems under natural corruptions from environments and sensors is pivotal for safety-critical applications.

Robust 3D Object Detection Robust 3D Semantic Segmentation

272

Paper
Code

Federated Learning without Full Labels: A Survey

no code implementations • 25 Mar 2023 • Yilun Jin, Yang Liu, Kai Chen, Qiang Yang

Therefore, the problem of federated learning without full labels is important in real-world FL applications.

Federated Learning Self-Supervised Learning +1

Paper
Add Code

UMC: A Unified Bandwidth-efficient and Multi-resolution based Collaborative Perception Framework

no code implementations • ICCV 2023 • Tianhang Wang, Guang Chen, Kai Chen, Zhengfa Liu, Bo Zhang, Alois Knoll, Changjun Jiang

To verify our algorithm, we conducted experiments on the V2X-Sim and OPV2V datasets.

Paper
Add Code

Dense Distinct Query for End-to-End Object Detection

1 code implementation • CVPR 2023 • Shilong Zhang, Xinjiang Wang, Jiaqi Wang, Jiangmiao Pang, Chengqi Lyu, Wenwei Zhang, Ping Luo, Kai Chen

Concretely, we first lay dense queries like traditional detectors and then select distinct ones for one-to-one assignments.

Ranked #3 on Object Detection on CrowdHuman (full body)

Object object-detection +1

236

Paper
Code

RTMPose: Real-Time Multi-Person Pose Estimation based on MMPose

1 code implementation • 13 Mar 2023 • Tao Jiang, Peng Lu, Li Zhang, Ningsheng Ma, Rui Han, Chengqi Lyu, Yining Li, Kai Chen

Recent studies on 2D pose estimation have achieved excellent performance on public benchmarks, yet its application in the industrial community still suffers from heavy model parameters and high latency.

Ranked #3 on Pose Estimation on OCHuman (using extra training data)

2D Human Pose Estimation 2D Pose Estimation +1

4,986

Paper
Code

Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation

no code implementations • 11 Mar 2023 • Zhao Yang, Jiaqi Wang, Yansong Tang, Kai Chen, Hengshuang Zhao, Philip H. S. Torr

Referring image segmentation segments an image from a language expression.

Image Segmentation Object +1

Paper
Add Code

PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling

1 code implementation • 4 Mar 2023 • YuAn Liu, Songyang Zhang, Jiacheng Chen, Kai Chen, Dahua Lin

Masked Image Modeling (MIM) has achieved promising progress with the advent of Masked Autoencoders (MAE) and BEiT.

Self-Supervised Learning

3,082

Paper
Code

Demonstration-Guided Reinforcement Learning with Efficient Exploration for Task Automation of Surgical Robot

2 code implementations • 20 Feb 2023 • Tao Huang, Kai Chen, Bin Li, Yun-hui Liu, Qi Dou

Task automation of surgical robots has the potential to improve surgical efficiency.

Efficient Exploration reinforcement-learning +1

111

Paper
Code

Boosting Neural Networks to Decompile Optimized Binaries

no code implementations • 3 Jan 2023 • Ying Cao, Ruigang Liang, Kai Chen, Peiwei Hu

They formulate the decompilation process as a translation problem between LPL and HPL, aiming to reduce the human cost required to develop decompilation tools and improve their generalizability.

Machine Translation Malware Analysis +2

Paper
Add Code

Learning Shape Primitives via Implicit Convexity Regularization

1 code implementation • ICCV 2023 • Xiaoyang Huang, Yi Zhang, Kai Chen, Teng Li, Wenjun Zhang, Bingbing Ni

In this work, a novel regularization term named Implicit Convexity Regularization (ICR) imposed on implicit primitive learning is proposed to tackle this problem.

Paper
Code

RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer

no code implementations • CVPR 2023 • Jiahao Wang, Songyang Zhang, Yong liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin

Extensive experiments and ablative analysis also demonstrate that the inductive bias of network architecture can be incorporated into a simple network structure with an appropriate optimization strategy.

Inductive Bias

Paper
Add Code

RTMDet: An Empirical Study of Designing Real-Time Object Detectors

9 code implementations • 14 Dec 2022 • Chengqi Lyu, Wenwei Zhang, Haian Huang, Yue Zhou, Yudong Wang, Yanyi Liu, Shilong Zhang, Kai Chen

In this paper, we aim to design an efficient real-time object detector that exceeds the YOLO series and is easily extensible for many object recognition tasks such as instance segmentation and rotated object detection.

Ranked #1 on Oriented Object Detection on DOTA 1.5

Object object-detection +7

27,779

Paper
Code

Anger Breeds Controversy: Analyzing Controversy and Emotions on Reddit

no code implementations • 1 Dec 2022 • Kai Chen, Zihao He, Rong-Ching Chang, Jonathan May, Kristina Lerman

We collect discussions from a wide variety of topical forums and use emotion detection to recognize a range of emotions from text, including anger, fear, joy, admiration, etc.

Paper
Add Code

StereoPose: Category-Level 6D Transparent Object Pose Estimation from Stereo Images via Back-View NOCS

no code implementations • 3 Nov 2022 • Kai Chen, Stephen James, Congying Sui, Yun-hui Liu, Pieter Abbeel, Qi Dou

To further improve the performance of the stereo framework, StereoPose is equipped with a parallax attention module for stereo feature fusion and an epipolar loss for improving the stereo-view consistency of network predictions.

Object Pose Estimation +1

Paper
Add Code

Boosting Point Clouds Rendering via Radiance Mapping

1 code implementation • 27 Oct 2022 • Xiaoyang Huang, Yi Zhang, Bingbing Ni, Teng Li, Kai Chen, Wenjun Zhang

In this work, we focus on boosting the image quality of point clouds rendering with a compact model design.

Paper
Code

A Novel Membership Inference Attack against Dynamic Neural Networks by Utilizing Policy Networks Information

no code implementations • 17 Oct 2022 • Pan Li, Peizhuo Lv, Shenchen Zhu, Ruigang Liang, Kai Chen

Although traditional static DNNs are vulnerable to the membership inference attack (MIA), which aims to infer whether a particular point was used to train the model, little is known about how such an attack performs on the dynamic NNs.

Computational Efficiency Image Classification +2

Paper
Add Code

DG-STGCN: Dynamic Spatial-Temporal Modeling for Skeleton-based Action Recognition

3 code implementations • 12 Oct 2022 • Haodong Duan, Jiaqi Wang, Kai Chen, Dahua Lin

Graph convolution networks (GCN) have been widely used in skeleton-based action recognition.

Ranked #7 on Skeleton Based Action Recognition on NTU RGB+D

Action Recognition Skeleton Based Action Recognition

853

Paper
Code

Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks

1 code implementation • 20 Sep 2022 • Haodong Duan, Yue Zhao, Kai Chen, Yuanjun Xiong, Dahua Lin

Deep learning models have achieved excellent recognition results on large-scale video benchmarks.

Action Recognition

Paper
Code

Deliberated Domain Bridging for Domain Adaptive Semantic Segmentation

1 code implementation • 16 Sep 2022 • Lin Chen, Zhixiang Wei, Xin Jin, Huaian Chen, Miao Zheng, Kai Chen, Yi Jin

In this work, we resort to data mixing to establish a deliberated domain bridging (DDB) for DASS, through which the joint distributions of source and target domains are aligned and interacted with each in the intermediate space.

Ranked #1 on Domain Adaptation on GTAV+Synscapes to Cityscapes

Knowledge Distillation Semantic Segmentation +3

Paper
Code

SSL-WM: A Black-Box Watermarking Approach for Encoders Pre-trained by Self-supervised Learning

1 code implementation • 8 Sep 2022 • Peizhuo Lv, Pan Li, Shenchen Zhu, Shengzhi Zhang, Kai Chen, Ruigang Liang, Chang Yue, Fan Xiang, Yuling Cai, Hualong Ma, Yingjun Zhang, Guozhu Meng

Recent years have witnessed tremendous success in Self-Supervised Learning (SSL), which has been widely utilized to facilitate various downstream tasks in Computer Vision (CV) and Natural Language Processing (NLP) domains.

Self-Supervised Learning

Paper
Code

Consistent-Teacher: Towards Reducing Inconsistent Pseudo-targets in Semi-supervised Object Detection

1 code implementation • CVPR 2023 • Xinjiang Wang, Xingyi Yang, Shilong Zhang, Yijiang Li, Litong Feng, Shijie Fang, Chengqi Lyu, Kai Chen, Wayne Zhang

In this study, we dive deep into the inconsistency of pseudo targets in semi-supervised object detection (SSOD).

Ranked #2 on Semi-Supervised Object Detection on COCO 2% labeled data

object-detection Object Detection +1

271

Paper
Code

Trading Off Privacy, Utility and Efficiency in Federated Learning

no code implementations • 1 Sep 2022 • Xiaojin Zhang, Yan Kang, Kai Chen, Lixin Fan, Qiang Yang

In addition, it is a mandate for a federated learning system to achieve high efficiency in order to enable large-scale model training and deployment.

Vertical Federated Learning

Paper
Add Code

Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain

no code implementations • 9 Jul 2022 • Chang Yue, Peizhuo Lv, Ruigang Liang, Kai Chen

However, most of the triggers used in the current study are fixed patterns patched on a small fraction of an image and are often clearly mislabeled, which is easily detected by humans or defense methods such as Neural Cleanse and SentiNet.

Backdoor Attack Data Poisoning +1

Paper
Add Code

Semi-blind source separation using convolutive transfer function for nonlinear acoustic echo cancellation

1 code implementation • 4 Jul 2022 • Guoliang Cheng, Lele Liao, Kai Chen, Yuxiang Hu, Changbao Zhu, Jing Lu

The recently proposed semi-blind source separation (SBSS) method for nonlinear acoustic echo cancellation (NAEC) outperforms adaptive NAEC in attenuating the nonlinear acoustic echo.

Acoustic echo cancellation blind source separation

Paper
Code

Secure Forward Aggregation for Vertical Federated Neural Networks

no code implementations • 28 Jun 2022 • Shuowei Cai, Di Chai, Liu Yang, Junxue Zhang, Yilun Jin, Leye Wang, Kun Guo, Kai Chen

In this paper, we focus on SplitNN, a well-known neural network framework in VFL, and identify a trade-off between data security and model performance in SplitNN.

Privacy Preserving Vertical Federated Learning

Paper
Add Code

A Novel Multi-Agent Scheduling Mechanism for Adaptation of Production Plans in Case of Supply Chain Disruptions

no code implementations • 23 Jun 2022 • Jing Tan, Lars Braubach, Kai Jander, Rongjun Xu, Kai Chen

The system has been implemented as a proof of concept and is currently being reimplemented and transferred to a production system based on the Jadex agent platform.

Scheduling

Paper
Add Code

What Are Expected Queries in End-to-End Object Detection?

1 code implementation • 2 Jun 2022 • Shilong Zhang, Xinjiang Wang, Jiaqi Wang, Jiangmiao Pang, Kai Chen

As both sparse and dense queries are imperfect, then what are expected queries in end-to-end object detection?

Instance Segmentation object-detection +2

236

Paper
Code

Task-Customized Self-Supervised Pre-training with Scalable Dynamic Routing

no code implementations • 26 May 2022 • Zhili Liu, Jianhua Han, Lanqing Hong, Hang Xu, Kai Chen, Chunjing Xu, Zhenguo Li

On the other hand, for existing SSL methods, it is burdensome and infeasible to use different downstream-task-customized datasets in pre-training for different tasks.

Self-Supervised Learning

Paper
Add Code

PYSKL: Towards Good Practices for Skeleton Action Recognition

1 code implementation • 19 May 2022 • Haodong Duan, Jiaqi Wang, Kai Chen, Dahua Lin

The toolbox supports a wide variety of skeleton action recognition algorithms, including approaches based on GCN and CNN.

Ranked #19 on Skeleton Based Action Recognition on NTU RGB+D 120

Action Recognition Skeleton Based Action Recognition

853

Paper
Code

Group R-CNN for Weakly Semi-supervised Object Detection with Points

1 code implementation • CVPR 2022 • Shilong Zhang, Zhuoran Yu, Liyang Liu, Xinjiang Wang, Aojun Zhou, Kai Chen

The core of this task is to train a point-to-box regressor on well-labeled images that can be used to predict credible bounding boxes for each point annotation.

Object Detection Representation Learning +1

137

Paper
Code

TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition

1 code implementation • CVPR 2022 • Haodong Duan, Nanxuan Zhao, Kai Chen, Dahua Lin

To mitigate this problem, we developed TransRank, a unified framework for recognizing Transformations in a Ranking formulation.

Action Recognition Representation Learning +3

Paper
Code

MMRotate: A Rotated Object Detection Benchmark using PyTorch

1 code implementation • 28 Apr 2022 • Yue Zhou, Xue Yang, Gefan Zhang, Jiabao Wang, Yanyi Liu, Liping Hou, Xue Jiang, Xingzhao Liu, Junchi Yan, Chengqi Lyu, Wenwei Zhang, Kai Chen

We present an open-source toolbox, named MMRotate, which provides a coherent algorithm framework of training, inferring, and evaluation for the popular rotated object detection algorithm based on deep learning.

Object object-detection +1

1,724

Paper
Code

ROMA: Cross-Domain Region Similarity Matching for Unpaired Nighttime Infrared to Daytime Visible Video Translation

no code implementations • 26 Apr 2022 • Zhenjie Yu, Kai Chen, Shuang Li, Bingfeng Han, Chi Harold Liu, Shuigen Wang

To be specific, ROMA can efficiently translate unpaired nighttime infrared videos into fine-grained daytime visible ones while maintaining spatiotemporal consistency via matching the cross-domain region similarity.

Translation

Paper
Add Code

NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video: Dataset, Methods and Results

2 code implementations • 20 Apr 2022 • Ren Yang, Radu Timofte, Meisong Zheng, Qunliang Xing, Minglang Qiao, Mai Xu, Lai Jiang, Huaida Liu, Ying Chen, Youcheng Ben, Xiao Zhou, Chen Fu, Pei Cheng, Gang Yu, Junyi Li, Renlong Wu, Zhilu Zhang, Wei Shang, Zhengyao Lv, Yunjin Chen, Mingcai Zhou, Dongwei Ren, Kai Zhang, WangMeng Zuo, Pavel Ostyakov, Vyal Dmitry, Shakarim Soltanayev, Chervontsev Sergey, Zhussip Magauiya, Xueyi Zou, Youliang Yan, Pablo Navarrete Michelini, Yunhua Lu, Diankai Zhang, Shaoli Liu, Si Gao, Biao Wu, Chengjian Zheng, Xiaofeng Zhang, Kaidi Lu, Ning Wang, Thuong Nguyen Canh, Thong Bach, Qing Wang, Xiaopeng Sun, Haoyu Ma, Shijie Zhao, Junlin Li, Liangbin Xie, Shuwei Shi, Yujiu Yang, Xintao Wang, Jinjin Gu, Chao Dong, Xiaodi Shi, Chunmei Nian, Dong Jiang, Jucai Lin, Zhihuai Xie, Mao Ye, Dengyan Luo, Liuhan Peng, Shengjie Chen, Qian Wang, Xin Liu, Boyang Liang, Hang Dong, Yuhao Huang, Kai Chen, Xingbei Guo, Yujing Sun, Huilei Wu, Pengxu Wei, Yulin Huang, Junying Chen, Ik Hyun Lee, Sunder Ali Khowaja, Jiseok Yoon

This challenge includes three tracks.

Super-Resolution

Paper
Code

Sim-to-Real 6D Object Pose Estimation via Iterative Self-training for Robotic Bin Picking

no code implementations • 14 Apr 2022 • Kai Chen, Rui Cao, Stephen James, Yichuan Li, Yun-hui Liu, Pieter Abbeel, Qi Dou

To continuously improve the quality of pseudo labels, we iterate the above steps by taking the trained student model as a new teacher and re-label real data using the refined teacher model.

6D Pose Estimation using RGB Robotic Grasping

Paper
Add Code

Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation

1 code implementation • CVPR 2022 • Xiangtai Li, Wenwei Zhang, Jiangmiao Pang, Kai Chen, Guangliang Cheng, Yunhai Tong, Chen Change Loy

We hope this simple, yet effective method can serve as a new, flexible baseline in unified video segmentation design.

Ranked #1 on Video Panoptic Segmentation on KITTI-STEP (using extra training data)

Image Segmentation Instance Segmentation +5

150

Paper
Code

StructToken: Rethinking Semantic Segmentation with Structural Prior

no code implementations • 23 Mar 2022 • Fangjian Lin, Zhanhao Liang, Sitong Wu, Junjun He, Kai Chen, Shengwei Tian

In previous deep-learning-based methods, semantic segmentation has been regarded as a static or dynamic per-pixel classification task, i.e., classify each pixel representation to a specific category.

Decision Making Segmentation +1

Paper
Add Code

Dense Siamese Network for Dense Unsupervised Learning

1 code implementation • 21 Mar 2022 • Wenwei Zhang, Jiangmiao Pang, Kai Chen, Chen Change Loy

It also extracts a batch of region embeddings that correspond to some sub-regions in the overlapped area to be contrasted for region consistency.

Ranked #2 on Unsupervised Semantic Segmentation on COCO-All (mIoU metric)

Self-Supervised Learning Unsupervised Semantic Segmentation

Paper
Code

APRNet: Attention-based Pixel-wise Rendering Network for Photo-Realistic Text Image Generation

no code implementations • 15 Mar 2022 • Yangming Shi, Haisong Ding, Kai Chen, Qiang Huo

Style-guided text image generation tries to synthesize a text image by imitating a reference image's appearance while keeping the text content unaltered.

Image Generation

Paper
Add Code

CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving

no code implementations • 15 Mar 2022 • Kaican Li, Kai Chen, Haoyu Wang, Lanqing Hong, Chaoqiang Ye, Jianhua Han, Yukuai Chen, Wei zhang, Chunjing Xu, Dit-yan Yeung, Xiaodan Liang, Zhenguo Li, Hang Xu

One main reason that impedes the development of truly reliable self-driving systems is the lack of public datasets for evaluating the performance of object detectors on corner cases.

Autonomous Driving Object +2

Paper
Add Code

RotateQVS: Representing Temporal Information as Rotations in Quaternion Vector Space for Temporal Knowledge Graph Completion

no code implementations • ACL 2022 • Kai Chen, Ye Wang, Yitong Li, Aiping Li

Temporal factors are tied to the growth of facts in realistic applications, such as the progress of diseases and the development of political situations; therefore, research on Temporal Knowledge Graphs (TKG) attracts much attention.

Ranked #3 on Link Prediction on GDELT

Knowledge Graph Completion Link Prediction +3

Paper
Add Code

GCFSR: a Generative and Controllable Face Super Resolution Method Without Facial and GAN Priors

no code implementations • CVPR 2022 • Jingwen He, Wu Shi, Kai Chen, Lean Fu, Chao Dong

The style modulation aims to generate realistic face details and the feature modulation dynamically fuses the multi-level encoded features and the generated ones conditioned on the upscaling factor.

Face Hallucination Hallucination +1

Paper
Add Code

No Free Lunch Theorem for Security and Utility in Federated Learning

no code implementations • 11 Mar 2022 • Xiaojin Zhang, Hanlin Gu, Lixin Fan, Kai Chen, Qiang Yang

In a federated learning scenario where multiple parties jointly learn a model from their respective data, there exist two conflicting goals for the choice of appropriate algorithms.

Federated Learning Privacy Preserving

Paper
Add Code

Towards Robust Part-aware Instance Segmentation for Industrial Bin Picking

no code implementations • 5 Mar 2022 • Yidan Feng, Biqi Yang, Xianzhi Li, Chi-Wing Fu, Rui Cao, Kai Chen, Qi Dou, Mingqiang Wei, Yun-hui Liu, Pheng-Ann Heng

Industrial bin picking is a challenging task that requires accurate and robust segmentation of individual object instances.

Instance Segmentation Segmentation +1

Paper
Add Code

PPA: Preference Profiling Attack Against Federated Learning

no code implementations • 10 Feb 2022 • Chunyi Zhou, Yansong Gao, Anmin Fu, Kai Chen, Zhiyang Dai, Zhi Zhang, Minhui Xue, Yuqing Zhang

By observing a user model's gradient sensitivity to a class, PPA can profile the sample proportion of the class in the user's local dataset, and thus the user's preference of the class is exposed.

Federated Learning Inference Attack

Paper
Add Code

Egeria: Efficient DNN Training with Knowledge-Guided Layer Freezing

no code implementations • 17 Jan 2022 • Yiding Wang, Decang Sun, Kai Chen, Fan Lai, Mosharaf Chowdhury

To explore this, we first introduce the notion of training plasticity to quantify the training progress of internal DNN layers.

Quantization

Paper
Add Code

Efficient Data-Plane Memory Scheduling for In-Network Aggregation

no code implementations • 17 Jan 2022 • Hao Wang, Yuxuan Qin, ChonLam Lao, Yanfang Le, Wenfei Wu, Kai Chen

However, switch memory is scarce compared to the volume of gradients transmitted in distributed training.

Scheduling

Paper
Add Code

OCSampler: Compressing Videos to One Clip with Single-step Sampling

1 code implementation • CVPR 2022 • Jintao Lin, Haodong Duan, Kai Chen, Dahua Lin, LiMin Wang

Recent works prefer to formulate frame sampling as a sequential decision task by selecting frames one by one according to their importance, while we present a new paradigm of learning instance-specific video condensation policies to select informative frames for representing the entire video only in a single step.

Video Recognition

Paper
Code

TAGPerson: A Target-Aware Generation Pipeline for Person Re-identification

1 code implementation • 28 Dec 2021 • Kai Chen, Weihua Chen, Tao He, Rong Du, Fan Wang, Xiuyu Sun, Yuchen Guo, Guiguang Ding

In TAGPerson, we extract information from target scenes and use them to control our parameterized rendering process to generate target-aware synthetic images, which would hold a smaller gap to the real images in the target domain.

Person Re-Identification

Paper
Code

Automatic Configuration for Optimal Communication Scheduling in DNN Training

no code implementations • 27 Dec 2021 • Yiqing Ma, Hao Wang, Yiming Zhang, Kai Chen

ByteScheduler partitions and rearranges tensor transmissions to improve the communication efficiency of distributed Deep Neural Network (DNN) training.

Bayesian Optimization Scheduling

Paper
Add Code

AI-Lancet: Locating Error-inducing Neurons to Optimize Neural Networks

1 code implementation • ACM SIGSAC Conference on Computer and Communications Security 2021 • Yue Zhao, Hong Zhu, Kai Chen, Shengzhi Zhang

With the knowledge of error-inducing neurons, we propose two methods to fix the errors: the neuron-flip and the neuron-fine-tuning.

Paper
Code

LAVT: Language-Aware Vision Transformer for Referring Image Segmentation

1 code implementation • CVPR 2022 • Zhao Yang, Jiaqi Wang, Yansong Tang, Kai Chen, Hengshuang Zhao, Philip H. S. Torr

Referring image segmentation is a fundamental vision-language task that aims to segment out an object referred to by a natural language expression from an image.

Ranked #3 on Generalized Referring Expression Segmentation on gRefCOCO

Generalized Referring Expression Segmentation Image Segmentation +2

169

Paper
Code

Few-Shot Object Detection via Association and DIscrimination

1 code implementation • NeurIPS 2021 • Yuhang Cao, Jiaqi Wang, Ying Jin, Tong Wu, Kai Chen, Ziwei Liu, Dahua Lin

1) In the association step, in contrast to implicitly leveraging multiple base classes, we construct a compact novel class feature space via explicitly imitating a specific base class feature space.

Few-Shot Object Detection Object +3

Paper
Code

DBIA: Data-free Backdoor Injection Attack against Transformer Networks

1 code implementation • 22 Nov 2021 • Peizhuo Lv, Hualong Ma, Jiachen Zhou, Ruigang Liang, Kai Chen, Shengzhi Zhang, Yunfei Yang

In this paper, we propose DBIA, a novel data-free backdoor attack against the CV-oriented transformer networks, leveraging the inherent attention mechanism of transformers to generate triggers and injecting the backdoor using the poisoned surrogate dataset.

Backdoor Attack Image Classification +1

Paper
Code

Attacking Video Recognition Models with Bullet-Screen Comments

1 code implementation • 29 Oct 2021 • Kai Chen, Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang

On both UCF-101 and HMDB-51 datasets, our BSC attack method can achieve about 90% fooling rate when attacking three mainstream video recognition models, while only occluding <8% areas in the video.

Adversarial Attack Adversarial Attack on Video Classification +2

Paper
Code

The Nuts and Bolts of Adopting Transformer in GANs

no code implementations • 25 Oct 2021 • Rui Xu, Xiangyu Xu, Kai Chen, Bolei Zhou, Chen Change Loy

Transformer becomes prevalent in computer vision, especially for high-level vision tasks.

Generative Adversarial Network Image Generation

Paper
Add Code

Quantitative relations among causality measures with applications to nonlinear pulse-output network reconstruction

no code implementations • 17 Oct 2021 • Zhong-qi K. Tian, Kai Chen, Songting Li, David W. McLaughlin, Douglas Zhou

However, the interpretation of causal connectivity remains to be fully clarified, in particular, how causal connectivity depends on causality measures and how causal connectivity relates to structural connectivity.

Paper
Add Code

Foreground-attention in neural decoding: Guiding Loop-Enc-Dec to reconstruct visual stimulus images from fMRI

no code implementations • 29 Sep 2021 • Kai Chen, Yongqiang Ma, Mingyang Sheng, Nanning Zheng

Inspired by the mechanism of human visual attention, in this paper, we propose a novel method of reconstructing visual stimulus images, which first decodes the distribution of visual attention from fMRI, and then reconstructs the visual images guided by visual attention.

Image Reconstruction

Paper
Add Code

Temporal RoI Align for Video Object Recognition

1 code implementation • 8 Sep 2021 • Tao Gong, Kai Chen, Xinjiang Wang, Qi Chu, Feng Zhu, Dahua Lin, Nenghai Yu, Huamin Feng

In this work, considering that the features of the same object instance are highly similar among frames in a video, a novel Temporal RoI Align operator is proposed to extract features from other frames' feature maps for current frame proposals by utilizing feature similarity.

Ranked #1 on Video Instance Segmentation on YouTube-VIS

Instance Segmentation Object +5

3,374

Paper
Code

MultiSiam: Self-supervised Multi-instance Siamese Representation Learning for Autonomous Driving

1 code implementation • ICCV 2021 • Kai Chen, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-yan Yeung

By pre-training on SODA10M, a large-scale autonomous driving dataset, MultiSiam exceeds the ImageNet pre-trained MoCo-v2, demonstrating the potential of domain-specific pre-training.

Autonomous Driving Image Clustering +2

Paper
Code

Towards Balanced Learning for Instance Recognition

no code implementations • 23 Aug 2021 • Jiangmiao Pang, Kai Chen, Qi Li, Zhihai Xu, Huajun Feng, Jianping Shi, Wanli Ouyang, Dahua Lin

In this work, we carefully revisit the standard training practice of detectors, and find that the detection performance is often limited by the imbalance during the training process, which generally consists in three levels - sample level, feature level, and objective level.

Paper
Add Code

Category-Level 6D Object Pose Estimation via Cascaded Relation and Recurrent Reconstruction Networks

no code implementations • 19 Aug 2021 • Jiaze Wang, Kai Chen, Qi Dou

Furthermore, we design a recurrent reconstruction network for iterative residual refinement to progressively improve the reconstruction and correspondence estimations from coarse to fine.

6D Pose Estimation 6D Pose Estimation using RGB +3

Paper
Add Code

Practical and Secure Federated Recommendation with Personalized Masks

no code implementations • 18 Aug 2021 • Liu Yang, Junxue Zhang, Di Chai, Leye Wang, Kun Guo, Kai Chen, Qiang Yang

In this paper, we proposed federated masked matrix factorization (FedMMF) to protect the data privacy in federated recommender systems without sacrificing efficiency and effectiveness.

Federated Learning Recommendation Systems

Paper
Add Code

Aegis: A Trusted, Automatic and Accurate Verification Framework for Vertical Federated Learning

no code implementations • 16 Aug 2021 • Cengguang Zhang, Junxue Zhang, Di Chai, Kai Chen

In this paper, we present Aegis, a trusted, automatic, and accurate verification framework to verify the security of VFL jobs.

Privacy Preserving Vertical Federated Learning

Paper
Add Code

MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding

2 code implementations • 14 Aug 2021 • Zhanghui Kuang, Hongbin Sun, Zhizhong Li, Xiaoyu Yue, Tsui Hin Lin, Jianyong Chen, Huaqiang Wei, Yiqin Zhu, Tong Gao, Wenwei Zhang, Kai Chen, Wayne Zhang, Dahua Lin

We present MMOCR, an open-source toolbox which provides a comprehensive pipeline for text detection and recognition, as well as their downstream tasks such as named entity recognition and key information extraction.

Key Information Extraction named-entity-recognition +4

4,065

Paper
Code

I2V-GAN: Unpaired Infrared-to-Visible Video Translation

1 code implementation • 2 Aug 2021 • Shuang Li, Bingfeng Han, Zhenjie Yu, Chi Harold Liu, Kai Chen, Shuigen Wang

Human vision is often adversely affected by complex environmental factors, especially in night vision scenarios.

object-detection Object Detection +1

Paper
Code

HAFLO: GPU-Based Acceleration for Federated Logistic Regression

no code implementations • 29 Jul 2021 • Xiaodian Cheng, Wanhang Lu, Xinyang Huang, Shuihai Hu, Kai Chen

In recent years, federated learning (FL) has been widely applied for supporting decentralized collaborative learning scenarios.

Federated Learning regression

Paper
Add Code

K-Net: Towards Unified Image Segmentation

1 code implementation • NeurIPS 2021 • Wenwei Zhang, Jiangmiao Pang, Kai Chen, Chen Change Loy

The framework, named K-Net, segments both instances and semantic categories consistently by a group of learnable kernels, where each kernel is responsible for generating a mask for either a potential instance or a stuff class.

Ranked #7 on Panoptic Segmentation on COCO test-dev

Image Segmentation Instance Segmentation +2

457

Paper
Code

SODA10M: A Large-Scale 2D Self/Semi-Supervised Object Detection Dataset for Autonomous Driving

no code implementations • 21 Jun 2021 • Jianhua Han, Xiwen Liang, Hang Xu, Kai Chen, Lanqing Hong, Jiageng Mao, Chaoqiang Ye, Wei zhang, Zhenguo Li, Xiaodan Liang, Chunjing Xu

Experiments show that SODA10M can serve as a promising pre-training dataset for different self-supervised learning methods, which gives superior performance when fine-tuning with different downstream tasks (i.e., detection, semantic/instance segmentation) in autonomous driving domain.

Autonomous Driving Instance Segmentation +5

Paper
Add Code

Learning To Identify Correct 2D-2D Line Correspondences on Sphere

no code implementations • CVPR 2021 • Haoang Li, Kai Chen, Ji Zhao, Jiangliu Wang, Pyojin Kim, Zhe Liu, Yun-hui Liu

In contrast, we propose the first approach suitable for both structured and unstructured scenes.

Paper
Add Code

WSSOD: A New Pipeline for Weakly- and Semi-Supervised Object Detection

no code implementations • 21 May 2021 • Shijie Fang, Yuhang Cao, Xinjiang Wang, Kai Chen, Dahua Lin, Wayne Zhang

The performance of object detection, to a great extent, depends on the availability of large annotated datasets.

object-detection Object Detection +2

Paper
Add Code

DeepObliviate: A Powerful Charm for Erasing Data Residual Memory in Deep Neural Networks

no code implementations • 13 May 2021 • Yingzhe He, Guozhu Meng, Kai Chen, Jinwen He, Xingbo Hu

Compared to the method of retraining from scratch, our approach can achieve 99.0%, 95.0%, 91.9%, 96.7%, 74.1% accuracy rates and 66.7$\times$, 75.0$\times$, 33.3$\times$, 29.4$\times$, 13.7$\times$ speedups on the MNIST, SVHN, CIFAR-10, Purchase, and ImageNet datasets, respectively.

Machine Unlearning

Paper
Add Code

Revisiting Skeleton-based Action Recognition

4 code implementations • CVPR 2022 • Haodong Duan, Yue Zhao, Kai Chen, Dahua Lin, Bo Dai

In this work, we propose PoseC3D, a new approach to skeleton-based action recognition, which relies on a 3D heatmap stack instead of a graph sequence as the base representation of human skeletons.

Ranked #1 on Action Recognition on NTU RGB+D

Group Activity Recognition Pose Estimation +1

3,887

Paper
Code

HufuNet: Embedding the Left Piece as Watermark and Keeping the Right Piece for Ownership Verification in Deep Neural Networks

no code implementations • 25 Mar 2021 • Peizhuo Lv, Pan Li, Shengzhi Zhang, Kai Chen, Ruigang Liang, Yue Zhao, Yingjiu Li

Most existing solutions embed backdoors in DNN model training such that DNN ownership can be verified by triggering distinguishable model behaviors with a set of secret inputs.

Paper
Add Code

SoK: A Modularized Approach to Study the Security of Automatic Speech Recognition Systems

1 code implementation • 19 Mar 2021 • Yuxuan Chen, Jiangshan Zhang, Xuejing Yuan, Shengzhi Zhang, Kai Chen, XiaoFeng Wang, Shanqing Guo

In this paper, we present our systematization of knowledge for ASR security and provide a comprehensive taxonomy for existing work based on a modularized workflow.

Adversarial Attack Automatic Speech Recognition +3

Paper
Code

Recent Advances in Data-Driven Wireless Communication Using Gaussian Processes: A Comprehensive Survey

no code implementations • 18 Mar 2021 • Kai Chen, Qinglei Kong, Yijue Dai, Yue Xu, Feng Yin, Lexi Xu, Shuguang Cui

Empowered by big data and machine learning, next-generation data-driven communication systems will be intelligent with the characteristics of expressiveness, scalability, interpretability, and especially uncertainty modeling, which can confidently involve diversified latent demands and personalized services in the foreseeable future.

BIG-bench Machine Learning Gaussian Processes

Paper
Add Code

ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction

1 code implementation • 18 Mar 2021 • Zheng Huang, Kai Chen, Jianhua He, Xiang Bai, Dimosthenis Karatzas, Shijian Lu, C. V. Jawahar

In this competition, we set up three tasks, namely, Scanned Receipt Text Localisation (Task 1), Scanned Receipt OCR (Task 2) and Key Information Extraction from Scanned Receipts (Task 3).

Key Information Extraction Optical Character Recognition (OCR) +1

38,418

Paper
Code

NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series Forecasting

1 code implementation • 10 Feb 2021 • Kai Chen, Guang Chen, Dan Xu, Lijun Zhang, Yuyao Huang, Alois Knoll

Although Transformer has made breakthrough success in widespread domains especially in Natural Language Processing (NLP), applying it to time series forecasting is still a great challenge.

Time Series Time Series Forecasting

Paper
Code

Fast and Reliable Probabilistic Face Embeddings in the Wild

1 code implementation • 8 Feb 2021 • Kai Chen, Qi Lv, Taihe Yi

In addition, an identification preserving loss is proposed to improve the discriminative of the MLS metric, and a multi-layer feature fusion module is proposed to improve the neural network's uncertainty estimation ability.

Face Recognition

Paper
Code

Learning Icosahedral Spherical Probability Map Based on Bingham Mixture Model for Vanishing Point Estimation

no code implementations • ICCV 2021 • Haoang Li, Kai Chen, Pyojin Kim, Kuk-Jin Yoon, Zhe Liu, Kyungdon Joo, Yun-hui Liu

Based on this map, we can detect all the VPs.

Paper
Add Code

SGPA: Structure-Guided Prior Adaptation for Category-Level 6D Object Pose Estimation

no code implementations • ICCV 2021 • Kai Chen, Qi Dou

The prior adaptation intrinsically associates the adopted prior with different objects, from which we can accurately reconstruct the 3D canonical model of the specific object for pose estimation.

6D Pose Estimation using RGB Object

Paper
Add Code

UMLE: Unsupervised Multi-discriminator Network for Low Light Enhancement

no code implementations • 24 Dec 2020 • Yangyang Qu, Kai Chen, Chao Liu, Yongsheng Ou

To address this problem, we propose a real-time unsupervised generative adversarial network (GAN) containing multiple discriminators, i.e., a multi-scale discriminator, a texture discriminator, and a color discriminator.

Generative Adversarial Network Low-Light Image Enhancement

Paper
Add Code

A Hierarchical Reasoning Graph Neural Network for The Automatic Scoring of Answer Transcriptions in Video Job Interviews

no code implementations • 22 Dec 2020 • Kai Chen, Meng Niu, Qingcai Chen

In this work, we propose a Hierarchical Reasoning Graph Neural Network (HRGNN) for the automatic assessment of question-answer pairs.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Add Code

Positional Encoding as Spatial Inductive Bias in GANs

no code implementations • CVPR 2021 • Rui Xu, Xintao Wang, Kai Chen, Bolei Zhou, Chen Change Loy

In this work, taking SinGAN and StyleGAN2 as examples, we show that such capability, to a large extent, is brought by the implicit positional encoding when using zero padding in the generators.

Image Manipulation Inductive Bias +1

Paper
Add Code

CARAFE++: Unified Content-Aware ReAssembly of FEatures

no code implementations • 7 Dec 2020 • Jiaqi Wang, Kai Chen, Rui Xu, Ziwei Liu, Chen Change Loy, Dahua Lin

Feature reassembly, i.e., feature downsampling and upsampling, is a key operation in a number of modern convolutional network architectures, e.g., residual networks and feature pyramids.

Image Inpainting Instance Segmentation +3

Paper
Add Code

RRCN: A Reinforced Random Convolutional Network based Reciprocal Recommendation Approach for Online Dating

no code implementations • 25 Nov 2020 • Linhao Luo, Liqi Yang, Ju Xin, Yixiang Fang, Xiaofeng Zhang, Xiaofei Yang, Kai Chen, Zhiyuan Zhang, Kai Liu

In particular, we technically propose a novel random CNN component that can randomly convolute non-adjacent features to capture their interaction information and learn feature embeddings of key attributes to make the final recommendation.

Paper
Add Code

Neural-iLQR: A Learning-Aided Shooting Method for Trajectory Optimization

no code implementations • 21 Nov 2020 • Zilong Cheng, Yulin Li, Kai Chen, Jun Ma, Tong Heng Lee

Iterative linear quadratic regulator (iLQR) has gained wide popularity in addressing trajectory optimization problems with nonlinear system models.

Paper
Add Code

FedEval: A Holistic Evaluation Framework for Federated Learning

1 code implementation • 19 Nov 2020 • Di Chai, Leye Wang, Liu Yang, Junxue Zhang, Kai Chen, Qiang Yang

In this paper, we propose a holistic evaluation framework for FL called FedEval, and present a benchmarking study on seven state-of-the-art FL algorithms.

Benchmarking Federated Learning +1

Paper
Code

Gaussian Processes with Skewed Laplace Spectral Mixture Kernels for Long-term Forecasting

no code implementations • 8 Nov 2020 • Kai Chen, Twan van Laarhoven, Elena Marchiori

The heavy tail and skewness characteristics of such distributions in the spectral domain allow to capture long-range covariance of the signal in the time domain.

Gaussian Processes Time Series Analysis

Paper
Add Code

Possible multi-orbital ground state in CeCu$_2$Si$_2$

no code implementations • 5 Oct 2020 • Andrea Amorese, Andrea Marino, Martin Sundermann, Kai Chen, Zhiwei Hu, Thomas Willers, Fadi Choukani, Philippe Ohresser, Javier Herrero-Martin, Stefano Agrestini, Chien-Te Chen, Hong-Ji Lin, Maurits W. Haverkort, Silvia Seiro, Christoph Geibel, Frank Steglich, Liu Hao Tjeng, Gertrud Zwicknagl, Andrea Severing

The crystal-field ground state wave function of CeCu$_2$Si$_2$ has been investigated with linear polarized $M$-edge x-ray absorption spectroscopy from 250 mK to 250 K, thus covering the superconducting ($T_{\text{c}} = 0.6$ K), the Kondo ($T_{\text{K}} \approx 20$ K) as well as the Curie-Weiss regime.

Strongly Correlated Electrons

Paper
Add Code

Exploring the Generalizability of Spatio-Temporal Traffic Prediction: Meta-Modeling and an Analytic Framework

1 code implementation • 20 Sep 2020 • Leye Wang, Di Chai, Xuanzhe Liu, Liyue Chen, Kai Chen

The Spatio-Temporal Traffic Prediction (STTP) problem is a classical problem with plenty of prior research efforts that benefit from traditional statistical learning and recent deep learning approaches.

Traffic Prediction

156

Paper
Code

Seesaw Loss for Long-Tailed Instance Segmentation

2 code implementations • CVPR 2021 • Jiaqi Wang, Wenwei Zhang, Yuhang Zang, Yuhang Cao, Jiangmiao Pang, Tao Gong, Kai Chen, Ziwei Liu, Chen Change Loy, Dahua Lin

Instances of head classes dominate a long-tailed dataset and they serve as negative samples of tail categories.

Instance Segmentation Semantic Segmentation

27,765

Paper
Code

Domain-specific Communication Optimization for Distributed DNN Training

no code implementations • 16 Aug 2020 • Hao Wang, Jingrong Chen, Xinchen Wan, Han Tian, Jiacheng Xia, Gaoxiong Zeng, Weiyan Wang, Kai Chen, Wei Bai, Junchen Jiang

Communication overhead poses an important obstacle to distributed DNN training and draws increasing attention in recent years.

Scheduling

Paper
Add Code

FPGA-Based Hardware Accelerator of Homomorphic Encryption for Efficient Federated Learning

no code implementations • 21 Jul 2020 • Zhaoxiong Yang, Shuihai Hu, Kai Chen

Our framework implements the representative Paillier homomorphic cryptosystem with high level synthesis for flexibility and portability, with careful optimization on the modular multiplication operation in terms of processing clock cycle, resource usage and clock frequency.

Federated Learning Privacy Preserving

Paper
Add Code

DS-Sync: Addressing Network Bottlenecks with Divide-and-Shuffle Synchronization for Distributed DNN Training

no code implementations • 7 Jul 2020 • Weiyan Wang, Cengguang Zhang, Liu Yang, Kai Chen, Kun Tan

However, due to the global synchronization nature, its performance can be significantly influenced by network bottlenecks caused by either static topology heterogeneity or dynamic bandwidth contentions.

BIG-bench Machine Learning

Paper
Add Code

Target Speech Extraction Based on Blind Source Separation and X-vector-based Speaker Selection Trained with Data Augmentation

1 code implementation • 16 May 2020 • Zhaoyi Gu, Lele Liao, Kai Chen, Jing Lu

Extracting the desired speech from a mixture is a meaningful and challenging task.

blind source separation Data Augmentation +2

Paper
Code

Nonlinear Residual Echo Suppression Based on Multi-stream Conv-TasNet

1 code implementation • 15 May 2020 • Hongsheng Chen, Teng Xiang, Kai Chen, Jing Lu

Acoustic echo cannot be entirely removed by linear adaptive filters due to the nonlinear relationship between the echo and far-end signal.

Acoustic echo cancellation

Paper
Code

Neural Data-to-Text Generation with Dynamic Content Planning

no code implementations • 16 Apr 2020 • Kai Chen, Fayuan Li, Baotian Hu, Weihua Peng, Qingcai Chen, Hong Yu

We further design a reconstruction mechanism with a novel objective function that can reconstruct the whole entry of the used data sequentially from the hidden states of the decoder, which aids the accuracy of the generated text.

Data-to-Text Generation

Paper
Add Code

Feature Pyramid Grids

1 code implementation • 7 Apr 2020 • Kai Chen, Yuhang Cao, Chen Change Loy, Dahua Lin, Christoph Feichtenhofer

Feature pyramid networks have been widely adopted in the object detection literature to improve feature representations for better handling of variations in scale.

Neural Architecture Search object-detection +2

27,759

Paper
Code

Vanishing Point Guided Natural Image Stitching

no code implementations • 6 Apr 2020 • Kai Chen, Jian Yao, Jingmin Tu, Yahui Liu, Yinxuan Li, Li Li

Recently, works on improving the naturalness of stitching images gain more and more extensive attention.

Image Stitching

Paper
Add Code

Quantifying the Performance of Federated Transfer Learning

no code implementations • 30 Dec 2019 • Qinghe Jing, Weiyan Wang, Junxue Zhang, Han Tian, Kai Chen

The scarcity of data and isolated data islands encourage different organizations to share data with each other to train machine learning models.

Transfer Learning

Paper
Add Code

Unified Approach to Witness Nonentanglement-Breaking Quantum Channels

no code implementations • 23 Dec 2019 • Yi-Zheng Zhen, Yingqiu Mao, Kai Chen, Francesco Buscemi, Oscar Dahlsten

The ability of quantum devices to preserve or distribute entanglement is essential in employing quantum technologies.

Quantum Physics

Paper
Add Code

Side-Aware Boundary Localization for More Precise Object Detection

3 code implementations • ECCV 2020 • Jiaqi Wang, Wenwei Zhang, Yuhang Cao, Kai Chen, Jiangmiao Pang, Tao Gong, Jianping Shi, Chen Change Loy, Dahua Lin

To tackle the difficulty of precise localization in the presence of displacements with large variance, we further propose a two-step localization scheme, which first predicts a range of movement through bucket prediction and then pinpoints the precise position within the predicted bucket.

Object object-detection +2

27,765

Paper
Code

Towards Security Threats of Deep Learning Systems: A Survey

no code implementations • 28 Nov 2019 • Yingzhe He, Guozhu Meng, Kai Chen, Xingbo Hu, Jinwen He

In order to unveil the security weaknesses and aid in the development of a robust deep learning system, we undertake an investigation on attacks towards deep learning, and analyze these attacks to conclude some findings in multiple views.

Adversarial Attack Model extraction

Paper
Add Code

An End-to-End Audio Classification System based on Raw Waveforms and Mix-Training Strategy

no code implementations • 21 Nov 2019 • Jiaxu Chen, Jing Hao, Kai Chen, Di Xie, Shicai Yang, ShiLiang Pu

This paper introduces an end-to-end audio classification system based on raw waveforms and mix-training strategy.

Audio Classification General Classification +1

Paper
Add Code

Gliding vertex on the horizontal bounding box for multi-oriented object detection

1 code implementation • 21 Nov 2019 • Yongchao Xu, Mingtao Fu, Qimeng Wang, Yukang Wang, Kai Chen, Gui-Song Xia, Xiang Bai

Yet, the widely adopted horizontal bounding box representation is not appropriate for ubiquitous oriented objects such as objects in aerial images and scene texts.

Ranked #42 on Object Detection In Aerial Images on DOTA (using extra training data)

Object object-detection +5

1,724

Paper
Code

Real-time Scene Text Detection with Differentiable Binarization

15 code implementations • 20 Nov 2019 • Minghui Liao, Zhaoyi Wan, Cong Yao, Kai Chen, Xiang Bai

Recently, segmentation-based methods are quite popular in scene text detection, as the segmentation results can more accurately describe scene text of various shapes such as curve text.

Ranked #6 on Scene Text Detection on MSRA-TD500

Binarization Optical Character Recognition (OCR) +3

38,418

Paper
Code

Interpretable Encrypted Searchable Neural Networks

no code implementations • 14 Aug 2019 • Kai Chen, Zhongrui Lin, Jian Wan, Chungen Xu

In cloud security, traditional searchable encryption (SE) requires high computation and communication overhead for dynamic search and update.

Paper
Add Code

Multi-owner Secure Encrypted Search Using Searching Adversarial Networks

no code implementations • 7 Aug 2019 • Kai Chen, Zhongrui Lin, Jian Wan, Lei Xu, Chungen Xu

To address this, this paper proposes secure and efficient multi-keyword ranked search over encrypted cloud data for multi-owner model based on searching adversarial networks.

Distributed Computing

Paper
Add Code

MMDetection: Open MMLab Detection Toolbox and Benchmark

144 code implementations • 17 Jun 2019 • Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, Zheng Zhang, Dazhi Cheng, Chenchen Zhu, Tianheng Cheng, Qijie Zhao, Buyu Li, Xin Lu, Rui Zhu, Yue Wu, Jifeng Dai, Jingdong Wang, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin

In this paper, we introduce the various features of this toolbox.

Benchmarking Instance Segmentation +2

27,765

Paper
Code

Secure Federated Matrix Factorization

no code implementations • 12 Jun 2019 • Di Chai, Leye Wang, Kai Chen, Qiang Yang

The key principle of federated learning is training a machine learning model without needing to know each user's personal raw private data.

BIG-bench Machine Learning Federated Learning

Paper
Add Code

Extracting Symptoms and their Status from Clinical Conversations

no code implementations • ACL 2019 • Nan Du, Kai Chen, Anjuli Kannan, Linh Tran, Yu-Hui Chen, Izhak Shafran

This paper describes novel models tailored for a new application, that of extracting the symptoms mentioned in clinical conversations along with their status.

Attribute

Paper
Add Code

CARAFE: Content-Aware ReAssembly of FEatures

3 code implementations • ICCV 2019 • Jiaqi Wang, Kai Chen, Rui Xu, Ziwei Liu, Chen Change Loy, Dahua Lin

CARAFE introduces little computational overhead and can be readily integrated into modern network architectures.

Ranked #3 on Feature Upsampling on ImageNet

Feature Upsampling Instance Segmentation +3

27,765

Paper
Code

Prime Sample Attention in Object Detection

1 code implementation • CVPR 2020 • Yuhang Cao, Kai Chen, Chen Change Loy, Dahua Lin

Our experiments demonstrate that it is often more effective to focus on prime samples than hard samples when training a detector.

Object object-detection +1

27,765

Paper
Code

Libra R-CNN: Towards Balanced Learning for Object Detection

6 code implementations • CVPR 2019 • Jiangmiao Pang, Kai Chen, Jianping Shi, Huajun Feng, Wanli Ouyang, Dahua Lin

Ranked #149 on Object Detection on COCO test-dev

object-detection Object Detection

27,765

Paper
Code

Hybrid Task Cascade for Instance Segmentation

5 code implementations • CVPR 2019 • Kai Chen, Jiangmiao Pang, Jiaqi Wang, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin

In exploring a more effective approach, we find that the key to a successful instance segmentation cascade is to fully leverage the reciprocal relationship between detection and segmentation.

Ranked #32 on Object Detection on COCO-O

Instance Segmentation object-detection +4

27,765

Paper
Code
