Search Results for author: Kaiyang Zhou

Found 35 papers, 29 papers with code

Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models

1 code implementation • 26 Mar 2024 • Yabin Zhang, Wenjie Zhu, Hui Tang, Zhiyuan Ma, Kaiyang Zhou, Lei Zhang

In this paper, we introduce a versatile adaptation approach that can effectively work under all three settings.

Open-Vocabulary Calibration for Vision-Language Models

no code implementations • 7 Feb 2024 • Shuoyuan Wang, Jindong Wang, Guoqing Wang, Bob Zhang, Kaiyang Zhou, Hongxin Wei

Vision-language models (VLMs) have emerged as formidable tools, showing their strong capability in handling various open-vocabulary tasks in image recognition, text-driven visual content generation, and visual chatbots, to name a few.

Panoptic Video Scene Graph Generation

3 code implementations • CVPR 2023 • Jingkang Yang, Wenxuan Peng, Xiangtai Li, Zujin Guo, Liangyu Chen, Bo Li, Zheng Ma, Kaiyang Zhou, Wayne Zhang, Chen Change Loy, Ziwei Liu

PVSG relates to the existing video scene graph generation (VidSGG) problem, which focuses on temporal interactions between humans and objects grounded with bounding boxes in videos.

Graph Generation Panoptic Scene Graph Generation +5

Octopus: Embodied Vision-Language Programmer from Environmental Feedback

1 code implementation • 12 Oct 2023 • Jingkang Yang, Yuhao Dong, Shuai Liu, Bo Li, Ziyue Wang, Chencheng Jiang, Haoran Tan, Jiamu Kang, Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu

Large vision-language models (VLMs) have achieved substantial progress in multimodal perception and reasoning.

Decision Making

OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection

1 code implementation • 15 Jun 2023 • Jingyang Zhang, Jingkang Yang, Pengyun Wang, Haoqi Wang, Yueqian Lin, Haoran Zhang, Yiyou Sun, Xuefeng Du, Kaiyang Zhou, Wayne Zhang, Yixuan Li, Ziwei Liu, Yiran Chen, Hai Li

Out-of-Distribution (OOD) detection is critical for the reliable operation of open-world intelligent systems.

Out-of-Distribution Detection Out of Distribution (OOD) Detection

Contextual Object Detection with Multimodal Large Language Models

1 code implementation • 29 May 2023 • Yuhang Zang, Wei Li, Jun Han, Kaiyang Zhou, Chen Change Loy

Moreover, we present ContextDET, a unified multimodal model that is capable of end-to-end differentiable modeling of visual-language contexts, so as to locate, identify, and associate visual objects with language inputs for human-AI interaction.

Cloze Test Image Captioning +6

Semi-Supervised and Long-Tailed Object Detection with CascadeMatch

no code implementations • 24 May 2023 • Yuhang Zang, Kaiyang Zhou, Chen Huang, Chen Change Loy

This paper focuses on long-tailed object detection in the semi-supervised learning setting, which poses realistic challenges, but has rarely been studied in the literature.

Long-tailed Object Detection Object +3

What Makes Good Examples for Visual In-Context Learning?

1 code implementation • NeurIPS 2023 • Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu

To overcome the problem, we propose a prompt retrieval framework to automate the selection of in-context examples.

In-Context Learning Retrieval

Learning to Augment via Implicit Differentiation for Domain Generalization

no code implementations • 25 Oct 2022 • Tingwei Wang, Da Li, Kaiyang Zhou, Tao Xiang, Yi-Zhe Song

Machine learning models are intrinsically vulnerable to domain shift between training and testing data, resulting in poor performance in novel domains.

Data Augmentation Domain Generalization +1

OpenOOD: Benchmarking Generalized Out-of-Distribution Detection

3 code implementations • 13 Oct 2022 • Jingkang Yang, Pengyun Wang, Dejian Zou, Zitang Zhou, Kunyuan Ding, Wenxuan Peng, Haoqi Wang, Guangyao Chen, Bo Li, Yiyou Sun, Xuefeng Du, Kaiyang Zhou, Wayne Zhang, Dan Hendrycks, Yixuan Li, Ziwei Liu

Out-of-distribution (OOD) detection is vital to safety-critical machine learning applications and has thus been extensively studied, with a plethora of methods developed in the literature.

Anomaly Detection Benchmarking +3

Unified Vision and Language Prompt Learning

1 code implementation • 13 Oct 2022 • Yuhang Zang, Wei Li, Kaiyang Zhou, Chen Huang, Chen Change Loy

Prompt tuning, a parameter- and data-efficient transfer learning paradigm that tunes only a small number of parameters in a model's input space, has become a trend in the vision community since the emergence of large vision-language models like CLIP.

Domain Generalization Few-Shot Learning +2

On-Device Domain Generalization

2 code implementations • 15 Sep 2022 • Kaiyang Zhou, Yuanhan Zhang, Yuhang Zang, Jingkang Yang, Chen Change Loy, Ziwei Liu

Another interesting observation is that the teacher-student gap on out-of-distribution data is bigger than that on in-distribution data, which highlights the capacity mismatch issue as well as the shortcoming of KD.

Data Augmentation Domain Generalization +2

Panoptic Scene Graph Generation

1 code implementation • 22 Jul 2022 • Jingkang Yang, Yi Zhe Ang, Zujin Guo, Kaiyang Zhou, Wayne Zhang, Ziwei Liu

Existing research addresses scene graph generation (SGG) -- a critical technology for scene understanding in images -- from a detection perspective, i.e., objects are detected using bounding boxes followed by prediction of their pairwise relationships.

Ranked #5 on Panoptic Scene Graph Generation on PSG Dataset

Benchmarking Panoptic Scene Graph Generation +1

Detecting Humans in RGB-D Data with CNNs

1 code implementation • 17 Jul 2022 • Kaiyang Zhou, Adeline Paiement, Majid Mirmehdi

We address the problem of people detection in RGB-D data where we leverage depth information to develop a region-of-interest (ROI) selection method that provides proposals to two color and depth CNNs.

Neural Prompt Search

1 code implementation • 9 Jun 2022 • Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu

The size of vision models has grown exponentially over the last few years, especially after the emergence of Vision Transformer.

Ranked #1 on Image Classification on OmniBenchmark (using extra training data)

Few-Shot Learning Image Classification +3

Full-Spectrum Out-of-Distribution Detection

1 code implementation • 11 Apr 2022 • Jingkang Yang, Kaiyang Zhou, Ziwei Liu

In this paper, we take into account both shift types and introduce full-spectrum OOD (FS-OOD) detection, a more realistic problem setting that considers both detecting semantic shift and being tolerant to covariate shift, and we design three benchmarks.

Out-of-Distribution Detection Out of Distribution (OOD) Detection

Open-Vocabulary DETR with Conditional Matching

1 code implementation • 22 Mar 2022 • Yuhang Zang, Wei Li, Kaiyang Zhou, Chen Huang, Chen Change Loy

To this end, we propose a novel open-vocabulary detector based on DETR -- hence the name OV-DETR -- which, once trained, can detect any object given its class name or an exemplar image.

Ranked #21 on Open Vocabulary Object Detection on MSCOCO

Language Modelling object-detection +1

Conditional Prompt Learning for Vision-Language Models

9 code implementations • CVPR 2022 • Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu

With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential to investigate ways to adapt these models to downstream datasets.

Ranked #3 on Prompt Engineering on ImageNet V2

Domain Generalization Prompt Engineering

Dynamic Instance Domain Adaptation

1 code implementation • 9 Mar 2022 • Zhongying Deng, Kaiyang Zhou, Da Li, Junjun He, Yi-Zhe Song, Tao Xiang

In this paper, we address both single-source and multi-source UDA from a completely different perspective, which is to view each instance as a fine domain.

Unsupervised Domain Adaptation

Domain Attention Consistency for Multi-Source Domain Adaptation

1 code implementation • 6 Nov 2021 • Zhongying Deng, Kaiyang Zhou, Yongxin Yang, Tao Xiang

Importantly, the attention module is supervised by a consistency loss, which is imposed on the distributions of channel attention weights between source and target domains.

Attribute Domain Adaptation

Generalized Out-of-Distribution Detection: A Survey

3 code implementations • 21 Oct 2021 • Jingkang Yang, Kaiyang Zhou, Yixuan Li, Ziwei Liu

In this survey, we first present a unified framework called generalized OOD detection, which encompasses the five aforementioned problems, i.e., AD, ND, OSR, OOD detection, and OD.

Anomaly Detection Autonomous Driving +5

Learning to Prompt for Vision-Language Models

13 code implementations • 2 Sep 2021 • Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu

Large pre-trained vision-language models like CLIP have shown great potential in learning representations that are transferable across a wide range of downstream tasks.

Ranked #2 on Few-shot Age Estimation on MORPH Album2

Domain Generalization Few-shot Age Estimation +2

Energy-Based Open-World Uncertainty Modeling for Confidence Calibration

no code implementations • ICCV 2021 • Yezhen Wang, Bo Li, Tong Che, Kaiyang Zhou, Ziwei Liu, Dongsheng Li

Confidence calibration is of great importance to the reliability of decisions made by machine learning systems.

MixStyle Neural Networks for Domain Generalization and Adaptation

2 code implementations • 5 Jul 2021 • Kaiyang Zhou, Yongxin Yang, Yu Qiao, Tao Xiang

MixStyle is easy to implement with a few lines of code, does not require modification to training objectives, and can fit a variety of learning paradigms including supervised domain generalization, semi-supervised domain generalization, and unsupervised domain adaptation.

Data Augmentation Domain Generalization +6

Semi-Supervised Domain Generalization with Stochastic StyleMatch

2 code implementations • 1 Jun 2021 • Kaiyang Zhou, Chen Change Loy, Ziwei Liu

We find that the DG methods, which by design are unable to handle unlabeled data, perform poorly with limited labels in SSDG; the SSL methods, especially FixMatch, obtain much better results but are still far away from the basic vanilla model trained using full labels.

Domain Generalization Semi-Supervised Domain Generalization

Domain Generalization with MixStyle

3 code implementations • ICLR 2021 • Kaiyang Zhou, Yongxin Yang, Yu Qiao, Tao Xiang

Our method, termed MixStyle, is motivated by the observation that visual domain is closely related to image style (e.g., photo vs. sketch images).

Ranked #57 on Domain Generalization on PACS

Domain Generalization Retrieval

Domain Generalization: A Survey

2 code implementations • 3 Mar 2021 • Kaiyang Zhou, Ziwei Liu, Yu Qiao, Tao Xiang, Chen Change Loy

Generalization to out-of-distribution (OOD) data is a capability natural to humans yet challenging for machines to reproduce.

Action Recognition Data Augmentation +8

Learning to Generate Novel Domains for Domain Generalization

1 code implementation • ECCV 2020 • Kaiyang Zhou, Yongxin Yang, Timothy Hospedales, Tao Xiang

This explicitly increases the diversity of available training domains and leads to a more generalizable model.

Ranked #65 on Domain Generalization on PACS

Domain Generalization

Domain Adaptive Ensemble Learning

1 code implementation • 16 Mar 2020 • Kaiyang Zhou, Yongxin Yang, Yu Qiao, Tao Xiang

Each such classifier is an expert to its own domain and a non-expert to others.

Domain Generalization Ensemble Learning +3

Deep Domain-Adversarial Image Generation for Domain Generalisation

no code implementations • 12 Mar 2020 • Kaiyang Zhou, Yongxin Yang, Timothy Hospedales, Tao Xiang

This is achieved by having a learning objective formulated to ensure that the generated data can be correctly classified by the label classifier while fooling the domain classifier.

Ranked #63 on Domain Generalization on PACS

Domain Generalization Image Generation

Torchreid: A Library for Deep Learning Person Re-Identification in Pytorch

8 code implementations • 22 Oct 2019 • Kaiyang Zhou, Tao Xiang

Person re-identification (re-ID), which aims to re-identify people across different camera views, has been significantly advanced by deep learning in recent years, particularly with convolutional neural networks (CNNs).

Benchmarking Person Re-Identification

Learning Generalisable Omni-Scale Representations for Person Re-Identification

8 code implementations • 15 Oct 2019 • Kaiyang Zhou, Yongxin Yang, Andrea Cavallaro, Tao Xiang

An effective person re-identification (re-ID) model should learn feature representations that are both discriminative, for distinguishing similar-looking people, and generalisable, for deployment across datasets without any adaptation.

Ranked #1 on Unsupervised Person Re-Identification on MSMT17->Market-1501

Unsupervised Domain Adaptation Unsupervised Person Re-Identification

Omni-Scale Feature Learning for Person Re-Identification

16 code implementations • ICCV 2019 • Kaiyang Zhou, Yongxin Yang, Andrea Cavallaro, Tao Xiang

As an instance-level recognition problem, person re-identification (ReID) relies on discriminative features, which not only capture different spatial scales but also encapsulate an arbitrary combination of multiple scales.

Ranked #2 on Person Re-Identification on MSMT17-C

Person Re-Identification

Video Summarisation by Classification with Deep Reinforcement Learning

no code implementations • 9 Jul 2018 • Kaiyang Zhou, Tao Xiang, Andrea Cavallaro

Most existing video summarisation methods are based on either supervised or unsupervised learning.

Classification Decision Making +4

Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward

6 code implementations • 29 Dec 2017 • Kaiyang Zhou, Yu Qiao, Tao Xiang

Video summarization aims to facilitate large-scale video browsing by producing short, concise summaries that are diverse and representative of original videos.

Ranked #7 on Unsupervised Video Summarization on TvSum

Decision Making reinforcement-learning +3

