Search Results for author: Sheng Jin

Found 36 papers, 19 papers with code

Weakly Supervised Monocular 3D Detection with a Single-View Image

no code implementations • 29 Feb 2024 • Xueying Jiang, Sheng Jin, Lewei Lu, Xiaoqin Zhang, Shijian Lu

We propose SKD-WM3D, a weakly supervised monocular 3D detection framework that exploits depth information to achieve M3D with a single-view image exclusively without any 3D annotations or other training data.

Object Localization Self-Knowledge Distillation +1

AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks

no code implementations • 23 Feb 2024 • Zekang Yang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu

Automated machine learning (AutoML) is a collection of techniques designed to automate the machine learning development process.

Hyperparameter Optimization Keypoint Estimation

LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors

no code implementations • 7 Feb 2024 • Sheng Jin, Xueying Jiang, Jiaxing Huang, Lewei Lu, Shijian Lu

This paper presents DVDet, a Descriptor-Enhanced Open Vocabulary Detector that introduces conditional context prompts and hierarchical textual descriptors that enable precise region-text alignment as well as open-vocabulary detection training in general.

Image Classification object-detection +1

CLIM: Contrastive Language-Image Mosaic for Region Representation

1 code implementation • 18 Dec 2023 • Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Wentao Liu, Chen Change Loy

Our experimental results demonstrate that CLIM improves different baseline open-vocabulary object detectors by a large margin on both OV-COCO and OV-LVIS benchmarks.

Ranked #6 on Open Vocabulary Object Detection on LVIS v1.0

Object object-detection +1

MCFNet: Multi-scale Covariance Feature Fusion Network for Real-time Semantic Segmentation

no code implementations • 12 Dec 2023 • Xiaojie Fang, Xingguo Song, Xiangyin Meng, Xu Fang, Sheng Jin

The low-level spatial detail information and high-level semantic abstract information are both essential to the semantic segmentation task.

Real-Time Semantic Segmentation Segmentation

You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception

no code implementations • 9 Dec 2023 • Sheng Jin, Shuhuai Li, Tong Li, Wentao Liu, Chen Qian, Ping Luo

Human-centric perception (e.g., pedestrian detection, segmentation, pose estimation, and attribute analysis) is a long-standing problem for computer vision.

Attribute Multi-Task Learning +1

CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction

1 code implementation • 2 Oct 2023 • Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Xiangtai Li, Wentao Liu, Chen Change Loy

However, when transferring the vision-language alignment of CLIP from global image representation to local region representation for the open-vocabulary dense prediction tasks, CLIP ViTs suffer from the domain shift from full images to local image regions.

Ranked #3 on Open Vocabulary Semantic Segmentation on PASCAL Context-59

Image Classification Image Segmentation +7

133

Domain Generalization via Balancing Training Difficulty and Model Capability

no code implementations • ICCV 2023 • Xueying Jiang, Jiaxing Huang, Sheng Jin, Shijian Lu

Despite its recent progress, most existing work suffers from the misalignment between the difficulty level of training samples and the capability of contemporarily trained models, leading to over-fitting or under-fitting in the trained generalization model.

Data Augmentation Domain Generalization

GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition

no code implementations • 28 Aug 2023 • Ruijie Yao, Sheng Jin, Lumin Xu, Wang Zeng, Wentao Liu, Chen Qian, Ping Luo, Ji Wu

Multi-Label Image Recognition (MLIR) is a challenging task that aims to predict multiple object labels in a single image while modeling the complex relationships between labels and image regions.

graph construction

Uncertainty-aware Unsupervised Multi-Object Tracking

1 code implementation • ICCV 2023 • Kai Liu, Sheng Jin, Zhihang Fu, Ze Chen, Rongxin Jiang, Jieping Ye

The resulting accurate pseudo-tracklets boost learning the feature consistency.

Multi-Object Tracking Object

Prompt Ensemble Self-training for Open-Vocabulary Domain Adaptation

no code implementations • 29 Jun 2023 • Jiaxing Huang, Jingyi Zhang, Han Qiu, Sheng Jin, Shijian Lu

Traditional domain adaptation assumes the same vocabulary across source and target domains, which often struggles with limited transfer flexibility and efficiency while handling target domains with different vocabularies.

Unsupervised Domain Adaptation

Vision-Language Models for Vision Tasks: A Survey

1 code implementation • 3 Apr 2023 • Jingyi Zhang, Jiaxing Huang, Sheng Jin, Shijian Lu

Most visual recognition studies rely heavily on crowd-labelled data in deep neural networks (DNNs) training, and they usually train a DNN for each single visual recognition task, leading to a laborious and time-consuming visual recognition paradigm.

Benchmarking Knowledge Distillation +1

1,742

Aligning Bag of Regions for Open-Vocabulary Object Detection

1 code implementation • CVPR 2023 • Size Wu, Wenwei Zhang, Sheng Jin, Wentao Liu, Chen Change Loy

The embeddings of regions in a bag are treated as embeddings of words in a sentence, and they are sent to the text encoder of a VLM to obtain the bag-of-regions embedding, which is learned to be aligned to the corresponding features extracted by a frozen VLM.

Ranked #7 on Open Vocabulary Object Detection on MSCOCO (using extra training data)

Object object-detection +2

160

Reinforcement learning for traffic signal control in hybrid action space

no code implementations • 23 Nov 2022 • Haoqing Luo, Sheng Jin

The prevailing reinforcement-learning-based traffic signal control methods are typically staging-optimizable or duration-optimizable, depending on the action spaces.

Fairness reinforcement-learning +1

ZoomNAS: Searching for Whole-body Human Pose Estimation in the Wild

1 code implementation • 23 Aug 2022 • Lumin Xu, Sheng Jin, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang

We propose a single-network approach, termed ZoomNet, to take into account the hierarchical structure of the full human body and solve the scale variation of different body parts.

Ranked #2 on 2D Human Pose Estimation on COCO-WholeBody

2D Human Pose Estimation Neural Architecture Search +1

708

PoseTrans: A Simple Yet Effective Pose Transformation Augmentation for Human Pose Estimation

1 code implementation • 16 Aug 2022 • Wentao Jiang, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Si Liu

Human pose estimation aims to accurately estimate a wide variety of human poses.

Data Augmentation Pose Estimation

3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal

1 code implementation • 22 Jul 2022 • Hao Meng, Sheng Jin, Wentao Liu, Chen Qian, Mengxiang Lin, Wanli Ouyang, Ping Luo

Unlike most previous works that directly predict the 3D poses of two interacting hands simultaneously, we propose to decompose the challenging interacting hand pose estimation task and estimate the pose of each hand separately.

3D Interacting Hand Pose Estimation Hand Pose Estimation

Pose for Everything: Towards Category-Agnostic Pose Estimation

1 code implementation • 21 Jul 2022 • Lumin Xu, Sheng Jin, Wang Zeng, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang

In this paper, we introduce the task of Category-Agnostic Pose Estimation (CAPE), which aims to create a pose estimation model capable of detecting the pose of any class of object given only a few samples with keypoint definition.

Ranked #4 on 2D Pose Estimation on MP-100

Category-Agnostic Pose Estimation Pose Estimation

182

Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer

1 code implementation • CVPR 2022 • Wang Zeng, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, Xiaogang Wang

Vision transformers have achieved great successes in many computer vision tasks.

Ranked #4 on 2D Human Pose Estimation on COCO-WholeBody

2D Human Pose Estimation 3D Human Pose Estimation +1

181

Pseudo-Labeled Auto-Curriculum Learning for Semi-Supervised Keypoint Localization

no code implementations • ICLR 2022 • Can Wang, Sheng Jin, Yingda Guan, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang

PL approaches apply pseudo-labels to unlabeled data, and then train the model with a combination of the labeled and pseudo-labeled data iteratively.

Temporal Action Proposal Generation with Background Constraint

1 code implementation • 15 Dec 2021 • Haosen Yang, Wenhao Wu, Lining Wang, Sheng Jin, Boyang Xia, Hongxun Yao, Hujie Huang

To evaluate the confidence of proposals, existing works typically predict action scores for proposals, supervised by the temporal Intersection-over-Union (tIoU) between each proposal and the ground truth.

Temporal Action Proposal Generation
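
The tIoU mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each proposal and ground-truth action is a (start, end) pair in seconds; the function name `temporal_iou` is hypothetical and not taken from the paper's code.

```python
def temporal_iou(proposal, ground_truth):
    """Temporal IoU between two (start, end) segments.

    Assumption: both segments are (start, end) tuples with start <= end,
    expressed in the same time unit (e.g. seconds).
    """
    p_start, p_end = proposal
    g_start, g_end = ground_truth
    # Overlap length is clamped at zero for disjoint segments.
    intersection = max(0.0, min(p_end, g_end) - max(p_start, g_start))
    # Union length = sum of both lengths minus the shared part.
    union = (p_end - p_start) + (g_end - g_start) - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # A proposal covering 0-10 s against a ground-truth action at 5-15 s.
    print(temporal_iou((0.0, 10.0), (5.0, 15.0)))
```

A score regressed toward this value is high only when the proposal both overlaps the action and does not extend far beyond it.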

Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images

1 code implementation • ICCV 2021 • Size Wu, Sheng Jin, Wentao Liu, Lei Bai, Chen Qian, Dong Liu, Wanli Ouyang

Following the top-down paradigm, we decompose the task into two stages, i.e., person localization and pose estimation.

Ranked #2 on 3D Multi-Person Pose Estimation on Panoptic (using extra training data)

3D Multi-Person Pose Estimation 3D Pose Estimation +1

ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search

4 code implementations • CVPR 2021 • Lumin Xu, Yingda Guan, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, Xiaogang Wang

Human pose estimation has achieved significant progress in recent years.

Ranked #23 on Pose Estimation on COCO test-dev

Neural Architecture Search Pose Estimation

5,006

When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks

1 code implementation • CVPR 2021 • Jiahang Wang, Sheng Jin, Wentao Liu, Weizhong Liu, Chen Qian, Ping Luo

However, unlike human vision that is robust to various data corruptions such as blur and pixelation, current pose estimators are easily confused by these corruptions.

Knowledge Distillation Pose Estimation

Relative occurrence rates of terrestrial planets orbiting FGK stars

no code implementations • 11 Feb 2021 • Sheng Jin

Then I fit two exponential decay functions of detection efficiency along with the increase of planetary orbital distance and the decrease of planetary radius.

Earth and Planetary Astrophysics Solar and Stellar Astrophysics

When Counterpoint Meets Chinese Folk Melodies

1 code implementation • NeurIPS 2020 • Nan Jiang, Sheng Jin, Zhiyao Duan, ChangShui Zhang

An interaction reward model is trained on the duets formed from outer parts of Bach chorales to model counterpoint interaction, while a style reward model is trained on monophonic melodies of Chinese folk songs to model melodic patterns.

Whole-Body Human Pose Estimation in the Wild

2 code implementations • ECCV 2020 • Sheng Jin, Lumin Xu, Jin Xu, Can Wang, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo

This paper investigates the task of 2D human whole-body pose estimation, which aims to localize dense landmarks on the entire human body including face, hands, body, and feet.

Ranked #8 on 2D Human Pose Estimation on COCO-WholeBody

2D Human Pose Estimation Facial Landmark Detection +2

5,006

Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation

no code implementations • ECCV 2020 • Sheng Jin, Wentao Liu, Enze Xie, Wenhai Wang, Chen Qian, Wanli Ouyang, Ping Luo

The modules of HGG can be trained end-to-end with the keypoint detection network and are able to supervise the grouping process in a hierarchical manner.

Ranked #3 on Keypoint Detection on OCHuman

2D Human Pose Estimation Clustering +4

RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning

no code implementations • 8 Feb 2020 • Nan Jiang, Sheng Jin, Zhiyao Duan, Chang-Shui Zhang

We cast this as a reinforcement learning problem, where the generation agent learns a policy to generate a musical note (action) based on previously generated context (state).

Music Generation reinforcement-learning +1

HoMM: Higher-order Moment Matching for Unsupervised Domain Adaptation

1 code implementation • 27 Dec 2019 • Chao Chen, Zhihang Fu, Zhihong Chen, Sheng Jin, Zhaowei Cheng, Xinyu Jin, Xian-Sheng Hua

In particular, our proposed HoMM can perform arbitrary-order moment tensor matching; we show that the first-order HoMM is equivalent to Maximum Mean Discrepancy (MMD) and the second-order HoMM is equivalent to Correlation Alignment (CORAL).

Unsupervised Domain Adaptation
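
The equivalence stated in the abstract above can be sketched as follows. The notation is an assumption made for illustration and is not taken from this listing: h_s^i and h_t^j denote source and target feature vectors, n_s and n_t the batch sizes, and \otimes p the p-th order tensor power; normalization constants are omitted.

```latex
% Generic p-th order moment matching between source and target features
\mathcal{L}_p = \Bigl\| \frac{1}{n_s}\sum_{i=1}^{n_s} (h_s^i)^{\otimes p}
              - \frac{1}{n_t}\sum_{j=1}^{n_t} (h_t^j)^{\otimes p} \Bigr\|_F^2

% p = 1: squared distance between feature means (MMD with a linear kernel)
\mathcal{L}_1 = \Bigl\| \frac{1}{n_s}\sum_{i=1}^{n_s} h_s^i
              - \frac{1}{n_t}\sum_{j=1}^{n_t} h_t^j \Bigr\|_2^2

% p = 2: squared distance between second-moment matrices
\mathcal{L}_2 = \Bigl\| \frac{1}{n_s}\sum_{i=1}^{n_s} h_s^i (h_s^i)^{\top}
              - \frac{1}{n_t}\sum_{j=1}^{n_t} h_t^j (h_t^j)^{\top} \Bigr\|_F^2
```

For zero-mean features the second-moment matrices are the covariance matrices (up to the choice of n versus n - 1 in the denominator), so the p = 2 case matches the CORAL objective up to a constant factor.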

SSAH: Semi-supervised Adversarial Deep Hashing with Self-paced Hard Sample Generation

no code implementations • 20 Nov 2019 • Sheng Jin, Shangchen Zhou, Yao Liu, Chao Chen, Xiaoshuai Sun, Hongxun Yao, Xian-Sheng Hua

In this paper, we propose a novel Semi-supervised Self-pace Adversarial Hashing method, named SSAH, to solve the above problems in a unified framework.

Deep Hashing Generative Adversarial Network

TRB: A Novel Triplet Representation for Understanding 2D Human Body

2 code implementations • ICCV 2019 • Haodong Duan, Kwan-Yee Lin, Sheng Jin, Wentao Liu, Chen Qian, Wanli Ouyang

In this paper, we propose the Triplet Representation for Body (TRB) -- a compact 2D human body representation, with skeleton keypoints capturing human pose information and contour keypoints containing human shape information.

Conditional Image Generation Open-Ended Question Answering

5,006

Multi-person Articulated Tracking with Spatial and Temporal Embeddings

no code implementations • CVPR 2019 • Sheng Jin, Wentao Liu, Wanli Ouyang, Chen Qian

Our framework consists of two main components, i.e., SpatialNet and TemporalNet.

Multi-Object Tracking Multi-Person Pose Estimation +2

Connectionist Temporal Classification with Maximum Entropy Regularization

1 code implementation • NeurIPS 2018 • Hu Liu, Sheng Jin, Chang-Shui Zhang

Connectionist Temporal Classification (CTC) is an objective function for end-to-end sequence learning, which adopts dynamic programming algorithms to directly learn the mapping between sequences.

Classification General Classification +3