Search Results for author: Kai Han

Found 122 papers, 81 papers with code

GhostNetV3: Exploring the Training Strategies for Compact Models

no code implementations 17 Apr 2024 Zhenhua Liu, Zhiwei Hao, Kai Han, Yehui Tang, Yunhe Wang

In this paper, by systematically investigating the impact of different training ingredients, we introduce a strong training strategy for compact models.

Knowledge Distillation object-detection +1

SPTNet: An Efficient Alternative Framework for Generalized Category Discovery with Spatial Prompt Tuning

no code implementations 20 Mar 2024 Hongjun Wang, Sagar Vaze, Kai Han

We thoroughly evaluate our SPTNet on standard benchmarks and demonstrate that our method outperforms existing GCD methods.

SAM-DiffSR: Structure-Modulated Diffusion Model for Image Super-Resolution

1 code implementation 27 Feb 2024 Chengcheng Wang, Zhiwei Hao, Yehui Tang, Jianyuan Guo, Yujie Yang, Kai Han, Yunhe Wang

In this paper, we propose the SAM-DiffSR model, which can utilize the fine-grained structure information from SAM in the process of sampling noise to improve the image quality without additional computational cost during inference.

Image Super-Resolution

DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models

1 code implementation 26 Feb 2024 Wei He, Kai Han, Yehui Tang, Chengcheng Wang, Yujie Yang, Tianyu Guo, Yunhe Wang

Large language models (LLMs) face a daunting challenge due to the excessive computational and memory requirements of the commonly used Transformer architecture.

Assortment Planning with Sponsored Products

no code implementations 9 Feb 2024 Shaojie Tang, Shuzhang Cai, Jing Yuan, Kai Han

In the rapidly evolving landscape of retail, assortment planning plays a crucial role in determining the success of a business.

Combinatorial Optimization

Data-efficient Large Vision Models through Sequential Autoregression

1 code implementation 7 Feb 2024 Jianyuan Guo, Zhiwei Hao, Chengcheng Wang, Yehui Tang, Han Wu, Han Hu, Kai Han, Chang Xu

Training general-purpose vision models on purely sequential visual data, eschewing linguistic inputs, has heralded a new frontier in visual understanding.

Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

1 code implementation 6 Feb 2024 Jianyuan Guo, Hanting Chen, Chengcheng Wang, Kai Han, Chang Xu, Yunhe Wang

Recent advancements in large language models have sparked interest in their extraordinary and near-superhuman capabilities, leading researchers to explore methods for evaluating and optimizing these abilities, which is called superalignment.

Few-Shot Learning Knowledge Distillation +1

A Survey on Transformer Compression

no code implementations 5 Feb 2024 Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, DaCheng Tao

Model compression methods reduce the memory and computational cost of Transformers, which is a necessary step for deploying large language/vision models on practical devices.

Knowledge Distillation Model Compression +1

Rethinking Optimization and Architecture for Tiny Language Models

1 code implementation 5 Feb 2024 Yehui Tang, Fangcheng Liu, Yunsheng Ni, Yuchuan Tian, Zheyuan Bai, Yi-Qi Hu, Sichao Liu, Shangling Jui, Kai Han, Yunhe Wang

Several design formulas are empirically shown to be especially effective for tiny language models, including tokenizer compression, architecture tweaking, parameter inheritance and multiple-round training.

Language Modelling

FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition

no code implementations 5 Feb 2024 Xiaohu Huang, Hao Zhou, Kun Yao, Kai Han

To address these issues, FROSTER employs a residual feature distillation approach to ensure that CLIP retains its generalization capability while effectively adapting to the action recognition task.

Open Vocabulary Action Recognition

An Empirical Study of Scaling Law for OCR

1 code implementation 29 Dec 2023 Miao Rang, Zhenni Bi, Chuanjian Liu, Yunhe Wang, Kai Han

The laws of model size, data volume, computation and model performance have been extensively studied in the field of Natural Language Processing (NLP).

Ranked #1 on Scene Text Recognition on ICDAR2013 (using extra training data)

Optical Character Recognition Optical Character Recognition (OCR) +1

PanGu-$\pi$: Enhancing Language Model Architectures via Nonlinearity Compensation

no code implementations 27 Dec 2023 Yunhe Wang, Hanting Chen, Yehui Tang, Tianyu Guo, Kai Han, Ying Nie, Xutao Wang, Hailin Hu, Zheyuan Bai, Yun Wang, Fangcheng Liu, Zhicheng Liu, Jianyuan Guo, Sinan Zeng, Yinchen Zhang, Qinghua Xu, Qun Liu, Jun Yao, Chao Xu, DaCheng Tao

We then demonstrate that the proposed approach is significantly effective for enhancing the model nonlinearity through carefully designed ablations; thus, we present a new efficient model architecture for establishing modern, namely, PanGu-$\pi$.

Language Modelling

LightCLIP: Learning Multi-Level Interaction for Lightweight Vision-Language Models

no code implementations 1 Dec 2023 Ying Nie, Wei He, Kai Han, Yehui Tang, Tianyu Guo, Fanyi Du, Yunhe Wang

Moreover, based on the observation that the accuracy of CLIP model does not increase correspondingly as the parameters of text encoder increase, an extra objective of masked language modeling (MLM) is leveraged for maximizing the potential of the shortened text encoder.

Image Classification Language Modelling +3

Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs

1 code implementation 24 Nov 2023 Jonathan Roberts, Timo Lüddecke, Rehan Sheikh, Kai Han, Samuel Albanie

Multimodal large language models (MLLMs) have shown remarkable capabilities across a broad range of tasks but their knowledge and abilities in the geographic and geospatial domains are yet to be explored, despite potential wide-ranging benefits to navigation, environmental research, urban development, and disaster response.

Disaster Response

One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation

1 code implementation NeurIPS 2023 Zhiwei Hao, Jianyuan Guo, Kai Han, Yehui Tang, Han Hu, Yunhe Wang, Chang Xu

To tackle the challenge in distilling heterogeneous models, we propose a simple yet effective one-for-all KD framework called OFA-KD, which significantly improves the distillation performance between heterogeneous architectures.

Knowledge Distillation

SD4Match: Learning to Prompt Stable Diffusion Model for Semantic Matching

no code implementations 26 Oct 2023 Xinghui Li, Jingyi Lu, Kai Han, Victor Prisacariu

In this paper, we address the challenge of matching semantically similar keypoints across image pairs.

Boosting Semantic Segmentation from the Perspective of Explicit Class Embeddings

no code implementations ICCV 2023 Yuhe Liu, Chuanjian Liu, Kai Han, Quan Tang, Zengchang Qin

Following this observation, we propose ECENet, a new segmentation paradigm, in which class embeddings are obtained and enhanced explicitly while interacting with multi-stage image features.

Segmentation Semantic Segmentation

Practical Parallel Algorithms for Non-Monotone Submodular Maximization

no code implementations 21 Aug 2023 Shuang Cui, Kai Han, Jing Tang, He Huang, Xueying Li, Aakas Zhiyuli, Hanxiao Li

Submodular maximization has found extensive applications in various domains within the field of artificial intelligence, including but not limited to machine learning, computer vision, and natural language processing.

Guide3D: Create 3D Avatars from Text and Image Guidance

no code implementations 18 Aug 2023 Yukang Cao, Yan-Pei Cao, Kai Han, Ying Shan, Kwan-Yee K. Wong

To this end, we introduce Guide3D, a zero-shot text-and-image-guided generative model for 3D avatar generation based on diffusion models.

3D Generation Text to 3D +1

Category Feature Transformer for Semantic Segmentation

1 code implementation 10 Aug 2023 Quan Tang, Chuanjian Liu, Fagui Liu, Yifan Liu, Jun Jiang, BoWen Zhang, Kai Han, Yunhe Wang

Aggregation of multi-stage features has been revealed to play a significant role in semantic segmentation.

Segmentation Semantic Segmentation

ParameterNet: Parameters Are All You Need

no code implementations 26 Jun 2023 Kai Han, Yunhe Wang, Jianyuan Guo, Enhua Wu

In the language domain, LLaMA-1B enhanced with ParameterNet achieves 2% higher accuracy over vanilla LLaMA.

GPT4Image: Can Large Pre-trained Models Help Vision Models on Perception Tasks?

1 code implementation 1 Jun 2023 Ning Ding, Yehui Tang, Zhongqian Fu, Chao Xu, Kai Han, Yunhe Wang

We present a new learning paradigm in which the knowledge extracted from large pre-trained models is utilized to help models like CNN and ViT learn enhanced representations and achieve better performance.

Descriptive Image Classification

ViCo: Plug-and-play Visual Condition for Personalized Text-to-image Generation

1 code implementation 1 Jun 2023 Shaozhe Hao, Kai Han, Shihao Zhao, Kwan-Yee K. Wong

Personalized text-to-image generation using diffusion models has recently emerged and garnered significant interest.

Text-to-Image Generation

GPT4GEO: How a Language Model Sees the World's Geography

no code implementations 30 May 2023 Jonathan Roberts, Timo Lüddecke, Sowmen Das, Kai Han, Samuel Albanie

Large language models (LLMs) have shown remarkable capabilities across a broad range of tasks involving question answering and the generation of coherent text and code.

Disaster Response Language Modelling +2

VanillaKD: Revisit the Power of Vanilla Knowledge Distillation from Small Scale to Large Scale

1 code implementation 25 May 2023 Zhiwei Hao, Jianyuan Guo, Kai Han, Han Hu, Chang Xu, Yunhe Wang

The tremendous success of large models trained on extensive datasets demonstrates that scale is a key ingredient in achieving superior results.

Data Augmentation Knowledge Distillation

Learning Semi-supervised Gaussian Mixture Models for Generalized Category Discovery

1 code implementation ICCV 2023 Bingchen Zhao, Xin Wen, Kai Han

In this paper, we address the problem of generalized category discovery (GCD), i.e., given a set of images where part of them are labelled and the rest are not, the task is to automatically cluster the images in the unlabelled data, leveraging the information from the labelled data, while the unlabelled data contain images from the labelled classes and also new ones.

Contrastive Learning Image Classification +2

SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models

no code implementations 23 Apr 2023 Jonathan Roberts, Kai Han, Samuel Albanie

In this work, we introduce SATellite ImageNet (SATIN), a metadataset curated from 27 existing remotely sensed datasets, and comprehensively evaluate the zero-shot transfer classification capabilities of a broad range of vision-language (VL) models on SATIN.

Classification Image Classification

CiPR: An Efficient Framework with Cross-instance Positive Relations for Generalized Category Discovery

1 code implementation 14 Apr 2023 Shaozhe Hao, Kai Han, Kwan-Yee K. Wong

GCD considers the open-world problem of automatically clustering a partially labelled dataset, in which the unlabelled data may contain instances from both novel categories and labelled classes.

Clustering Contrastive Learning +1

What's in a Name? Beyond Class Indices for Image Recognition

no code implementations 5 Apr 2023 Kai Han, Yandong Li, Sagar Vaze, Jie Li, Xuhui Jia

In this paper, we reconsider the recognition problem and task a vision-language model to assign class names to images given only a large and essentially unconstrained vocabulary of categories as prior information.

Language Modelling Object Recognition

DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models

1 code implementation 3 Apr 2023 Yukang Cao, Yan-Pei Cao, Kai Han, Ying Shan, Kwan-Yee K. Wong

We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars with controllable poses.

Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network

1 code implementation ICCV 2023 Cong Han, Yujie Zhong, Dengjie Li, Kai Han, Lin Ma

Recently, the open-vocabulary semantic segmentation problem has attracted increasing attention and the best performing methods are based on two-stream networks: one stream for proposal mask generation and the other for segment classification using a pretrained visual-language model.

Classification Language Modelling +3

SeSDF: Self-evolved Signed Distance Field for Implicit 3D Clothed Human Reconstruction

no code implementations CVPR 2023 Yukang Cao, Kai Han, Kwan-Yee K. Wong

We propose a flexible framework which, by leveraging the parametric SMPL-X model, can take an arbitrary number of input images to reconstruct a clothed human model under an uncalibrated setting.

Masked Image Modeling with Local Multi-Scale Reconstruction

1 code implementation CVPR 2023 Haoqing Wang, Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhi-Hong Deng, Kai Han

The lower layers are not explicitly guided and the interaction among their patches is only used for calculating new activations.

Representation Learning

Network Expansion for Practical Training Acceleration

1 code implementation CVPR 2023 Ning Ding, Yehui Tang, Kai Han, Chao Xu, Yunhe Wang

Recently, the sizes of deep neural networks and training datasets have both increased drastically to pursue better performance in a practical sense.

Redistribution of Weights and Activations for AdderNet Quantization

no code implementations 20 Dec 2022 Ying Nie, Kai Han, Haikang Diao, Chuanjian Liu, Enhua Wu, Yunhe Wang

To this end, we first thoroughly analyze the difference in the distributions of weights and activations in AdderNet and then propose a new quantization algorithm by redistributing the weights and the activations.

Quantization

FastMIM: Expediting Masked Image Modeling Pre-training for Vision

1 code implementation 13 Dec 2022 Jianyuan Guo, Kai Han, Han Wu, Yehui Tang, Yunhe Wang, Chang Xu

This paper presents FastMIM, a simple and generic framework for expediting masked image modeling with the following two steps: (i) pre-training vision backbones with low-resolution input images; and (ii) reconstructing Histograms of Oriented Gradients (HOG) features instead of the original RGB values of the input images.

GhostNetV2: Enhance Cheap Operation with Long-Range Attention

15 code implementations 23 Nov 2022 Yehui Tang, Kai Han, Jianyuan Guo, Chang Xu, Chao Xu, Yunhe Wang

The convolutional operation can only capture local information in a window region, which prevents performance from being further improved.

Novel Class Discovery without Forgetting

no code implementations 21 Jul 2022 K J Joseph, Sujoy Paul, Gaurav Aggarwal, Soma Biswas, Piyush Rai, Kai Han, Vineeth N Balasubramanian

Inspired by this, we identify and formulate a new, pragmatic problem setting of NCDwF: Novel Class Discovery without Forgetting, which tasks a machine learning model to incrementally discover novel categories of instances from unlabeled data, while maintaining its performance on the previously seen categories.

Novel Class Discovery

Network Amplification With Efficient MACs Allocation

2 code implementations Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops 2022 Chuanjian Liu, Kai Han, An Xiao, Ying Nie, Wei Zhang, Yunhe Wang

In particular, when the proposed method is used to enlarge models sourced from GhostNet, we achieve state-of-the-art 80.9% and 84.3% ImageNet top-1 accuracies under the setting of 600M and 4.4B MACs, respectively.

Vision GNN: An Image is Worth Graph of Nodes

11 code implementations 1 Jun 2022 Kai Han, Yunhe Wang, Jianyuan Guo, Yehui Tang, Enhua Wu

In this paper, we propose to represent the image as a graph structure and introduce a new Vision GNN (ViG) architecture to extract graph-level features for visual tasks.

Image Classification Object Detection

Spacing Loss for Discovering Novel Categories

1 code implementation 22 Apr 2022 K J Joseph, Sujoy Paul, Gaurav Aggarwal, Soma Biswas, Piyush Rai, Kai Han, Vineeth N Balasubramanian

Novel Class Discovery (NCD) is a learning paradigm, where a machine learning model is tasked to semantically group instances from unlabeled data, by utilizing labeled instances from a disjoint set of classes.

Novel Class Discovery

JIFF: Jointly-aligned Implicit Face Function for High Quality Single View Clothed Human Reconstruction

no code implementations CVPR 2022 Yukang Cao, GuanYing Chen, Kai Han, Wenqi Yang, Kwan-Yee K. Wong

In this paper, we focus on improving the quality of face in the reconstruction and propose a novel Jointly-aligned Implicit Face Function (JIFF) that combines the merits of the implicit function based approach and model based approach.

3D Human Reconstruction Face Model +1

SharpContour: A Contour-based Boundary Refinement Approach for Efficient and Accurate Instance Segmentation

no code implementations CVPR 2022 Chenming Zhu, Xuanye Zhang, Yanran Li, Liangdong Qiu, Kai Han, Xiaoguang Han

Contour-based models are efficient and generic enough to be incorporated into any existing segmentation method, but they often generate over-smoothed contours and tend to fail in corner areas.

Instance Segmentation Segmentation +1

GhostNets on Heterogeneous Devices via Cheap Operations

8 code implementations 10 Jan 2022 Kai Han, Yunhe Wang, Chang Xu, Jianyuan Guo, Chunjing Xu, Enhua Wu, Qi Tian

The proposed C-Ghost module can be taken as a plug-and-play component to upgrade existing convolutional neural networks.

PyramidTNT: Improved Transformer-in-Transformer Baselines with Pyramid Architecture

1 code implementation 4 Jan 2022 Kai Han, Jianyuan Guo, Yehui Tang, Yunhe Wang

We hope this new baseline will be helpful for further research on and application of vision transformers.

Instance-Aware Dynamic Neural Network Quantization

4 code implementations CVPR 2022 Zhenhua Liu, Yunhe Wang, Kai Han, Siwei Ma, Wen Gao

However, natural images are of huge diversity with abundant content and using such a universal quantization configuration for all samples is not an optimal strategy.

Quantization

An Image Patch is a Wave: Phase-Aware Vision MLP

10 code implementations CVPR 2022 Yehui Tang, Kai Han, Jianyuan Guo, Chang Xu, Yanxi Li, Chao Xu, Yunhe Wang

To dynamically aggregate tokens, we propose to represent each token as a wave function with two parts, amplitude and phase.

Image Classification object-detection +2

Open-Set Recognition: a Good Closed-Set Classifier is All You Need?

2 code implementations ICLR 2022 Sagar Vaze, Kai Han, Andrea Vedaldi, Andrew Zisserman

In this paper, we first demonstrate that the ability of a classifier to make the 'none-of-above' decision is highly correlated with its accuracy on the closed-set classes.

Open Set Learning Out-of-Distribution Detection

Symmetry-Enhanced Attention Network for Acute Ischemic Infarct Segmentation with Non-Contrast CT Images

1 code implementation 11 Oct 2021 Kongming Liang, Kai Han, Xiuli Li, Xiaoqing Cheng, Yiming Li, Yizhou Wang, Yizhou Yu

In this paper, we propose a symmetry enhanced attention network (SEAN) for acute ischemic infarct segmentation.

Learning Versatile Convolution Filters for Efficient Visual Recognition

no code implementations 20 Sep 2021 Kai Han, Yunhe Wang, Chang Xu, Chunjing Xu, Enhua Wu, DaCheng Tao

A series of secondary filters can be derived from a primary filter with the help of binary masks.

Hire-MLP: Vision MLP via Hierarchical Rearrangement

10 code implementations CVPR 2022 Jianyuan Guo, Yehui Tang, Kai Han, Xinghao Chen, Han Wu, Chao Xu, Chang Xu, Yunhe Wang

Previous vision MLPs such as MLP-Mixer and ResMLP accept linearly flattened image patches as input, making them inflexible for different input sizes and hard to capture spatial information.

Image Classification object-detection +2

Greedy Network Enlarging

1 code implementation 31 Jul 2021 Chuanjian Liu, Kai Han, An Xiao, Yiping Deng, Wei Zhang, Chunjing Xu, Yunhe Wang

Recent studies on deep convolutional neural networks present a simple paradigm of architecture design, i.e., models with more MACs typically achieve better accuracy, such as EfficientNet and RegNet.

Real-time Keypoints Detection for Autonomous Recovery of the Unmanned Ground Vehicle

no code implementations 27 Jul 2021 Jie Li, Sheng Zhang, Kai Han, Xia Yuan, Chunxia Zhao, Yu Liu

UGV-KPNet is computationally efficient with a small number of parameters and provides pixel-level accurate keypoint detection results in real time.

Keypoint Detection

CMT: Convolutional Neural Networks Meet Vision Transformers

14 code implementations CVPR 2022 Jianyuan Guo, Kai Han, Han Wu, Yehui Tang, Xinghao Chen, Yunhe Wang, Chang Xu

Vision transformers have been successfully applied to image recognition tasks due to their ability to capture long-range dependencies within an image.

Novel Visual Category Discovery with Dual Ranking Statistics and Mutual Knowledge Distillation

no code implementations NeurIPS 2021 Bingchen Zhao, Kai Han

In this paper, we tackle the problem of novel visual category discovery, i.e., grouping unlabelled images from new classes into different semantic partitions by leveraging a labelled dataset that contains images from other different but relevant categories.

Fine-Grained Visual Recognition Knowledge Distillation

Augmented Shortcuts for Vision Transformers

4 code implementations NeurIPS 2021 Yehui Tang, Kai Han, Chang Xu, An Xiao, Yiping Deng, Chao Xu, Yunhe Wang

Transformer models have achieved great progress on computer vision tasks recently.

AutoNovel: Automatically Discovering and Learning Novel Visual Categories

1 code implementation 29 Jun 2021 Kai Han, Sylvestre-Alvise Rebuffi, Sébastien Ehrhardt, Andrea Vedaldi, Andrew Zisserman

We present a new approach called AutoNovel to address this problem by combining three ideas: (1) we suggest that the common approach of bootstrapping an image representation using the labelled data only introduces an unwanted bias, and that this can be avoided by using self-supervised learning to train the representation from scratch on the union of labelled and unlabelled data; (2) we use ranking statistics to transfer the model's knowledge of the labelled classes to the problem of clustering the unlabelled images; and, (3) we train the data representation by optimizing a joint objective function on the labelled and unlabelled subsets of the data, improving both the supervised classification of the labelled data, and the clustering of the unlabelled data.

Clustering Image Clustering +2

Post-Training Quantization for Vision Transformer

no code implementations NeurIPS 2021 Zhenhua Liu, Yunhe Wang, Kai Han, Siwei Ma, Wen Gao

Recently, transformers have achieved remarkable performance on a variety of computer vision applications.

Quantization

Positive-Unlabeled Data Purification in the Wild for Object Detection

no code implementations CVPR 2021 Jianyuan Guo, Kai Han, Han Wu, Chao Zhang, Xinghao Chen, Chunjing Xu, Chang Xu, Yunhe Wang

In this paper, we present a positive-unlabeled learning based scheme to expand training data by purifying valuable images from massive unlabeled ones, where the original training data are viewed as positive data and the unlabeled images in the wild are unlabeled data.

Knowledge Distillation object-detection +1

ReNAS: Relativistic Evaluation of Neural Architecture Search

7 code implementations CVPR 2021 Yixing Xu, Yunhe Wang, Kai Han, Yehui Tang, Shangling Jui, Chunjing Xu, Chang Xu

An effective and efficient architecture performance evaluation scheme is essential for the success of Neural Architecture Search (NAS).

Neural Architecture Search

Patch Slimming for Efficient Vision Transformers

no code implementations CVPR 2022 Yehui Tang, Kai Han, Yunhe Wang, Chang Xu, Jianyuan Guo, Chao Xu, DaCheng Tao

We first identify the effective patches in the last layer and then use them to guide the patch selection process of previous layers.

Efficient ViTs

Dynamic Resolution Network

3 code implementations NeurIPS 2021 Mingjian Zhu, Kai Han, Enhua Wu, Qiulin Zhang, Ying Nie, Zhenzhong Lan, Yunhe Wang

To this end, we propose a novel dynamic-resolution network (DRNet) in which the input resolution is determined dynamically based on each input sample.

Dense Reconstruction of Transparent Objects by Altering Incident Light Paths Through Refraction

no code implementations 20 May 2021 Kai Han, Kwan-Yee K. Wong, Miaomiao Liu

We present a simple setup that allows us to alter the incident light paths before light rays enter the object by immersing the object partially in a liquid, and develop a method for recovering the object surface through reconstructing and triangulating such incident light paths.

Object Surface Reconstruction +1

Joint Representation Learning and Novel Category Discovery on Single- and Multi-modal Data

no code implementations ICCV 2021 Xuhui Jia, Kai Han, Yukun Zhu, Bradley Green

This paper studies the problem of novel category discovery on single- and multi-modal data with labels from different but relevant categories.

Contrastive Learning Representation Learning

Vision Transformer Pruning

2 code implementations 17 Apr 2021 Mingjian Zhu, Yehui Tang, Kai Han

Vision transformers have achieved competitive performance on a variety of computer vision applications.

Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification

no code implementations CVPR 2021 Peng Wang, Kai Han, Xiu-Shen Wei, Lei Zhang, Lei Wang

Learning discriminative image representations plays a vital role in long-tailed image classification because it can ease the classifier learning in imbalanced cases.

Classification Contrastive Learning +4

Distilling Object Detectors via Decoupled Features

1 code implementation CVPR 2021 Jianyuan Guo, Kai Han, Yunhe Wang, Han Wu, Xinghao Chen, Chunjing Xu, Chang Xu

To this end, we present a novel distillation algorithm via decoupled features (DeFeat) for learning a better student detector.

Image Classification Knowledge Distillation +3

Learning Frequency Domain Approximation for Binary Neural Networks

3 code implementations NeurIPS 2021 Yixing Xu, Kai Han, Chang Xu, Yehui Tang, Chunjing Xu, Yunhe Wang

Binary neural networks (BNNs) represent the original full-precision weights and activations as 1-bit values using the sign function.

Transformer in Transformer

12 code implementations NeurIPS 2021 Kai Han, An Xiao, Enhua Wu, Jianyuan Guo, Chunjing Xu, Yunhe Wang

In this paper, we point out that the attention inside these local patches is also essential for building visual transformers with high performance and we explore a new architecture, namely, Transformer iN Transformer (TNT).

Fine-Grained Image Classification Sentence

AdderNet and its Minimalist Hardware Design for Energy-Efficient Artificial Intelligence

no code implementations 25 Jan 2021 Yunhe Wang, Mingqiang Huang, Kai Han, Hanting Chen, Wei Zhang, Chunjing Xu, DaCheng Tao

With a comprehensive comparison on the performance, power consumption, hardware resource consumption and network generalization capability, we conclude that AdderNet is able to surpass all the other competitors including the classical CNN, novel memristor-network, XNOR-Net and the shift-kernel based network, indicating its great potential in future high performance and energy-efficient artificial intelligence applications.

Quantization

Fixed Viewpoint Mirror Surface Reconstruction under an Uncalibrated Camera

1 code implementation 23 Jan 2021 Kai Han, Miaomiao Liu, Dirk Schnieders, Kwan-Yee K. Wong

This paper addresses the problem of mirror surface reconstruction, and proposes a solution based on observing the reflections of a moving reference plane on the mirror surface.

Surface Reconstruction

GhostSR: Learning Ghost Features for Efficient Image Super-Resolution

4 code implementations 21 Jan 2021 Ying Nie, Kai Han, Zhenhua Liu, Chuanjian Liu, Yunhe Wang

Based on the observation that many features in SISR models are also similar to each other, we propose to use shift operation to generate the redundant features (i.e., ghost features).

Image Super-Resolution

A Flexible Framework for Discovering Novel Categories with Contrastive Learning

no code implementations 1 Jan 2021 Xuhui Jia, Kai Han, Yukun Zhu, Bradley Green

This paper studies the problem of novel category discovery on single- and multi-modal data with labels from different but relevant categories.

Contrastive Learning Representation Learning

A Survey on Visual Transformer

no code implementations 23 Dec 2020 Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, Zhenhua Liu, Yehui Tang, An Xiao, Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, DaCheng Tao

Transformer, first applied to the field of natural language processing, is a type of deep neural network mainly based on the self-attention mechanism.

Image Classification Inductive Bias

$\mathbb{X}$Resolution Correspondence Networks

1 code implementation 17 Dec 2020 Georgi Tinchev, Shuda Li, Kai Han, David Mitchell, Rigas Kouskouridas

In this paper, we aim at establishing accurate dense correspondences between a pair of images with overlapping field of view under challenging illumination variation, viewpoint changes, and style differences.

4k

Model Rubik’s Cube: Twisting Resolution, Depth and Width for TinyNets

3 code implementations NeurIPS 2020 Kai Han, Yunhe Wang, Qiulin Zhang, Wei Zhang, Chunjing Xu, Tong Zhang

To this end, we summarize a tiny formula for downsizing neural architectures through a series of smaller models derived from the EfficientNet-B0 with the FLOPs constraint.

Image Classification

Dynamic Feature Pyramid Networks for Object Detection

1 code implementation 1 Dec 2020 Mingjian Zhu, Kai Han, Changbin Yu, Yunhe Wang

One way to enhance the FPN is to enrich the spatial information by expanding the receptive fields, which promises to substantially improve the detection accuracy.

Object object-detection +1

Model Rubik's Cube: Twisting Resolution, Depth and Width for TinyNets

10 code implementations 28 Oct 2020 Kai Han, Yunhe Wang, Qiulin Zhang, Wei Zhang, Chunjing Xu, Tong Zhang

To this end, we summarize a tiny formula for downsizing neural architectures through a series of smaller models derived from the EfficientNet-B0 with the FLOPs constraint.

Image Classification Rubik's Cube

Deterministic Approximation for Submodular Maximization over a Matroid in Nearly Linear Time

no code implementations NeurIPS 2020 Kai Han, Zongmai Cao, Shuang Cui, Benwei Wu

We study the problem of maximizing a non-monotone, non-negative submodular function subject to a matroid constraint.

Revisiting Modified Greedy Algorithm for Monotone Submodular Maximization with a Knapsack Constraint

no code implementations 12 Aug 2020 Jing Tang, Xueyan Tang, Andrew Lim, Kai Han, Chongshou Li, Junsong Yuan

Second, we enhance the modified greedy algorithm to derive a data-dependent upper bound on the optimum.

Deep Photometric Stereo for Non-Lambertian Surfaces

1 code implementation 26 Jul 2020 Guan-Ying Chen, Kai Han, Boxin Shi, Yasuyuki Matsushita, Kwan-Yee K. Wong

To deal with the uncalibrated scenario where light directions are unknown, we introduce a new convolutional network, named LCNet, to estimate light directions from input images.

Dual-Resolution Correspondence Networks

1 code implementation NeurIPS 2020 Xinghui Li, Kai Han, Shuda Li, Victor Adrian Prisacariu

The fine-resolution feature maps are used to obtain the final dense correspondences guided by the refined coarse 4D correlation tensor.

Efficient Approximation Algorithms for Adaptive Influence Maximization

2 code implementations 14 Apr 2020 Keke Huang, Jing Tang, Kai Han, Xiaokui Xiao, Wei Chen, Aixin Sun, Xueyan Tang, Andrew Lim

In this paper, we propose the first practical algorithm for the adaptive IM problem that could provide the worst-case approximation guarantee of $1-\mathrm{e}^{\rho_b(\varepsilon-1)}$, where $\rho_b=1-(1-1/b)^b$ and $\varepsilon \in (0, 1)$ is a user-specified parameter.

Social and Information Networks
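
To get a feel for the guarantee quoted in the entry above, the following minimal sketch evaluates $1-\mathrm{e}^{\rho_b(\varepsilon-1)}$ numerically. The function names `rho` and `approx_guarantee` are hypothetical, and treating $b$ as a positive integer is an assumption, since the snippet does not define it.

```python
import math


def rho(b: int) -> float:
    """Compute rho_b = 1 - (1 - 1/b)^b (b assumed to be a positive integer)."""
    if b < 1:
        raise ValueError("b must be a positive integer")
    return 1.0 - (1.0 - 1.0 / b) ** b


def approx_guarantee(b: int, eps: float) -> float:
    """Compute 1 - exp(rho_b * (eps - 1)) for eps in the open interval (0, 1)."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    return 1.0 - math.exp(rho(b) * (eps - 1.0))


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    for b in (1, 2, 10, 100):
        print(b, round(rho(b), 4), round(approx_guarantee(b, 0.1), 4))
```

Under these assumptions, $\rho_1 = 1$ and $\rho_b$ decreases toward $1 - 1/\mathrm{e}$ as $b$ grows, so for a fixed $\varepsilon$ the stated ratio is largest at $b = 1$.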

Anisotropic Convolutional Networks for 3D Semantic Scene Completion

1 code implementation CVPR 2020 Jie Li, Kai Han, Peng Wang, Yu Liu, Xia Yuan

In contrast to the standard 3D convolution, which is limited to a fixed 3D receptive field, our module is capable of modeling the dimensional anisotropy voxel-wise.

3D Semantic Scene Completion from a single RGB image

Learning Inverse Rendering of Faces from Real-world Videos

1 code implementation 26 Mar 2020 Yuda Qiu, Zhangyang Xiong, Kai Han, Zhongyuan Wang, Zixiang Xiong, Xiaoguang Han

To alleviate this problem, we propose a weakly supervised training approach to train our model on real face videos, based on the assumption of consistency of albedo and normal across different frames, thus bridging the gap between real and synthetic face images.

Inverse Rendering

Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection

1 code implementation CVPR 2020 Jianyuan Guo, Kai Han, Yunhe Wang, Chao Zhang, Zhaohui Yang, Han Wu, Xinghao Chen, Chang Xu

To this end, we propose a hierarchical trinity search framework to simultaneously discover efficient architectures for all components (i.e., backbone, neck, and head) of an object detector in an end-to-end manner.

Image Classification Neural Architecture Search +3

Automatically Discovering and Learning New Visual Categories with Ranking Statistics

1 code implementation ICLR 2020 Kai Han, Sylvestre-Alvise Rebuffi, Sebastien Ehrhardt, Andrea Vedaldi, Andrew Zisserman

In this work we address this problem by combining three ideas: (1) we suggest that the common approach of bootstrapping an image representation using the labelled data only introduces an unwanted bias, and that this can be avoided by using self-supervised learning to train the representation from scratch on the union of labelled and unlabelled data; (2) we use rank statistics to transfer the model's knowledge of the labelled classes to the problem of clustering the unlabelled images; and, (3) we train the data representation by optimizing a joint objective function on the labelled and unlabelled subsets of the data, improving both the supervised classification of the labelled data, and the clustering of the unlabelled data.

Clustering General Classification +1

Widening and Squeezing: Towards Accurate and Efficient QNNs

no code implementations 3 Feb 2020 Chuanjian Liu, Kai Han, Yunhe Wang, Hanting Chen, Qi Tian, Chunjing Xu

Quantization neural networks (QNNs) are very attractive to the industry because of their extremely cheap calculation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.

Quantization

GhostNet: More Features from Cheap Operations

34 code implementations CVPR 2020 Kai Han, Yunhe Wang, Qi Tian, Jianyuan Guo, Chunjing Xu, Chang Xu

Deploying convolutional neural networks (CNNs) on embedded devices is difficult due to the limited memory and computation resources.

Image Classification

Beyond Human Parts: Dual Part-Aligned Representations for Person Re-Identification

1 code implementation ICCV 2019 Jianyuan Guo, Yuhui Yuan, Lang Huang, Chao Zhang, Jinge Yao, Kai Han

On the other hand, there still exist many useful contextual cues that do not fall into the scope of predefined human parts or attributes.

Human Parsing Person Re-Identification

ReNAS: Relativistic Evaluation of Neural Architecture Search

4 code implementations 30 Sep 2019 Yixing Xu, Yunhe Wang, Kai Han, Yehui Tang, Shangling Jui, Chunjing Xu, Chang Xu

An effective and efficient architecture performance evaluation scheme is essential for the success of Neural Architecture Search (NAS).

Neural Architecture Search

Balanced Binary Neural Networks with Gated Residual

1 code implementation 26 Sep 2019 Mingzhu Shen, Xianglong Liu, Ruihao Gong, Kai Han

In this paper, we attempt to maintain the information propagated in the forward process and propose Balanced Binary Neural Networks with Gated Residual (BBG for short).

Binarization General Classification +1

Positive-Unlabeled Compression on the Cloud

2 code implementations NeurIPS 2019 Yixing Xu, Yunhe Wang, Hanting Chen, Kai Han, Chunjing Xu, Dacheng Tao, Chang Xu

In practice, only a small portion of the original training set is required as positive examples and more useful training examples can be obtained from the massive unlabeled data on the cloud through a PU classifier with an attention based multi-scale feature extractor.

Knowledge Distillation

Searching for Accurate Binary Neural Architectures

no code implementations 16 Sep 2019 Mingzhu Shen, Kai Han, Chunjing Xu, Yunhe Wang

Binary neural networks have attracted tremendous attention due to their efficiency when deployed on mobile devices.

Full-Stack Filters to Build Minimum Viable CNNs

1 code implementation 6 Aug 2019 Kai Han, Yunhe Wang, Yixing Xu, Chunjing Xu, Dacheng Tao, Chang Xu

Existing works typically decrease the number or size of the convolution filters required for a minimum viable CNN on edge devices.

Attribute Aware Pooling for Pedestrian Attribute Recognition

no code implementations 27 Jul 2019 Kai Han, Yunhe Wang, Han Shu, Chuanjian Liu, Chunjing Xu, Chang Xu

This paper expands the strength of deep convolutional neural networks (CNNs) to the pedestrian attribute recognition problem by devising a novel attribute aware pooling algorithm.

Attribute Pedestrian Attribute Recognition

Learning Instance-wise Sparsity for Accelerating Deep Models

no code implementations 27 Jul 2019 Chuanjian Liu, Yunhe Wang, Kai Han, Chunjing Xu, Chang Xu

Exploring deep convolutional neural networks of high efficiency and low memory usage is essential for a wide variety of machine learning tasks.

Co-Evolutionary Compression for Unpaired Image Translation

2 code implementations ICCV 2019 Han Shu, Yunhe Wang, Xu Jia, Kai Han, Hanting Chen, Chunjing Xu, Qi Tian, Chang Xu

Generative adversarial networks (GANs) have been successfully used for a considerable number of computer vision tasks, especially image-to-image translation.

Image-to-Image Translation Translation

Learning Transparent Object Matting

1 code implementation 25 Jul 2019 Guan-Ying Chen, Kai Han, Kwan-Yee K. Wong

In this paper, we formulate transparent object matting as a refractive flow estimation problem, and propose a deep learning framework, called TOM-Net, for learning the refractive flow.

Image Matting Object +1

Semi-Supervised Learning with Scarce Annotations

1 code implementation 21 May 2019 Sylvestre-Alvise Rebuffi, Sebastien Ehrhardt, Kai Han, Andrea Vedaldi, Andrew Zisserman

The first is a simple but effective one: we leverage the power of transfer learning among different tasks and self-supervision to initialize a good representation of the data without making use of any label.

Multi-class Classification Self-Supervised Learning +1

Self-calibrating Deep Photometric Stereo Networks

1 code implementation CVPR 2019 Guan-Ying Chen, Kai Han, Boxin Shi, Yasuyuki Matsushita, Kwan-Yee K. Wong

This paper proposes an uncalibrated photometric stereo method for non-Lambertian scenes based on deep learning.

Attribute-Aware Attention Model for Fine-grained Representation Learning

1 code implementation 2 Jan 2019 Kai Han, Jianyuan Guo, Chao Zhang, Mingjian Zhu

Based on the considerations above, we propose a novel Attribute-Aware Attention Model ($A^3M$), which can learn local attribute representation and global category representation simultaneously in an end-to-end manner.

Attribute Fine-Grained Image Classification +4

Greedy Hash: Towards Fast Optimization for Accurate Hash Coding in CNN

2 code implementations NeurIPS 2018 Shupeng Su, Chao Zhang, Kai Han, Yonghong Tian

To convert the input into binary code, hashing algorithms have been widely used for approximate nearest neighbor search on large-scale image sets due to their computation and storage efficiency.

Deep Hashing

PS-FCN: A Flexible Learning Framework for Photometric Stereo

1 code implementation ECCV 2018 Guan-Ying Chen, Kai Han, Kwan-Yee K. Wong

This paper addresses the problem of photometric stereo for non-Lambertian surfaces.

AutoEncoder Inspired Unsupervised Feature Selection

1 code implementation 23 Oct 2017 Kai Han, Yunhe Wang, Chao Zhang, Chao Li, Chao Xu

High-dimensional data in many areas, such as computer vision and machine learning tasks, brings computational and analytical difficulties.

BIG-bench Machine Learning feature selection

SCNet: Learning Semantic Correspondence

1 code implementation ICCV 2017 Kai Han, Rafael S. Rezende, Bumsub Ham, Kwan-Yee K. Wong, Minsu Cho, Cordelia Schmid, Jean Ponce

This paper addresses the problem of establishing semantic correspondences between images depicting different instances of the same object or scene category.

Semantic correspondence

Mirror Surface Reconstruction Under an Uncalibrated Camera

no code implementations CVPR 2016 Kai Han, Kwan-Yee K. Wong, Dirk Schnieders, Miaomiao Liu

Unlike previous approaches which require tedious work to calibrate the camera, our method can recover both the camera intrinsics and extrinsics together with the mirror surface from reflections of the reference plane under at least three unknown distinct poses.

Surface Reconstruction

A Fixed Viewpoint Approach for Dense Reconstruction of Transparent Objects

no code implementations CVPR 2015 Kai Han, Kwan-Yee K. Wong, Miaomiao Liu

In this paper, we develop a fixed viewpoint approach for dense surface reconstruction of transparent objects based on refraction of light.

Object Surface Reconstruction +1
