Search Results for author: Bohan Zhuang

Found 59 papers, 31 papers with code

LongVLM: Efficient Long Video Understanding via Large Language Models

1 code implementation • 4 Apr 2024 • Yuetian Weng, Mingfei Han, Haoyu He, Xiaojun Chang, Bohan Zhuang

In this way, we encode video representations that incorporate both local and global information, enabling the LLM to generate comprehensive responses for long-term videos.

Question Answering • Video Question Answering +1
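
The local-plus-global encoding described above can be pictured with a short, hedged sketch. This is an illustrative PyTorch toy, not the LongVLM pipeline; the function name, mean-pooling, and token layout are assumptions:

    import torch

    def encode_video_tokens_sketch(frame_feats, num_segments=8):
        # frame_feats: (T, D) per-frame features. Pool each temporal segment
        # into one "local" token and prepend a "global" token pooled over the
        # whole video; the paper's token merging and projection are omitted.
        local = torch.stack([s.mean(dim=0) for s in frame_feats.chunk(num_segments, dim=0)])
        global_tok = frame_feats.mean(dim=0, keepdim=True)   # (1, D)
        return torch.cat([global_tok, local], dim=0)         # tokens fed to the LLM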

Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM

1 code implementation • 12 Mar 2024 • Zeyu Zhang, Akide Liu, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang

Human motion generation is a significant pursuit in generative computer vision, yet achieving long-sequence and efficient motion generation remains challenging.

T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching

1 code implementation • 21 Feb 2024 • Zizheng Pan, Bohan Zhuang, De-An Huang, Weili Nie, Zhiding Yu, Chaowei Xiao, Jianfei Cai, Anima Anandkumar

Sampling from diffusion probabilistic models (DPMs) is often expensive for high-quality image generation and typically requires many steps with a large model.

Image Generation
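
The trajectory stitching named in the title fits in a few lines. In this sketch, small_step and large_step are hypothetical callables that each perform one denoising update; the released implementation differs:

    def t_stitch_sample_sketch(small_step, large_step, x, timesteps, frac=0.4):
        # Use the cheap denoiser for the first `frac` of the trajectory, where
        # coarse structure forms, then switch to the large model for the
        # remaining steps to refine details.
        cut = int(len(timesteps) * frac)
        for i, t in enumerate(timesteps):
            x = small_step(x, t) if i < cut else large_step(x, t)
        return x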

ModaVerse: Efficiently Transforming Modalities with LLMs

1 code implementation • 12 Jan 2024 • Xinyu Wang, Bohan Zhuang, Qi Wu

This alignment process, which synchronizes a language model trained on textual data with encoders and decoders trained on multi-modal data, often necessitates extensive training of several projection layers in multiple stages.

Language Modelling • Large Language Model

Efficient Stitchable Task Adaptation

1 code implementation • 29 Nov 2023 • Haoyu He, Zizheng Pan, Jing Liu, Jianfei Cai, Bohan Zhuang

In this work, we present a novel framework, Efficient Stitchable Task Adaptation (ESTA), to efficiently produce a palette of fine-tuned models that adhere to diverse resource constraints.

Chatbot

Efficient Test-Time Adaptation for Super-Resolution with Second-Order Degradation and Reconstruction

1 code implementation • NeurIPS 2023 • Zeshuai Deng, Zhuokun Chen, Shuaicheng Niu, Thomas H. Li, Bohan Zhuang, Mingkui Tan

Then, we adapt the SR model by implementing feature-level reconstruction learning from the initial test image to its second-order degraded counterparts, which helps the SR model generate plausible HR images.

Image Super-Resolution • Test-time Adaptation
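
One adaptation step of this kind can be sketched as below, assuming a callable features that extracts intermediate features from the SR model and using plain bicubic resampling as the second-order degradation (the paper samples richer degradations; all names are illustrative):

    import torch.nn.functional as F

    def tta_step_sketch(features, lr_img, opt, scale=2):
        # lr_img: (B, C, H, W) test image. Build a second-order degraded copy
        # (down- then up-sample), then adapt so the model's features on the
        # copy reconstruct those on the original input.
        lr2 = F.interpolate(lr_img, scale_factor=1.0 / scale, mode='bicubic', align_corners=False)
        lr2 = F.interpolate(lr2, size=lr_img.shape[-2:], mode='bicubic', align_corners=False)
        loss = F.l1_loss(features(lr2), features(lr_img).detach())
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()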

Mask Propagation for Efficient Video Semantic Segmentation

1 code implementation • NeurIPS 2023 • Yuetian Weng, Mingfei Han, Haoyu He, Mingjie Li, Lina Yao, Xiaojun Chang, Bohan Zhuang

By reusing predictions from key frames, we circumvent the need to process a large volume of video frames individually with resource-intensive segmentors, alleviating temporal redundancy and significantly reducing computational costs.

Semantic Segmentation • Video Semantic Segmentation
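
The key-frame reuse pattern reads roughly as below; the copy-forward propagator is a deliberately naive stand-in for the paper's learned mask-propagation module, and the names are assumptions:

    import torch

    def segment_video_sketch(frames, segmentor, key_every=5):
        # frames: iterable of (C, H, W) tensors; segmentor: frame -> mask.
        preds, last = [], None
        for i, frame in enumerate(frames):
            if i % key_every == 0 or last is None:
                last = segmentor(frame)   # expensive path: key frames only
            preds.append(last)            # cheap path: reuse the latest key mask
        return torch.stack(preds)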

Object-aware Inversion and Reassembly for Image Editing

no code implementations • 18 Oct 2023 • Zhen Yang, Ganggui Ding, Wen Wang, Hao Chen, Bohan Zhuang, Chunhua Shen

Subsequently, we propose an additional reassembly step to seamlessly integrate the respective editing results and the non-editing region to obtain the final edited image.

Benchmarking • Denoising +1

QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models

2 code implementations • 12 Oct 2023 • Jing Liu, Ruihao Gong, Xiuying Wei, Zhiwei Dong, Jianfei Cai, Bohan Zhuang

Additionally, an adaptive strategy is designed to autonomously determine the optimal number of sub-channels for channel disassembly.

Quantization
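
The channel disassembly behind this can be verified numerically: splitting an outlier input channel into t sub-channels that each carry 1/t of the activation, while replicating the matching weight column, leaves a linear layer's output unchanged but shrinks the per-channel activation range. A hedged sketch (not the QLLM code; t would come from the paper's adaptive strategy):

    import torch

    def disassemble_channel(x, w, idx, t):
        # x: (..., C_in) activations; w: (C_out, C_in) as in torch.nn.Linear.
        new_x = torch.cat([x, x[..., idx:idx + 1].expand(*x.shape[:-1], t - 1) / t], dim=-1)
        new_x[..., idx] = x[..., idx] / t              # original slot becomes one sub-channel
        new_w = torch.cat([w, w[:, idx:idx + 1].expand(-1, t - 1)], dim=1)
        return new_x, new_w

    x, w = torch.randn(4, 16), torch.randn(8, 16)
    x2, w2 = disassemble_channel(x, w, idx=3, t=4)
    assert torch.allclose(x2 @ w2.T, x @ w.T, atol=1e-5)  # output is preserved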

EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models

1 code implementation • 5 Oct 2023 • Yefei He, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

In this paper, we introduce a data-free and parameter-efficient fine-tuning framework for low-bit diffusion models, dubbed EfficientDM, to achieve QAT-level performance with PTQ-like efficiency.

Denoising • Image Generation +1

SwitchGPT: Adapting Large Language Models for Non-Text Outputs

no code implementations • 14 Sep 2023 • Xinyu Wang, Bohan Zhuang, Qi Wu

To bridge this gap, we propose a novel approach, SwitchGPT, from a modality conversion perspective that evolves a text-based LLM into a multi-modal one.

Stitched ViTs are Flexible Vision Backbones

1 code implementation • 30 Jun 2023 • Zizheng Pan, Jing Liu, Haoyu He, Jianfei Cai, Bohan Zhuang

With extensive experiments on ImageNet-1K, ADE20K, COCO-Stuff-10K and NYUv2, SN-Netv2 demonstrates superior performance over SN-Netv1 on downstream dense predictions and shows strong ability as a flexible vision backbone, achieving great advantages in both training efficiency and deployment flexibility.

LoRAPrune: Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning

no code implementations • 28 May 2023 • Mingyang Zhang, Hao Chen, Chunhua Shen, Zhen Yang, Linlin Ou, Xinyi Yu, Bohan Zhuang

This is because they either apply unstructured pruning to LPMs, which impedes the merging of LoRA weights, or rely on the gradients of pre-trained weights to guide pruning, which can impose significant memory overhead.

Model Compression • Network Pruning

Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning

1 code implementation • ICCV 2023 • Haoyu He, Jianfei Cai, Jing Zhang, DaCheng Tao, Bohan Zhuang

Visual Parameter-Efficient Fine-Tuning (PEFT) has become a powerful alternative to full fine-tuning for adapting pre-trained vision models to downstream tasks: it tunes only a small number of parameters while freezing the vast majority, easing both storage burden and optimization difficulty.

Stitchable Neural Networks

2 code implementations • CVPR 2023 • Zizheng Pan, Jianfei Cai, Bohan Zhuang

As each model family consists of pretrained models with diverse scales (e.g., DeiT-Ti/S/B), a fundamental question naturally arises: how can these readily available models in a family be efficiently assembled for dynamic accuracy-efficiency trade-offs at runtime?

Image Classification

A Survey on Efficient Training of Transformers

no code implementations • 2 Feb 2023 • Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen

Recent advances in Transformers have come with enormous demands on computing resources, highlighting the importance of developing efficient training techniques that make Transformer training faster, cheaper, and more accurate through the efficient use of computation and memory.

BiViT: Extremely Compressed Binary Vision Transformers

no code implementations • ICCV 2023 • Yefei He, Zhenyu Lou, Luoming Zhang, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

To solve this, we propose Softmax-aware Binarization, which dynamically adapts to the data distribution and reduces the error caused by binarization.

Binarization • object-detection +1

EcoFormer: Energy-Saving Attention with Linear Complexity

1 code implementation • 19 Sep 2022 • Jing Liu, Zizheng Pan, Haoyu He, Jianfei Cai, Bohan Zhuang

To this end, we propose a new binarization paradigm customized to high-dimensional softmax attention via kernelized hashing, called EcoFormer, to map the original queries and keys into low-dimensional binary codes in Hamming space.

Binarization
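
The binary-code attention structure can be illustrated with a toy stand-in that uses a fixed random-projection hash in place of the paper's learned kernelized hash; what it shares with EcoFormer is the {0, 1} codes and the associativity that makes the cost linear in sequence length:

    import torch

    def binary_linear_attention_sketch(q, k, v, bits=16, seed=0):
        # q, k: (B, N, D); v: (B, N, Dv). The hash below is a hypothetical
        # stand-in: sign of random projections rather than a learned kernel.
        g = torch.Generator().manual_seed(seed)
        proj = torch.randn(q.shape[-1], bits, generator=g)
        qb = (q @ proj >= 0).float()            # binary codes in {0, 1}^bits
        kb = (k @ proj >= 0).float()
        kv = kb.transpose(1, 2) @ v             # (B, bits, Dv): O(N), not O(N^2)
        den = qb @ kb.sum(dim=1).unsqueeze(-1)  # (B, N, 1) normaliser
        return (qb @ kv) / den.clamp(min=1e-6)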

FocusFormer: Focusing on What We Need via Architecture Sampler

no code implementations • 23 Aug 2022 • Jing Liu, Jianfei Cai, Bohan Zhuang

During architecture search, these methods focus on finding architectures on the Pareto frontier of performance and resource consumption, which forms a gap between training and deployment.

Neural Architecture Search

An Efficient Spatio-Temporal Pyramid Transformer for Action Detection

no code implementations • 21 Jul 2022 • Yuetian Weng, Zizheng Pan, Mingfei Han, Xiaojun Chang, Bohan Zhuang

The task of action detection aims at deducing both the action category and the temporal localization (start and end moments) of each action instance in a long, untrimmed video.

Action Detection • Video Understanding

Fast Vision Transformers with HiLo Attention

5 code implementations • 26 May 2022 • Zizheng Pan, Jianfei Cai, Bohan Zhuang

Therefore, we propose to disentangle the high/low frequency patterns in an attention layer by separating the heads into two groups: one group encodes high frequencies via self-attention within each local window, while the other encodes low frequencies by performing global attention between the average-pooled low-frequency keys and values from each window and each query position in the input feature map.

Benchmarking • Efficient ViTs +2
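
Because that description is dense, a compact PyTorch rendering may help. This is an illustrative re-implementation of the head split, not the released LITv2 code; it assumes (B, H, W, C) inputs with H and W divisible by the window size s:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HiLoSketch(nn.Module):
        def __init__(self, dim, num_heads=8, alpha=0.5, s=2):
            super().__init__()
            self.s = s                           # local window size
            self.h_lo = int(num_heads * alpha)   # low-frequency (global) heads
            self.h_hi = num_heads - self.h_lo    # high-frequency (local) heads
            self.d = dim // num_heads            # channels per head
            self.qkv_hi = nn.Linear(dim, 3 * self.h_hi * self.d)
            self.q_lo = nn.Linear(dim, self.h_lo * self.d)
            self.kv_lo = nn.Linear(dim, 2 * self.h_lo * self.d)
            self.proj = nn.Linear(num_heads * self.d, dim)

        def forward(self, x):                    # x: (B, H, W, C)
            B, H, W, C = x.shape
            s, d = self.s, self.d
            # Hi-Fi: self-attention inside each non-overlapping s x s window.
            win = x.view(B, H // s, s, W // s, s, C).permute(0, 1, 3, 2, 4, 5)
            win = win.reshape(-1, s * s, C)
            q, k, v = self.qkv_hi(win).view(-1, s * s, 3, self.h_hi, d).permute(2, 0, 3, 1, 4)
            hi = ((q @ k.transpose(-2, -1)) / d ** 0.5).softmax(-1) @ v
            hi = hi.permute(0, 2, 1, 3).reshape(B, H // s, W // s, s, s, self.h_hi * d)
            hi = hi.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, self.h_hi * d)
            # Lo-Fi: every query attends to average-pooled (low-frequency) keys/values.
            q = self.q_lo(x).view(B, H * W, self.h_lo, d).transpose(1, 2)
            pooled = F.avg_pool2d(x.permute(0, 3, 1, 2), s).flatten(2).transpose(1, 2)
            k, v = self.kv_lo(pooled).view(B, -1, 2, self.h_lo, d).permute(2, 0, 3, 1, 4)
            lo = ((q @ k.transpose(-2, -1)) / d ** 0.5).softmax(-1) @ v
            lo = lo.transpose(1, 2).reshape(B, H, W, self.h_lo * d)
            return self.proj(torch.cat([hi, lo], dim=-1))

Here alpha trades local heads against global ones; HiLoSketch(64)(torch.randn(1, 8, 8, 64)) runs end to end.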

Dynamic Focus-aware Positional Queries for Semantic Segmentation

2 code implementations • CVPR 2023 • Haoyu He, Jianfei Cai, Zizheng Pan, Jing Liu, Jing Zhang, DaCheng Tao, Bohan Zhuang

In this paper, we propose a simple yet effective query design for semantic segmentation termed Dynamic Focus-aware Positional Queries (DFPQ), which dynamically generates positional queries conditioned simultaneously on the cross-attention scores from the preceding decoder block and the positional encodings of the corresponding image features.

Semantic Segmentation

Automated Progressive Learning for Efficient Training of Vision Transformers

1 code implementation • CVPR 2022 • Changlin Li, Bohan Zhuang, Guangrun Wang, Xiaodan Liang, Xiaojun Chang, Yi Yang

First, we develop a strong manual baseline for progressive learning of ViTs, by introducing momentum growth (MoGrow) to bridge the gap brought by model growth.

Sharpness-aware Quantization for Deep Neural Networks

3 code implementations • 24 Nov 2021 • Jing Liu, Jianfei Cai, Bohan Zhuang

However, the abrupt changes in quantized weights during training often lead to severe loss fluctuations and result in a sharp loss landscape, making the gradients unstable and thus degrading the performance.

Image Classification • Model Compression +1
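
The remedy, as the title indicates, is a sharpness-aware (SAM-style) update performed on the quantized model. A generic sketch of one such step, with the quantization omitted and assuming every trainable parameter receives a gradient; this is standard SAM, not the paper's exact procedure:

    import torch

    def sharpness_aware_step_sketch(model, loss_fn, x, y, opt, rho=0.05):
        params = [p for p in model.parameters() if p.requires_grad]
        loss_fn(model(x), y).backward()
        with torch.no_grad():
            norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
            eps = [rho * p.grad / (norm + 1e-12) for p in params]
            for p, e in zip(params, eps):
                p.add_(e)                  # climb toward the locally sharpest point
        opt.zero_grad()
        loss_fn(model(x), y).backward()    # gradient at the perturbed weights
        with torch.no_grad():
            for p, e in zip(params, eps):
                p.sub_(e)                  # undo the perturbation
        opt.step(); opt.zero_grad()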

Mesa: A Memory-saving Training Framework for Transformers

3 code implementations • 22 Nov 2021 • Zizheng Pan, Peng Chen, Haoyu He, Jing Liu, Jianfei Cai, Bohan Zhuang

While Transformers have delivered significant performance improvements, training such networks is extremely memory intensive owing to storing all intermediate activations that are needed for gradient computation during backpropagation, especially for long sequences.

Quantization
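
A toy version of the activation-compression idea, shown on a bare ReLU: cache an int8 copy of the input for backward instead of the fp32 tensor. Mesa's actual scheme (per-group quantization across layer types) is more involved; this only shows the mechanism:

    import torch

    class Int8ReLU(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            # Store a compressed (int8) copy of the activation for backward.
            ctx.scale = x.abs().amax().clamp(min=1e-8) / 127.0
            ctx.save_for_backward((x / ctx.scale).round().to(torch.int8))
            return x.relu()

        @staticmethod
        def backward(ctx, grad_out):
            # Dequantize the cached copy; the mask is approximate for tiny
            # activations, which is the memory/accuracy trade-off at play.
            (x_q,) = ctx.saved_tensors
            return grad_out * (x_q.float() * ctx.scale > 0).float()

    y = Int8ReLU.apply(torch.randn(4, 8, requires_grad=True))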

Less is More: Pay Less Attention in Vision Transformers

2 code implementations • 29 May 2021 • Zizheng Pan, Bohan Zhuang, Haoyu He, Jing Liu, Jianfei Cai

Transformers have become one of the dominant architectures in deep learning, particularly as a powerful alternative to convolutional neural networks (CNNs) in computer vision.

Image Classification • Instance Segmentation +3

End-to-end One-shot Human Parsing

1 code implementation • 4 May 2021 • Haoyu He, Bohan Zhuang, Jing Zhang, Jianfei Cai, DaCheng Tao

To address three main challenges in OSHP, i.e., small sizes, testing bias, and similar parts, we devise an End-to-end One-shot human Parsing Network (EOP-Net).

Human Parsing • Metric Learning +1

Scalable Vision Transformers with Hierarchical Pooling

2 code implementations • ICCV 2021 • Zizheng Pan, Bohan Zhuang, Jing Liu, Haoyu He, Jianfei Cai

However, the routine of the current ViT model is to maintain a full-length patch sequence during inference, which is redundant and lacks hierarchical representation.

Efficient ViTs

Single-path Bit Sharing for Automatic Loss-aware Model Compression

no code implementations • 13 Jan 2021 • Jing Liu, Bohan Zhuang, Peng Chen, Chunhua Shen, Jianfei Cai, Mingkui Tan

By jointly training the binary gates in conjunction with network parameters, the compression configurations of each layer can be automatically determined.

Model Compression • Network Pruning +1

Fully Quantized Image Super-Resolution Networks

1 code implementation • 29 Nov 2020 • Hu Wang, Peng Chen, Bohan Zhuang, Chunhua Shen

With the rising popularity of intelligent mobile devices, it is of great practical significance to develop accurate, real-time and energy-efficient image Super-Resolution (SR) inference methods.

Image Super-Resolution • Quantization

FATNN: Fast and Accurate Ternary Neural Networks

no code implementations • ICCV 2021 • Peng Chen, Bohan Zhuang, Chunhua Shen

Ternary Neural Networks (TNNs) have received much attention due to being potentially orders of magnitude faster in inference, as well as more power efficient, than full-precision counterparts.

Image Classification • Quantization

AQD: Towards Accurate Fully-Quantized Object Detection

1 code implementation • CVPR 2021 • Peng Chen, Jing Liu, Bohan Zhuang, Mingkui Tan, Chunhua Shen

Network quantization allows inference to be conducted using low-precision arithmetic for improved inference efficiency of deep neural networks on edge devices.

Image Classification • Object +3

Role-Wise Data Augmentation for Knowledge Distillation

1 code implementation • ICLR 2020 • Jie Fu, Xue Geng, Zhijian Duan, Bohan Zhuang, Xingdi Yuan, Adam Trischler, Jie Lin, Chris Pal, Hao Dong

To our knowledge, existing methods overlook the fact that although the student absorbs extra knowledge from the teacher, both models share the same input data -- and this data is the only medium by which the teacher's knowledge can be demonstrated.

Data Augmentation • Knowledge Distillation

Generative Low-bitwidth Data Free Quantization

3 code implementations • ECCV 2020 • Shoukai Xu, Haokun Li, Bohan Zhuang, Jing Liu, JieZhang Cao, Chuangrun Liang, Mingkui Tan

More critically, our method achieves much higher accuracy on 4-bit quantization than the existing data-free quantization method.

Data Free Quantization

Switchable Precision Neural Networks

no code implementations • 7 Feb 2020 • Luis Guerra, Bohan Zhuang, Ian Reid, Tom Drummond

Instantaneous, on-demand accuracy-efficiency trade-offs have recently been explored in the context of neural network slimming.

Quantization

Automatic Pruning for Quantized Neural Networks

no code implementations • 3 Feb 2020 • Luis Guerra, Bohan Zhuang, Ian Reid, Tom Drummond

In particular, for ResNet-18 on ImageNet, we prune 26.12% of the model size with Binarized Neural Network quantization, achieving a top-1 classification accuracy of 47.32% in a model of 2.47 MB, and 59.30% with a 2-bit DoReFa-Net in 4.36 MB.

Bayesian Optimization • Quantization

Discrimination-aware Network Pruning for Deep Model Compression

1 code implementation • 4 Jan 2020 • Jing Liu, Bohan Zhuang, Zhuangwei Zhuang, Yong Guo, Junzhou Huang, Jinhui Zhu, Mingkui Tan

In this paper, we propose a simple yet effective method called discrimination-aware channel pruning (DCP) to choose the channels that actually contribute to the discriminative power.

Face Recognition • Image Classification +2

Structured Binary Neural Networks for Image Recognition

no code implementations • 22 Sep 2019 • Bohan Zhuang, Chunhua Shen, Mingkui Tan, Peng Chen, Lingqiao Liu, Ian Reid

Experiments on classification, semantic segmentation and object detection tasks demonstrate the superior performance of the proposed methods over various quantized networks in the literature.

object-detection • Object Detection +2

Auxiliary Learning for Deep Multi-task Learning

no code implementations • 5 Sep 2019 • Yifan Liu, Bohan Zhuang, Chunhua Shen, Hao Chen, Wei Yin

Most current methods can be categorized as either: (i) hard parameter sharing, where a subset of the parameters is shared among tasks while other parameters are task-specific; or (ii) soft parameter sharing, where all parameters are task-specific but jointly regularized.

Auxiliary Learning • Depth Estimation +3

RBCN: Rectified Binary Convolutional Networks for Enhancing the Performance of 1-bit DCNNs

no code implementations • 21 Aug 2019 • Chunlei Liu, Wenrui Ding, Xin Xia, Yuan Hu, Baochang Zhang, Jianzhuang Liu, Bohan Zhuang, Guodong Guo

Binarized convolutional neural networks (BCNNs) are widely used to improve the memory and computation efficiency of deep convolutional neural networks (DCNNs) for mobile and AI-chip-based applications.

Binarization • Object Tracking

Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations

no code implementations • 10 Aug 2019 • Bohan Zhuang, Jing Liu, Mingkui Tan, Lingqiao Liu, Ian Reid, Chunhua Shen

Furthermore, we propose a second progressive quantization scheme which gradually decreases the bit-width from high-precision to low-precision during training.

Knowledge Distillation • Quantization
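
Reduced to its schedule, the progressive scheme looks like the sketch below; the uniform milestones and default bit-widths are assumptions, not the paper's settings:

    def bitwidth_schedule_sketch(epoch, total_epochs, start_bits=8, end_bits=2):
        # Begin at high precision and step the bit-width down toward the
        # low-precision target as training proceeds.
        levels = start_bits - end_bits + 1
        return max(start_bits - int(epoch * levels / total_epochs), end_bits)

    # e.g., over 70 epochs this yields 8, 7, ..., 2 bits in 10-epoch stages.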

Training Quantized Neural Networks with a Full-precision Auxiliary Module

no code implementations • CVPR 2020 • Bohan Zhuang, Lingqiao Liu, Mingkui Tan, Chunhua Shen, Ian Reid

In this paper, we seek to tackle a challenge in training low-precision networks: the notorious difficulty in propagating gradients through a low-precision network due to the non-differentiable quantization function.

Image Classification • object-detection +2

Structured Binary Neural Networks for Accurate Image Classification and Semantic Segmentation

no code implementations • CVPR 2019 • Bohan Zhuang, Chunhua Shen, Mingkui Tan, Lingqiao Liu, Ian Reid

In this paper, we propose to train convolutional neural networks (CNNs) with both binarized weights and activations, leading to quantized models specifically for mobile devices with limited power capacity and computation resources.

General Classification • Image Classification +2

Training Compact Neural Networks with Binary Weights and Low Precision Activations

no code implementations • 8 Aug 2018 • Bohan Zhuang, Chunhua Shen, Ian Reid

In this paper, we propose to train a network with binary weights and low-bitwidth activations, designed especially for mobile devices with limited power consumption.

Parallel Attention: A Unified Framework for Visual Object Discovery through Dialogs and Queries

no code implementations • CVPR 2018 • Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, Anton Van Den Hengel

To this end we propose a unified framework, the ParalleL AttentioN (PLAN) network, to discover the object in an image that is being referred to in variable-length natural expression descriptions, from short phrase queries to long multi-round dialogs.

Object • Object Discovery +2

Towards Effective Low-bitwidth Convolutional Neural Networks

2 code implementations • CVPR 2018 • Bohan Zhuang, Chunhua Shen, Mingkui Tan, Lingqiao Liu, Ian Reid

This paper tackles the problem of training a deep convolutional neural network with both low-precision weights and low-bitwidth activations.

Quantization

Towards Context-Aware Interaction Recognition for Visual Relationship Detection

1 code implementation • ICCV 2017 • Bohan Zhuang, Lingqiao Liu, Chunhua Shen, Ian Reid

The proposed method still builds one classifier for one interaction (as per type (ii) above), but the classifier is made adaptive to context via context-dependent weights.

Relationship Detection • Visual Relationship Detection

TasselNet: Counting maize tassels in the wild via local counts regression network

no code implementations • 7 Jul 2017 • Hao Lu, Zhiguo Cao, Yang Xiao, Bohan Zhuang, Chunhua Shen

To our knowledge, this is the first time that a plant-related counting problem has been considered using computer vision technologies in an unconstrained field-based environment.

Plant Phenotyping • regression

Care about you: towards large-scale human-centric visual relationship detection

no code implementations • 28 May 2017 • Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, Anton Van Den Hengel

In addressing this problem we first construct a large-scale human-centric visual relationship detection dataset (HCVRD), which provides many more types of relationship annotation (nearly 10K categories) than previously released datasets.

Human-Object Interaction Detection • Relationship Detection +1

Towards Context-aware Interaction Recognition

no code implementations • 18 Mar 2017 • Bohan Zhuang, Lingqiao Liu, Chunhua Shen, Ian Reid

Recognizing how objects interact with each other is a crucial task in visual recognition.

Visual Tracking via Shallow and Deep Collaborative Model

no code implementations • 27 Jul 2016 • Bohan Zhuang, Lijun Wang, Huchuan Lu

In the discriminative model, we exploit the advances of deep learning architectures to learn generic features which are robust to both background clutter and foreground appearance variations.

Incremental Learning • Visual Tracking

Fast Training of Triplet-based Deep Binary Embedding Networks

no code implementations • CVPR 2016 • Bohan Zhuang, Guosheng Lin, Chunhua Shen, Ian Reid

To solve the first stage, we design a large-scale high-order binary code inference algorithm that reduces the high-order objective to a standard binary quadratic problem, such that graph cuts can be used to efficiently infer the binary code that serves as the label of each training datum.

Image Retrieval • Multi-Label Classification +1
