Search Results for author: Bohan Zhuang

Found 59 papers, 31 papers with code

LongVLM: Efficient Long Video Understanding via Large Language Models

1 code implementation • 4 Apr 2024 • Yuetian Weng, Mingfei Han, Haoyu He, Xiaojun Chang, Bohan Zhuang

In this way, we encode video representations that incorporate both local and global information, enabling the LLM to generate comprehensive responses for long-term videos.

Question Answering • Video Question Answering +1
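
The local-plus-global encoding described above can be pictured with a short, hedged sketch. This is an illustrative PyTorch toy, not the LongVLM pipeline; the function name, mean-pooling, and token layout are assumptions:

    import torch

    def encode_video_tokens_sketch(frame_feats, num_segments=8):
        # frame_feats: (T, D) per-frame features. Pool each temporal segment
        # into one "local" token and prepend a "global" token pooled over the
        # whole video; the paper's token merging and projection are omitted.
        local = torch.stack([s.mean(dim=0) for s in frame_feats.chunk(num_segments, dim=0)])
        global_tok = frame_feats.mean(dim=0, keepdim=True)   # (1, D)
        return torch.cat([global_tok, local], dim=0)         # tokens fed to the LLM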

Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM

1 code implementation • 12 Mar 2024 • Zeyu Zhang, Akide Liu, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang

Human motion generation is a significant pursuit in generative computer vision, yet achieving long-sequence and efficient motion generation remains challenging.

T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching

1 code implementation • 21 Feb 2024 • Zizheng Pan, Bohan Zhuang, De-An Huang, Weili Nie, Zhiding Yu, Chaowei Xiao, Jianfei Cai, Anima Anandkumar

Sampling from diffusion probabilistic models (DPMs) is often expensive for high-quality image generation and typically requires many steps with a large model.

Image Generation
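
The trajectory stitching named in the title fits in a few lines. In this sketch, small_step and large_step are hypothetical callables that each perform one denoising update; the released implementation differs:

    def t_stitch_sample_sketch(small_step, large_step, x, timesteps, frac=0.4):
        # Use the cheap denoiser for the first `frac` of the trajectory, where
        # coarse structure forms, then switch to the large model for the
        # remaining steps to refine details.
        cut = int(len(timesteps) * frac)
        for i, t in enumerate(timesteps):
            x = small_step(x, t) if i < cut else large_step(x, t)
        return x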

ModaVerse: Efficiently Transforming Modalities with LLMs

1 code implementation • 12 Jan 2024 • Xinyu Wang, Bohan Zhuang, Qi Wu

This alignment process, which synchronizes a language model trained on textual data with encoders and decoders trained on multi-modal data, often necessitates extensive training of several projection layers in multiple stages.

Language Modelling • Large Language Model

Efficient Stitchable Task Adaptation

1 code implementation • 29 Nov 2023 • Haoyu He, Zizheng Pan, Jing Liu, Jianfei Cai, Bohan Zhuang

In this work, we present a novel framework, Efficient Stitchable Task Adaptation (ESTA), to efficiently produce a palette of fine-tuned models that adhere to diverse resource constraints.

Chatbot

Efficient Test-Time Adaptation for Super-Resolution with Second-Order Degradation and Reconstruction

1 code implementation • NeurIPS 2023 • Zeshuai Deng, Zhuokun Chen, Shuaicheng Niu, Thomas H. Li, Bohan Zhuang, Mingkui Tan

Then, we adapt the SR model by implementing feature-level reconstruction learning from the initial test image to its second-order degraded counterparts, which helps the SR model generate plausible HR images.

Image Super-Resolution • Test-time Adaptation
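
One adaptation step of this kind can be sketched as below, assuming a callable features that extracts intermediate features from the SR model and using plain bicubic resampling as the second-order degradation (the paper samples richer degradations; all names are illustrative):

    import torch.nn.functional as F

    def tta_step_sketch(features, lr_img, opt, scale=2):
        # lr_img: (B, C, H, W) test image. Build a second-order degraded copy
        # (down- then up-sample), then adapt so the model's features on the
        # copy reconstruct those on the original input.
        lr2 = F.interpolate(lr_img, scale_factor=1.0 / scale, mode='bicubic', align_corners=False)
        lr2 = F.interpolate(lr2, size=lr_img.shape[-2:], mode='bicubic', align_corners=False)
        loss = F.l1_loss(features(lr2), features(lr_img).detach())
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()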

Mask Propagation for Efficient Video Semantic Segmentation

1 code implementation • NeurIPS 2023 • Yuetian Weng, Mingfei Han, Haoyu He, Mingjie Li, Lina Yao, Xiaojun Chang, Bohan Zhuang

By reusing predictions from key frames, we circumvent the need to process a large volume of video frames individually with resource-intensive segmentors, alleviating temporal redundancy and significantly reducing computational costs.

Semantic Segmentation • Video Semantic Segmentation
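
The key-frame reuse pattern reads roughly as below; the copy-forward propagator is a deliberately naive stand-in for the paper's learned mask-propagation module, and the names are assumptions:

    import torch

    def segment_video_sketch(frames, segmentor, key_every=5):
        # frames: iterable of (C, H, W) tensors; segmentor: frame -> mask.
        preds, last = [], None
        for i, frame in enumerate(frames):
            if i % key_every == 0 or last is None:
                last = segmentor(frame)   # expensive path: key frames only
            preds.append(last)            # cheap path: reuse the latest key mask
        return torch.stack(preds)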

Object-aware Inversion and Reassembly for Image Editing

no code implementations • 18 Oct 2023 • Zhen Yang, Ganggui Ding, Wen Wang, Hao Chen, Bohan Zhuang, Chunhua Shen

Subsequently, we propose an additional reassembly step to seamlessly integrate the respective editing results and the non-editing region to obtain the final edited image.

Benchmarking • Denoising +1

QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models

2 code implementations • 12 Oct 2023 • Jing Liu, Ruihao Gong, Xiuying Wei, Zhiwei Dong, Jianfei Cai, Bohan Zhuang

Additionally, an adaptive strategy is designed to autonomously determine the optimal number of sub-channels for channel disassembly.

Quantization
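
The channel disassembly behind this can be verified numerically: splitting an outlier input channel into t sub-channels that each carry 1/t of the activation, while replicating the matching weight column, leaves a linear layer's output unchanged but shrinks the per-channel activation range. A hedged sketch (not the QLLM code; t would come from the paper's adaptive strategy):

    import torch

    def disassemble_channel(x, w, idx, t):
        # x: (..., C_in) activations; w: (C_out, C_in) as in torch.nn.Linear.
        new_x = torch.cat([x, x[..., idx:idx + 1].expand(*x.shape[:-1], t - 1) / t], dim=-1)
        new_x[..., idx] = x[..., idx] / t              # original slot becomes one sub-channel
        new_w = torch.cat([w, w[:, idx:idx + 1].expand(-1, t - 1)], dim=1)
        return new_x, new_w

    x, w = torch.randn(4, 16), torch.randn(8, 16)
    x2, w2 = disassemble_channel(x, w, idx=3, t=4)
    assert torch.allclose(x2 @ w2.T, x @ w.T, atol=1e-5)  # output is preserved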

EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models

1 code implementation • 5 Oct 2023 • Yefei He, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

In this paper, we introduce a data-free and parameter-efficient fine-tuning framework for low-bit diffusion models, dubbed EfficientDM, to achieve QAT-level performance with PTQ-like efficiency.

Denoising • Image Generation +1

SwitchGPT: Adapting Large Language Models for Non-Text Outputs

no code implementations • 14 Sep 2023 • Xinyu Wang, Bohan Zhuang, Qi Wu

To bridge this gap, we propose a novel approach, SwitchGPT, from a modality conversion perspective that evolves a text-based LLM into a multi-modal one.

Stitched ViTs are Flexible Vision Backbones

1 code implementation • 30 Jun 2023 • Zizheng Pan, Jing Liu, Haoyu He, Jianfei Cai, Bohan Zhuang

With extensive experiments on ImageNet-1K, ADE20K, COCO-Stuff-10K and NYUv2, SN-Netv2 demonstrates superior performance over SN-Netv1 on downstream dense predictions and shows strong ability as a flexible vision backbone, achieving great advantages in both training efficiency and deployment flexibility.

LoRAPrune: Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning

no code implementations • 28 May 2023 • Mingyang Zhang, Hao Chen, Chunhua Shen, Zhen Yang, Linlin Ou, Xinyi Yu, Bohan Zhuang

This is because they either apply unstructured pruning to LPMs, which impedes the merging of LoRA weights, or rely on the gradients of pre-trained weights to guide pruning, which can impose significant memory overhead.

Model Compression • Network Pruning

Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning

1 code implementation • ICCV 2023 • Haoyu He, Jianfei Cai, Jing Zhang, DaCheng Tao, Bohan Zhuang

Visual Parameter-Efficient Fine-Tuning (PEFT) has become a powerful alternative to full fine-tuning for adapting pre-trained vision models to downstream tasks: it tunes only a small number of parameters while freezing the vast majority, easing both storage burden and optimization difficulty.

Stitchable Neural Networks

2 code implementations • CVPR 2023 • Zizheng Pan, Jianfei Cai, Bohan Zhuang

As each model family consists of pretrained models with diverse scales (e.g., DeiT-Ti/S/B), a fundamental question naturally arises: how can these readily available models in a family be efficiently assembled for dynamic accuracy-efficiency trade-offs at runtime?

Image Classification

A Survey on Efficient Training of Transformers

no code implementations • 2 Feb 2023 • Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen

Recent advances in Transformers have come with enormous demands on computing resources, highlighting the importance of developing efficient training techniques that make Transformer training faster, cheaper, and more accurate through the efficient use of computation and memory.

BiViT: Extremely Compressed Binary Vision Transformers

no code implementations • ICCV 2023 • Yefei He, Zhenyu Lou, Luoming Zhang, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

To solve this, we propose Softmax-aware Binarization, which dynamically adapts to the data distribution and reduces the error caused by binarization.

Binarization • object-detection +1

EcoFormer: Energy-Saving Attention with Linear Complexity

1 code implementation • 19 Sep 2022 • Jing Liu, Zizheng Pan, Haoyu He, Jianfei Cai, Bohan Zhuang

To this end, we propose a new binarization paradigm customized to high-dimensional softmax attention via kernelized hashing, called EcoFormer, to map the original queries and keys into low-dimensional binary codes in Hamming space.

Binarization
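
The binary-code attention structure can be illustrated with a toy stand-in that uses a fixed random-projection hash in place of the paper's learned kernelized hash; what it shares with EcoFormer is the {0, 1} codes and the associativity that makes the cost linear in sequence length:

    import torch

    def binary_linear_attention_sketch(q, k, v, bits=16, seed=0):
        # q, k: (B, N, D); v: (B, N, Dv). The hash below is a hypothetical
        # stand-in: sign of random projections rather than a learned kernel.
        g = torch.Generator().manual_seed(seed)
        proj = torch.randn(q.shape[-1], bits, generator=g)
        qb = (q @ proj >= 0).float()            # binary codes in {0, 1}^bits
        kb = (k @ proj >= 0).float()
        kv = kb.transpose(1, 2) @ v             # (B, bits, Dv): O(N), not O(N^2)
        den = qb @ kb.sum(dim=1).unsqueeze(-1)  # (B, N, 1) normaliser
        return (qb @ kv) / den.clamp(min=1e-6)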

FocusFormer: Focusing on What We Need via Architecture Sampler

no code implementations • 23 Aug 2022 • Jing Liu, Jianfei Cai, Bohan Zhuang

During architecture search, these methods focus on finding architectures on the Pareto frontier of performance and resource consumption, which forms a gap between training and deployment.

Neural Architecture Search

An Efficient Spatio-Temporal Pyramid Transformer for Action Detection

no code implementations • 21 Jul 2022 • Yuetian Weng, Zizheng Pan, Mingfei Han, Xiaojun Chang, Bohan Zhuang

The task of action detection aims at deducing both the action category and the temporal localization (start and end moments) of each action instance in a long, untrimmed video.

Action Detection • Video Understanding

Fast Vision Transformers with HiLo Attention

5 code implementations • 26 May 2022 • Zizheng Pan, Jianfei Cai, Bohan Zhuang

Therefore, we propose to disentangle the high/low frequency patterns in an attention layer by separating the heads into two groups: one group encodes high frequencies via self-attention within each local window, while the other encodes low frequencies by performing global attention between the average-pooled low-frequency keys and values from each window and each query position in the input feature map.

Benchmarking • Efficient ViTs +2
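
Because that description is dense, a compact PyTorch rendering may help. This is an illustrative re-implementation of the head split, not the released LITv2 code; it assumes (B, H, W, C) inputs with H and W divisible by the window size s:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HiLoSketch(nn.Module):
        def __init__(self, dim, num_heads=8, alpha=0.5, s=2):
            super().__init__()
            self.s = s                           # local window size
            self.h_lo = int(num_heads * alpha)   # low-frequency (global) heads
            self.h_hi = num_heads - self.h_lo    # high-frequency (local) heads
            self.d = dim // num_heads            # channels per head
            self.qkv_hi = nn.Linear(dim, 3 * self.h_hi * self.d)
            self.q_lo = nn.Linear(dim, self.h_lo * self.d)
            self.kv_lo = nn.Linear(dim, 2 * self.h_lo * self.d)
            self.proj = nn.Linear(num_heads * self.d, dim)

        def forward(self, x):                    # x: (B, H, W, C)
            B, H, W, C = x.shape
            s, d = self.s, self.d
            # Hi-Fi: self-attention inside each non-overlapping s x s window.
            win = x.view(B, H // s, s, W // s, s, C).permute(0, 1, 3, 2, 4, 5)
            win = win.reshape(-1, s * s, C)
            q, k, v = self.qkv_hi(win).view(-1, s * s, 3, self.h_hi, d).permute(2, 0, 3, 1, 4)
            hi = ((q @ k.transpose(-2, -1)) / d ** 0.5).softmax(-1) @ v
            hi = hi.permute(0, 2, 1, 3).reshape(B, H // s, W // s, s, s, self.h_hi * d)
            hi = hi.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, self.h_hi * d)
            # Lo-Fi: every query attends to average-pooled (low-frequency) keys/values.
            q = self.q_lo(x).view(B, H * W, self.h_lo, d).transpose(1, 2)
            pooled = F.avg_pool2d(x.permute(0, 3, 1, 2), s).flatten(2).transpose(1, 2)
            k, v = self.kv_lo(pooled).view(B, -1, 2, self.h_lo, d).permute(2, 0, 3, 1, 4)
            lo = ((q @ k.transpose(-2, -1)) / d ** 0.5).softmax(-1) @ v
            lo = lo.transpose(1, 2).reshape(B, H, W, self.h_lo * d)
            return self.proj(torch.cat([hi, lo], dim=-1))

Here alpha trades local heads against global ones; HiLoSketch(64)(torch.randn(1, 8, 8, 64)) runs end to end.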

Dynamic Focus-aware Positional Queries for Semantic Segmentation

2 code implementations • CVPR 2023 • Haoyu He, Jianfei Cai, Zizheng Pan, Jing Liu, Jing Zhang, DaCheng Tao, Bohan Zhuang

In this paper, we propose a simple yet effective query design for semantic segmentation termed Dynamic Focus-aware Positional Queries (DFPQ), which dynamically generates positional queries conditioned simultaneously on the cross-attention scores from the preceding decoder block and the positional encodings of the corresponding image features.

Semantic Segmentation

Automated Progressive Learning for Efficient Training of Vision Transformers

1 code implementation • CVPR 2022 • Changlin Li, Bohan Zhuang, Guangrun Wang, Xiaodan Liang, Xiaojun Chang, Yi Yang

First, we develop a strong manual baseline for progressive learning of ViTs, by introducing momentum growth (MoGrow) to bridge the gap brought by model growth.

Sharpness-aware Quantization for Deep Neural Networks

3 code implementations • 24 Nov 2021 • Jing Liu, Jianfei Cai, Bohan Zhuang

However, the abrupt changes in quantized weights during training often lead to severe loss fluctuations and result in a sharp loss landscape, making the gradients unstable and thus degrading the performance.

Image Classification • Model Compression +1
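
The remedy, as the title indicates, is a sharpness-aware (SAM-style) update performed on the quantized model. A generic sketch of one such step, with the quantization omitted and assuming every trainable parameter receives a gradient; this is standard SAM, not the paper's exact procedure:

    import torch

    def sharpness_aware_step_sketch(model, loss_fn, x, y, opt, rho=0.05):
        params = [p for p in model.parameters() if p.requires_grad]
        loss_fn(model(x), y).backward()
        with torch.no_grad():
            norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
            eps = [rho * p.grad / (norm + 1e-12) for p in params]
            for p, e in zip(params, eps):
                p.add_(e)                  # climb toward the locally sharpest point
        opt.zero_grad()
        loss_fn(model(x), y).backward()    # gradient at the perturbed weights
        with torch.no_grad():
            for p, e in zip(params, eps):
                p.sub_(e)                  # undo the perturbation
        opt.step(); opt.zero_grad()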

Mesa: A Memory-saving Training Framework for Transformers

3 code implementations • 22 Nov 2021 • Zizheng Pan, Peng Chen, Haoyu He, Jing Liu, Jianfei Cai, Bohan Zhuang

While Transformers have delivered significant performance improvements, training such networks is extremely memory intensive owing to storing all intermediate activations that are needed for gradient computation during backpropagation, especially for long sequences.

Quantization
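
A toy version of the activation-compression idea, shown on a bare ReLU: cache an int8 copy of the input for backward instead of the fp32 tensor. Mesa's actual scheme (per-group quantization across layer types) is more involved; this only shows the mechanism:

    import torch

    class Int8ReLU(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            # Store a compressed (int8) copy of the activation for backward.
            ctx.scale = x.abs().amax().clamp(min=1e-8) / 127.0
            ctx.save_for_backward((x / ctx.scale).round().to(torch.int8))
            return x.relu()

        @staticmethod
        def backward(ctx, grad_out):
            # Dequantize the cached copy; the mask is approximate for tiny
            # activations, which is the memory/accuracy trade-off at play.
            (x_q,) = ctx.saved_tensors
            return grad_out * (x_q.float() * ctx.scale > 0).float()

    y = Int8ReLU.apply(torch.randn(4, 8, requires_grad=True))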

Less is More: Pay Less Attention in Vision Transformers

2 code implementations • 29 May 2021 • Zizheng Pan, Bohan Zhuang, Haoyu He, Jing Liu, Jianfei Cai

Transformers have become one of the dominant architectures in deep learning, particularly as a powerful alternative to convolutional neural networks (CNNs) in computer vision.

Image Classification • Instance Segmentation +3

End-to-end One-shot Human Parsing

1 code implementation • 4 May 2021 • Haoyu He, Bohan Zhuang, Jing Zhang, Jianfei Cai, DaCheng Tao

To address three main challenges in OSHP, i.e., small sizes, testing bias, and similar parts, we devise an End-to-end One-shot human Parsing Network (EOP-Net).

Human Parsing • Metric Learning +1

Scalable Vision Transformers with Hierarchical Pooling

2 code implementations • ICCV 2021 • Zizheng Pan, Bohan Zhuang, Jing Liu, Haoyu He, Jianfei Cai

However, the routine of the current ViT model is to maintain a full-length patch sequence during inference, which is redundant and lacks hierarchical representation.

Efficient ViTs

Single-path Bit Sharing for Automatic Loss-aware Model Compression

no code implementations • 13 Jan 2021 • Jing Liu, Bohan Zhuang, Peng Chen, Chunhua Shen, Jianfei Cai, Mingkui Tan

By jointly training the binary gates in conjunction with network parameters, the compression configurations of each layer can be automatically determined.

Model Compression • Network Pruning +1

Fully Quantized Image Super-Resolution Networks

1 code implementation • 29 Nov 2020 • Hu Wang, Peng Chen, Bohan Zhuang, Chunhua Shen

With the rising popularity of intelligent mobile devices, it is of great practical significance to develop accurate, real-time and energy-efficient image Super-Resolution (SR) inference methods.

Image Super-Resolution • Quantization

FATNN: Fast and Accurate Ternary Neural Networks

no code implementations • ICCV 2021 • Peng Chen, Bohan Zhuang, Chunhua Shen

Ternary Neural Networks (TNNs) have received much attention due to being potentially orders of magnitude faster in inference, as well as more power efficient, than full-precision counterparts.

Image Classification • Quantization

AQD: Towards Accurate Fully-Quantized Object Detection

1 code implementation • CVPR 2021 • Peng Chen, Jing Liu, Bohan Zhuang, Mingkui Tan, Chunhua Shen

Network quantization allows inference to be conducted using low-precision arithmetic for improved inference efficiency of deep neural networks on edge devices.

Image Classification • Object +3

Role-Wise Data Augmentation for Knowledge Distillation

1 code implementation • ICLR 2020 • Jie Fu, Xue Geng, Zhijian Duan, Bohan Zhuang, Xingdi Yuan, Adam Trischler, Jie Lin, Chris Pal, Hao Dong

To our knowledge, existing methods overlook the fact that although the student absorbs extra knowledge from the teacher, both models share the same input data -- and this data is the only medium by which the teacher's knowledge can be demonstrated.

Data Augmentation • Knowledge Distillation

Generative Low-bitwidth Data Free Quantization

3 code implementations • ECCV 2020 • Shoukai Xu, Haokun Li, Bohan Zhuang, Jing Liu, JieZhang Cao, Chuangrun Liang, Mingkui Tan

More critically, our method achieves much higher accuracy on 4-bit quantization than the existing data-free quantization method.

Data Free Quantization

Switchable Precision Neural Networks

no code implementations • 7 Feb 2020 • Luis Guerra, Bohan Zhuang, Ian Reid, Tom Drummond

Instantaneous, on-demand accuracy-efficiency trade-offs have recently been explored in the context of neural network slimming.

Quantization

Automatic Pruning for Quantized Neural Networks

no code implementations • 3 Feb 2020 • Luis Guerra, Bohan Zhuang, Ian Reid, Tom Drummond

In particular, for ResNet-18 on ImageNet, we prune 26.12% of the model size with Binarized Neural Network quantization, achieving a top-1 classification accuracy of 47.32% in a model of 2.47 MB, and 59.30% with a 2-bit DoReFa-Net in 4.36 MB.

Bayesian Optimization • Quantization

Discrimination-aware Network Pruning for Deep Model Compression

1 code implementation • 4 Jan 2020 • Jing Liu, Bohan Zhuang, Zhuangwei Zhuang, Yong Guo, Junzhou Huang, Jinhui Zhu, Mingkui Tan

In this paper, we propose a simple yet effective method called discrimination-aware channel pruning (DCP) to choose the channels that actually contribute to the discriminative power.

Face Recognition • Image Classification +2

Structured Binary Neural Networks for Image Recognition

no code implementations • 22 Sep 2019 • Bohan Zhuang, Chunhua Shen, Mingkui Tan, Peng Chen, Lingqiao Liu, Ian Reid

Experiments on classification, semantic segmentation and object detection tasks demonstrate the superior performance of the proposed methods over various quantized networks in the literature.

object-detection • Object Detection +2

Auxiliary Learning for Deep Multi-task Learning

no code implementations • 5 Sep 2019 • Yifan Liu, Bohan Zhuang, Chunhua Shen, Hao Chen, Wei Yin

Most current methods can be categorized as either: (i) hard parameter sharing, where a subset of the parameters is shared among tasks while other parameters are task-specific; or (ii) soft parameter sharing, where all parameters are task-specific but jointly regularized.

Auxiliary Learning • Depth Estimation +3

RBCN: Rectified Binary Convolutional Networks for Enhancing the Performance of 1-bit DCNNs

no code implementations • 21 Aug 2019 • Chunlei Liu, Wenrui Ding, Xin Xia, Yuan Hu, Baochang Zhang, Jianzhuang Liu, Bohan Zhuang, Guodong Guo

Binarized convolutional neural networks (BCNNs) are widely used to improve the memory and computation efficiency of deep convolutional neural networks (DCNNs) for mobile and AI-chip-based applications.

Binarization • Object Tracking

Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations

no code implementations • 10 Aug 2019 • Bohan Zhuang, Jing Liu, Mingkui Tan, Lingqiao Liu, Ian Reid, Chunhua Shen

Furthermore, we propose a second progressive quantization scheme which gradually decreases the bit-width from high-precision to low-precision during training.

Knowledge Distillation • Quantization
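
Reduced to its schedule, the progressive scheme looks like the sketch below; the uniform milestones and default bit-widths are assumptions, not the paper's settings:

    def bitwidth_schedule_sketch(epoch, total_epochs, start_bits=8, end_bits=2):
        # Begin at high precision and step the bit-width down toward the
        # low-precision target as training proceeds.
        levels = start_bits - end_bits + 1
        return max(start_bits - int(epoch * levels / total_epochs), end_bits)

    # e.g., over 70 epochs this yields 8, 7, ..., 2 bits in 10-epoch stages.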

Training Quantized Neural Networks with a Full-precision Auxiliary Module

no code implementations • CVPR 2020 • Bohan Zhuang, Lingqiao Liu, Mingkui Tan, Chunhua Shen, Ian Reid

In this paper, we seek to tackle a challenge in training low-precision networks: the notorious difficulty in propagating gradients through a low-precision network due to the non-differentiable quantization function.

Image Classification • object-detection +2

Structured Binary Neural Networks for Accurate Image Classification and Semantic Segmentation

no code implementations • CVPR 2019 • Bohan Zhuang, Chunhua Shen, Mingkui Tan, Lingqiao Liu, Ian Reid

In this paper, we propose to train convolutional neural networks (CNNs) with both binarized weights and activations, leading to quantized models specifically for mobile devices with limited power capacity and computation resources.

General Classification • Image Classification +2

Training Compact Neural Networks with Binary Weights and Low Precision Activations

no code implementations • 8 Aug 2018 • Bohan Zhuang, Chunhua Shen, Ian Reid

In this paper, we propose to train a network with binary weights and low-bitwidth activations, designed especially for mobile devices with limited power consumption.

Parallel Attention: A Unified Framework for Visual Object Discovery through Dialogs and Queries

no code implementations • CVPR 2018 • Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, Anton Van Den Hengel

To this end we propose a unified framework, the ParalleL AttentioN (PLAN) network, to discover the object in an image that is being referred to in variable-length natural expression descriptions, from short phrase queries to long multi-round dialogs.

Object • Object Discovery +2

Towards Effective Low-bitwidth Convolutional Neural Networks

2 code implementations • CVPR 2018 • Bohan Zhuang, Chunhua Shen, Mingkui Tan, Lingqiao Liu, Ian Reid

This paper tackles the problem of training a deep convolutional neural network with both low-precision weights and low-bitwidth activations.

Quantization

Towards Context-Aware Interaction Recognition for Visual Relationship Detection

1 code implementation • ICCV 2017 • Bohan Zhuang, Lingqiao Liu, Chunhua Shen, Ian Reid

The proposed method still builds one classifier for one interaction (as per type (ii) above), but the classifier is made adaptive to context via context-dependent weights.

Relationship Detection • Visual Relationship Detection

TasselNet: Counting maize tassels in the wild via local counts regression network

no code implementations • 7 Jul 2017 • Hao Lu, Zhiguo Cao, Yang Xiao, Bohan Zhuang, Chunhua Shen

To our knowledge, this is the first time that a plant-related counting problem has been considered using computer vision technologies in an unconstrained field-based environment.

Plant Phenotyping • regression

Care about you: towards large-scale human-centric visual relationship detection

no code implementations • 28 May 2017 • Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, Anton Van Den Hengel

In addressing this problem we first construct a large-scale human-centric visual relationship detection dataset (HCVRD), which provides many more types of relationship annotation (nearly 10K categories) than previously released datasets.

Human-Object Interaction Detection • Relationship Detection +1

Towards Context-aware Interaction Recognition

no code implementations • 18 Mar 2017 • Bohan Zhuang, Lingqiao Liu, Chunhua Shen, Ian Reid

Recognizing how objects interact with each other is a crucial task in visual recognition.

Visual Tracking via Shallow and Deep Collaborative Model

no code implementations • 27 Jul 2016 • Bohan Zhuang, Lijun Wang, Huchuan Lu

In the discriminative model, we exploit the advances of deep learning architectures to learn generic features which are robust to both background clutter and foreground appearance variations.

Incremental Learning • Visual Tracking

Fast Training of Triplet-based Deep Binary Embedding Networks

no code implementations • CVPR 2016 • Bohan Zhuang, Guosheng Lin, Chunhua Shen, Ian Reid

To solve the first stage, we design a large-scale high-order binary code inference algorithm that reduces the high-order objective to a standard binary quadratic problem, such that graph cuts can be used to efficiently infer the binary code that serves as the label of each training datum.

Image Retrieval • Multi-Label Classification +1
