Search Results for author: Yujun Lin

Found 15 papers, 9 papers with code

Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications

no code implementations25 Apr 2022 Han Cai, Ji Lin, Yujun Lin, Zhijian Liu, Haotian Tang, Hanrui Wang, Ligeng Zhu, Song Han

Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI), including computer vision, natural language processing and speech recognition.

Model Compression Neural Architecture Search +3

TorchSparse: Efficient Point Cloud Inference Engine

1 code implementation21 Apr 2022 Haotian Tang, Zhijian Liu, Xiuyu Li, Yujun Lin, Song Han

TorchSparse directly optimizes the two bottlenecks of sparse convolution: irregular computation and data movement.

Autonomous Driving
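
To make the "irregular computation and data movement" concrete, here is a minimal, self-contained sketch of the gather-matmul-scatter pattern that sparse convolution engines such as TorchSparse optimize. The index maps, sizes, and random data below are illustrative stand-ins, not TorchSparse's actual API.

```python
import torch

num_in, num_out, c_in, c_out, kernel_volume = 1000, 1200, 32, 64, 27

feats = torch.randn(num_in, c_in)                   # features of the active (non-empty) voxels
weights = torch.randn(kernel_volume, c_in, c_out)   # one weight matrix per kernel offset
out = torch.zeros(num_out, c_out)

for k in range(kernel_volume):
    n_pairs = int(torch.randint(1, num_in, (1,)))   # each offset matches a different number of voxel pairs
    in_idx = torch.randint(0, num_in, (n_pairs,))   # irregular data movement: gather scattered inputs ...
    out_idx = torch.randint(0, num_out, (n_pairs,))
    out.index_add_(0, out_idx, feats[in_idx] @ weights[k])   # ... irregular computation, then scatter-accumulate
```

Because the number of input/output pairs differs per kernel offset, both the memory accesses and the matmul sizes are irregular, which is exactly the bottleneck the abstract refers to.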

Delayed Gradient Averaging: Tolerate the Communication Latency for Federated Learning

no code implementations NeurIPS 2021 Ligeng Zhu, Hongzhou Lin, Yao Lu, Yujun Lin, Song Han

Federated Learning is an emerging direction in distributed machine learning that enables jointly training a model without sharing the data.

Federated Learning
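
A minimal single-process simulation of the delayed-averaging idea, assuming W workers and a fixed D-step communication delay: each worker applies its local gradient immediately and, D steps later, replaces that stale local step with the globally averaged one. The names and random "gradients" below are illustrative, not the paper's code.

```python
import torch

W, D, lr, dim = 4, 2, 0.1, 8
params = [torch.zeros(dim) for _ in range(W)]
history = []                                   # per-step list of every worker's local gradient

for step in range(50):
    grads = [torch.randn(dim) for _ in range(W)]
    for w in range(W):
        params[w] -= lr * grads[w]             # apply the local gradient right away, no waiting
    history.append(grads)

    if len(history) > D:                       # the averaged gradient from D steps ago has "arrived"
        old = history.pop(0)
        avg = torch.stack(old).mean(dim=0)
        for w in range(W):
            params[w] -= lr * (avg - old[w])   # swap the stale local step for the global average
```

The point of the delay is that communication for step t can overlap with the computation of steps t+1 through t+D, hiding latency instead of stalling on it.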

QuantumNAS: Noise-Adaptive Search for Robust Quantum Circuits

2 code implementations22 Jul 2021 Hanrui Wang, Yongshan Ding, Jiaqi Gu, Zirui Li, Yujun Lin, David Z. Pan, Frederic T. Chong, Song Han

Extensively evaluated with 12 QML and VQE benchmarks on 14 quantum computers, QuantumNAS significantly outperforms baselines.

NAAS: Neural Accelerator Architecture Search

no code implementations27 May 2021 Yujun Lin, Mengtian Yang, Song Han

Data-driven, automatic design space exploration of neural accelerator architecture is desirable for specialization and productivity.

Hardware-Centric AutoML for Mixed-Precision Quantization

no code implementations11 Aug 2020 Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han

Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures.

AutoML Quantization

MCUNet: Tiny Deep Learning on IoT Devices

1 code implementation NeurIPS 2020 Ji Lin, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan, Song Han

Machine learning on tiny IoT devices based on microcontroller units (MCUs) is appealing but challenging: the memory of microcontrollers is 2-3 orders of magnitude smaller than even that of mobile phones.

BIG-bench Machine Learning Neural Architecture Search +1

Distributed Training Across the World

no code implementations25 Sep 2019 Ligeng Zhu, Yao Lu, Yujun Lin, Song Han

Traditional synchronous distributed training is performed inside a cluster, since it requires a high-bandwidth and low-latency network (e.g., 25Gb Ethernet or InfiniBand).

Point-Voxel CNN for Efficient 3D Deep Learning

4 code implementations NeurIPS 2019 Zhijian Liu, Haotian Tang, Yujun Lin, Song Han

The computation cost and memory footprints of the voxel-based models grow cubically with the input resolution, making it memory-prohibitive to scale up the resolution.

3D Object Detection 3D Semantic Segmentation +2
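
A back-of-the-envelope illustration of that cubic growth, assuming a 64-channel feature volume stored in fp16 (both numbers are assumptions for illustration, not taken from the paper):

```python
# Activation memory for a dense voxel grid with C channels at resolution r is
# r**3 * C values, so doubling the resolution costs 8x the memory.
C, bytes_per_value = 64, 2            # assume 64 channels stored in fp16
for r in (32, 64, 128, 256):
    mem_mib = r**3 * C * bytes_per_value / 2**20
    print(f"resolution {r:4d}: {mem_mib:10.1f} MiB per dense feature volume")
```

This is why the paper keeps the voxel branch at coarse resolution and represents fine-grained detail with points instead.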

Design Automation for Efficient Deep Learning Computing

no code implementations24 Apr 2019 Song Han, Han Cai, Ligeng Zhu, Ji Lin, Kuan Wang, Zhijian Liu, Yujun Lin

Moreover, we shorten the design cycle by 200x than previous work, so that we can afford to design specialized neural network models for different hardware platforms.

Quantization

HAQ: Hardware-Aware Automated Quantization with Mixed Precision

11 code implementations CVPR 2019 Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han

Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures.

Quantization
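
As a rough illustration of what a mixed-precision policy looks like once it has been chosen, here is a hedged sketch that linearly quantizes each layer's weights to its own bit-width. The layer names, bit-widths, and quantizer are made up for illustration; HAQ's contribution is searching such a per-layer policy with hardware feedback, not the quantizer itself.

```python
import torch

def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric linear quantization of w to the given bit-width (illustrative)."""
    qmax = 2 ** (bits - 1) - 1                      # signed range, e.g. [-127, 127] for 8 bits
    scale = w.abs().max() / qmax
    return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale

policy = {"conv1": 8, "conv2": 4, "fc": 6}          # hypothetical bits per layer
weights = {name: torch.randn(64, 64) for name in policy}
quantized = {name: fake_quantize(w, policy[name]) for name, w in weights.items()}
```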

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

3 code implementations ICLR 2018 Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally

The situation gets even worse with distributed training on mobile devices (federated learning), which suffers from higher latency, lower throughput, and intermittent poor connections.

Federated Learning Image Classification +3
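
A minimal sketch of the top-k gradient sparsification with local accumulation that underlies Deep Gradient Compression, assuming a 0.1% sparsity ratio; the full method also adds momentum correction, gradient clipping, and warm-up training, which are omitted here.

```python
import torch

def sparsify(grad: torch.Tensor, residual: torch.Tensor, ratio: float = 0.001):
    """Send only the largest-magnitude entries; accumulate the rest locally."""
    acc = residual + grad                               # local gradient accumulation
    k = max(1, int(acc.numel() * ratio))
    _, idx = torch.topk(acc.abs(), k)                   # pick the top-k entries to transmit
    sent = torch.zeros_like(acc)
    sent[idx] = acc[idx]
    return sent, acc - sent                             # sparse update, new local residual

grad = torch.randn(1_000_000)
residual = torch.zeros_like(grad)
sparse_update, residual = sparsify(grad, residual)      # only ~0.1% of entries are non-zero
```

Small gradient entries are not discarded: they stay in the residual and are transmitted once they accumulate enough magnitude, which is what preserves accuracy at such high compression ratios.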

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

1 code implementation ICLR 2017 Yujun Lin, Song Han, Huizi Mao, Yu Wang, W. Dally

Large-scale distributed training requires significant communication bandwidth for gradient exchange that limits the scalability of multi-node training, and requires expensive high-bandwidth network infrastructure.

Federated Learning Image Classification +3
