Search Results for author: Yujun Lin

Found 15 papers, 9 papers with code

Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications

no code implementations25 Apr 2022 Han Cai, Ji Lin, Yujun Lin, Zhijian Liu, Haotian Tang, Hanrui Wang, Ligeng Zhu, Song Han

Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI), including computer vision, natural language processing and speech recognition.

Model Compression Neural Architecture Search +3

TorchSparse: Efficient Point Cloud Inference Engine

1 code implementation21 Apr 2022 Haotian Tang, Zhijian Liu, Xiuyu Li, Yujun Lin, Song Han

TorchSparse directly optimizes the two bottlenecks of sparse convolution: irregular computation and data movement.

Autonomous Driving
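
To make the "irregular computation and data movement" concrete, here is a minimal, self-contained sketch of the gather-matmul-scatter pattern that sparse convolution engines such as TorchSparse optimize. The index maps, sizes, and random data below are illustrative stand-ins, not TorchSparse's actual API.

```python
import torch

num_in, num_out, c_in, c_out, kernel_volume = 1000, 1200, 32, 64, 27

feats = torch.randn(num_in, c_in)                   # features of the active (non-empty) voxels
weights = torch.randn(kernel_volume, c_in, c_out)   # one weight matrix per kernel offset
out = torch.zeros(num_out, c_out)

for k in range(kernel_volume):
    n_pairs = int(torch.randint(1, num_in, (1,)))   # each offset matches a different number of voxel pairs
    in_idx = torch.randint(0, num_in, (n_pairs,))   # irregular data movement: gather scattered inputs ...
    out_idx = torch.randint(0, num_out, (n_pairs,))
    out.index_add_(0, out_idx, feats[in_idx] @ weights[k])   # ... irregular computation, then scatter-accumulate
```

Because the number of input/output pairs differs per kernel offset, both the memory accesses and the matmul sizes are irregular, which is exactly the bottleneck the abstract refers to.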

Delayed Gradient Averaging: Tolerate the Communication Latency for Federated Learning

no code implementations NeurIPS 2021 Ligeng Zhu, Hongzhou Lin, Yao Lu, Yujun Lin, Song Han

Federated Learning is an emerging direction in distributed machine learning that enables jointly training a model without sharing the data.

Federated Learning
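
A minimal single-process simulation of the delayed-averaging idea, assuming W workers and a fixed D-step communication delay: each worker applies its local gradient immediately and, D steps later, replaces that stale local step with the globally averaged one. The names and random "gradients" below are illustrative, not the paper's code.

```python
import torch

W, D, lr, dim = 4, 2, 0.1, 8
params = [torch.zeros(dim) for _ in range(W)]
history = []                                   # per-step list of every worker's local gradient

for step in range(50):
    grads = [torch.randn(dim) for _ in range(W)]
    for w in range(W):
        params[w] -= lr * grads[w]             # apply the local gradient right away, no waiting
    history.append(grads)

    if len(history) > D:                       # the averaged gradient from D steps ago has "arrived"
        old = history.pop(0)
        avg = torch.stack(old).mean(dim=0)
        for w in range(W):
            params[w] -= lr * (avg - old[w])   # swap the stale local step for the global average
```

The point of the delay is that communication for step t can overlap with the computation of steps t+1 through t+D, hiding latency instead of stalling on it.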

QuantumNAS: Noise-Adaptive Search for Robust Quantum Circuits

2 code implementations22 Jul 2021 Hanrui Wang, Yongshan Ding, Jiaqi Gu, Zirui Li, Yujun Lin, David Z. Pan, Frederic T. Chong, Song Han

Extensively evaluated with 12 QML and VQE benchmarks on 14 quantum computers, QuantumNAS significantly outperforms baselines.

NAAS: Neural Accelerator Architecture Search

no code implementations27 May 2021 Yujun Lin, Mengtian Yang, Song Han

Data-driven, automatic design space exploration of neural accelerator architecture is desirable for specialization and productivity.

Hardware-Centric AutoML for Mixed-Precision Quantization

no code implementations11 Aug 2020 Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han

Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures.

AutoML Quantization

MCUNet: Tiny Deep Learning on IoT Devices

1 code implementation NeurIPS 2020 Ji Lin, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan, Song Han

Machine learning on tiny IoT devices based on microcontroller units (MCUs) is appealing but challenging: the memory of microcontrollers is 2-3 orders of magnitude smaller than even that of mobile phones.

BIG-bench Machine Learning Neural Architecture Search +1

Distributed Training Across the World

no code implementations25 Sep 2019 Ligeng Zhu, Yao Lu, Yujun Lin, Song Han

Traditional synchronous distributed training is performed inside a cluster, since it requires a high-bandwidth and low-latency network (e.g., 25Gb Ethernet or InfiniBand).

Point-Voxel CNN for Efficient 3D Deep Learning

4 code implementations NeurIPS 2019 Zhijian Liu, Haotian Tang, Yujun Lin, Song Han

The computation cost and memory footprints of the voxel-based models grow cubically with the input resolution, making it memory-prohibitive to scale up the resolution.

3D Object Detection 3D Semantic Segmentation +2
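
A back-of-the-envelope illustration of that cubic growth, assuming a 64-channel feature volume stored in fp16 (both numbers are assumptions for illustration, not taken from the paper):

```python
# Activation memory for a dense voxel grid with C channels at resolution r is
# r**3 * C values, so doubling the resolution costs 8x the memory.
C, bytes_per_value = 64, 2            # assume 64 channels stored in fp16
for r in (32, 64, 128, 256):
    mem_mib = r**3 * C * bytes_per_value / 2**20
    print(f"resolution {r:4d}: {mem_mib:10.1f} MiB per dense feature volume")
```

This is why the paper keeps the voxel branch at coarse resolution and represents fine-grained detail with points instead.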

Design Automation for Efficient Deep Learning Computing

no code implementations24 Apr 2019 Song Han, Han Cai, Ligeng Zhu, Ji Lin, Kuan Wang, Zhijian Liu, Yujun Lin

Moreover, we shorten the design cycle by 200x than previous work, so that we can afford to design specialized neural network models for different hardware platforms.

Quantization

HAQ: Hardware-Aware Automated Quantization with Mixed Precision

11 code implementations CVPR 2019 Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han

Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures.

Quantization
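
As a rough illustration of what a mixed-precision policy looks like once it has been chosen, here is a hedged sketch that linearly quantizes each layer's weights to its own bit-width. The layer names, bit-widths, and quantizer are made up for illustration; HAQ's contribution is searching such a per-layer policy with hardware feedback, not the quantizer itself.

```python
import torch

def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric linear quantization of w to the given bit-width (illustrative)."""
    qmax = 2 ** (bits - 1) - 1                      # signed range, e.g. [-127, 127] for 8 bits
    scale = w.abs().max() / qmax
    return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale

policy = {"conv1": 8, "conv2": 4, "fc": 6}          # hypothetical bits per layer
weights = {name: torch.randn(64, 64) for name in policy}
quantized = {name: fake_quantize(w, policy[name]) for name, w in weights.items()}
```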

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

3 code implementations ICLR 2018 Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally

The situation gets even worse with distributed training on mobile devices (federated learning), which suffers from higher latency, lower throughput, and intermittent poor connections.

Federated Learning Image Classification +3
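
A minimal sketch of the top-k gradient sparsification with local accumulation that underlies Deep Gradient Compression, assuming a 0.1% sparsity ratio; the full method also adds momentum correction, gradient clipping, and warm-up training, which are omitted here.

```python
import torch

def sparsify(grad: torch.Tensor, residual: torch.Tensor, ratio: float = 0.001):
    """Send only the largest-magnitude entries; accumulate the rest locally."""
    acc = residual + grad                               # local gradient accumulation
    k = max(1, int(acc.numel() * ratio))
    _, idx = torch.topk(acc.abs(), k)                   # pick the top-k entries to transmit
    sent = torch.zeros_like(acc)
    sent[idx] = acc[idx]
    return sent, acc - sent                             # sparse update, new local residual

grad = torch.randn(1_000_000)
residual = torch.zeros_like(grad)
sparse_update, residual = sparsify(grad, residual)      # only ~0.1% of entries are non-zero
```

Small gradient entries are not discarded: they stay in the residual and are transmitted once they accumulate enough magnitude, which is what preserves accuracy at such high compression ratios.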

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

1 code implementation ICLR 2017 Yujun Lin, Song Han, Huizi Mao, Yu Wang, W. Dally

Large-scale distributed training requires significant communication bandwidth for gradient exchange that limits the scalability of multi-node training, and requires expensive high-bandwidth network infrastructure.

Federated Learning Image Classification +3
