2 code implementations • 12 Dec 2023 • Ji Lin, Hongxu Yin, Wei Ping, Yao Lu, Pavlo Molchanov, Andrew Tao, Huizi Mao, Jan Kautz, Mohammad Shoeybi, Song Han
Visual language models (VLMs) have progressed rapidly with the recent success of large language models.
Ranked #24 on Visual Question Answering on MM-Vet
1 code implementation • 26 May 2022 • Zhijian Liu, Haotian Tang, Alexander Amini, Xinyu Yang, Huizi Mao, Daniela Rus, Song Han
Multi-sensor fusion is essential for an accurate and reliable autonomous driving system.
Ranked #4 on 3D Object Detection on nuScenes
1 code implementation • 10 Mar 2021 • Huizi Mao, Sibo Zhu, Song Han, William J. Dally
Object recognition is a fundamental problem in many video processing tasks; accurately locating seen objects at low computation cost paves the way for on-device video recognition.
1 code implementation • ICCV 2019 • Huizi Mao, Xiaodong Yang, William J. Dally
Average precision (AP) is a widely used metric to evaluate detection accuracy of image and video object detectors.
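For readers unfamiliar with the metric, here is a minimal sketch of one common way to compute AP for a single class from a ranked list of detections. The function name, the input format, and the choice of the non-interpolated (step-wise) variant are assumptions made for illustration; this is not the evaluation protocol of the paper listed above.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Non-interpolated average precision for one class.

    scores: confidence of each detection.
    is_true_positive: 1 if the detection matches a ground-truth object, else 0
        (matching detections to ground truth is assumed to be done already).
    num_ground_truth: total number of ground-truth objects of this class.
    """
    if num_ground_truth == 0:
        return 0.0
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if is_true_positive[i]:
            true_positives += 1
            # Precision at the rank where recall increases.
            precision_sum += true_positives / rank
    # Missed ground-truth objects contribute zero precision.
    return precision_sum / num_ground_truth


print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0], 2))
```

With two ground-truth objects found at ranks 1 and 3, the result is the mean of the precisions 1/1 and 2/3.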
no code implementations • 30 Sep 2018 • Huizi Mao, Taeyoung Kong, William J. Dally
Experiments on the KITTI dataset show that CaTDet reduces operation count by 5.1-8.7x with the same mean Average Precision (mAP) as the single-model Faster R-CNN detector and incurs additional delay of 0.3 frame.
3 code implementations • ICLR 2018 • Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally
The situation gets even worse with distributed training on mobile devices (federated learning), which suffers from higher latency, lower throughput, and intermittent poor connections.
1 code implementation • The International Conference on Learning Representations 2017 • Yujun Lin, Song Han, Huizi Mao, Yu Wang, W. Dally
Large-scale distributed training requires significant communication bandwidth for gradient exchange that limits the scalability of multi-node training, and requires expensive high-bandwidth network infrastructure.
no code implementations • 24 May 2017 • Huizi Mao, Song Han, Jeff Pool, Wenshuo Li, Xingyu Liu, Yu Wang, William J. Dally
Since memory reference is more than two orders of magnitude more expensive than arithmetic operations, the regularity of sparse structure leads to more efficient hardware design.
4 code implementations • 4 Dec 2016 • Chenzhuo Zhu, Song Han, Huizi Mao, William J. Dally
To solve this problem, we propose Trained Ternary Quantization (TTQ), a method that can reduce the precision of weights in neural networks to ternary values.
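As a rough illustration of what reducing weights to ternary values means, the sketch below maps each weight to one of three values: a positive scale, zero, or a negative scale. The function name, the threshold rule (a fraction of the largest absolute weight), and the fixed scale arguments are assumptions for illustration; how the scales are trained is not shown here.

```python
def ternarize(weights, threshold_ratio=0.05, pos_scale=1.0, neg_scale=1.0):
    """Map each weight to one of {-neg_scale, 0.0, +pos_scale}.

    threshold_ratio, pos_scale and neg_scale are hypothetical parameters:
    weights within threshold_ratio * max|w| of zero become 0.0.
    """
    if not weights:
        return []
    delta = threshold_ratio * max(abs(w) for w in weights)
    ternary = []
    for w in weights:
        if w > delta:
            ternary.append(pos_scale)
        elif w < -delta:
            ternary.append(-neg_scale)
        else:
            ternary.append(0.0)
    return ternary


print(ternarize([0.5, -0.01, -0.8, 0.02], pos_scale=0.7, neg_scale=0.9))
# prints [0.7, 0.0, -0.9, 0.0]
```

Each ternary weight needs only 2 bits of storage, plus the two scale values per layer.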
no code implementations • 1 Dec 2016 • Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, Dongliang Xie, Hong Luo, Song Yao, Yu Wang, Huazhong Yang, William J. Dally
Evaluated on the LSTM for speech recognition benchmark, ESE is 43x and 3x faster than Core i7 5930k CPU and Pascal Titan X GPU implementations, respectively.
2 code implementations • 15 Jul 2016 • Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Enhao Gong, Shijian Tang, Erich Elsen, Peter Vajda, Manohar Paluri, John Tran, Bryan Catanzaro, William J. Dally
We propose DSD, a dense-sparse-dense training flow, for regularizing deep neural networks and achieving better optimization performance.
4 code implementations • 4 Feb 2016 • Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally
EIE has a processing power of 102GOPS/s working directly on a compressed network, corresponding to 3TOPS/s on an uncompressed network, and processes FC layers of AlexNet at 1.88x10^4 frames/sec with a power dissipation of only 600mW.
15 code implementations • 1 Oct 2015 • Song Han, Huizi Mao, William J. Dally
To address this limitation, we introduce "deep compression", a three-stage pipeline of pruning, trained quantization, and Huffman coding, whose stages work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
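The three stages named above can be sketched on a flat list of weights, as a toy illustration only. Everything here is an assumption for illustration: the function names, magnitude-threshold pruning, 1-D k-means with linear initialization standing in for trained quantization, and Huffman coding applied directly to the cluster indices. Retraining between stages is omitted.

```python
import heapq
from collections import Counter


def prune(weights, threshold):
    """Stage 1: zero out weights whose magnitude is below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def quantize(weights, num_clusters, iterations=10):
    """Stage 2: weight sharing via 1-D k-means over the nonzero weights.

    Returns (codebook, indices); index -1 marks a pruned weight.
    """
    nonzero = [w for w in weights if w != 0.0]
    if not nonzero:
        return [], [-1] * len(weights)
    lo, hi = min(nonzero), max(nonzero)
    if num_clusters == 1:
        centroids = [(lo + hi) / 2]
    else:
        step = (hi - lo) / (num_clusters - 1)
        centroids = [lo + step * i for i in range(num_clusters)]

    def nearest(w):
        return min(range(len(centroids)), key=lambda c: abs(w - centroids[c]))

    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for w in nonzero:
            groups[nearest(w)].append(w)
        # Keep the old centroid when a cluster receives no weights.
        centroids = [sum(g) / len(g) if g else c
                     for g, c in zip(groups, centroids)]
    indices = [nearest(w) if w != 0.0 else -1 for w in weights]
    return centroids, indices


def huffman_code_lengths(symbols):
    """Stage 3: code length in bits per symbol under a Huffman code."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap items: (frequency, unique tie-breaker, symbols in this subtree).
    heap = [(n, i, [s]) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, s1 = heapq.heappop(heap)
        n2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # every merge adds one bit to these symbols
        heapq.heappush(heap, (n1 + n2, tie, s1 + s2))
        tie += 1
    return lengths


weights = [0.9, -0.02, 0.88, 0.01, -0.5, 0.91, -0.52, 0.03]
pruned = prune(weights, threshold=0.1)
codebook, indices = quantize(pruned, num_clusters=2)
lengths = huffman_code_lengths(indices)
total_bits = sum(lengths[i] for i in indices)
print(codebook, indices, total_bits)
```

In this toy run, the eight weights reduce to a two-entry codebook plus a short stream of variable-length index codes, which is the source of the storage savings.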