no code implementations • 9 Apr 2022 • Berkin Akin, Suyog Gupta, Yun Long, Anton Spiridonov, Zhuo Wang, Marie White, Hao Xu, Ping Zhou, Yanqi Zhou
On-device ML accelerators are becoming standard in modern mobile systems-on-chip (SoCs).
no code implementations • 18 Aug 2020 • Grace Chu, Okan Arikan, Gabriel Bender, Weijun Wang, Achille Brighton, Pieter-Jan Kindermans, Hanxiao Liu, Berkin Akin, Suyog Gupta, Andrew Howard
Hardware-aware neural architecture design has predominantly focused on optimizing model performance for a single hardware target and on model development complexity, while another important factor, model deployment complexity, has been largely ignored.
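To make the multi-hardware goal concrete, below is a minimal sketch of a search objective that scores a single candidate model across several devices at once, in the spirit of latency-penalized NAS rewards. The target latencies, the exponent w, and all names are illustrative assumptions, not the paper's exact formulation.

    # Hypothetical multi-hardware search objective: a latency-penalized
    # reward averaged across target devices. Values of w and the targets
    # below are illustrative assumptions.
    def multi_hardware_reward(accuracy, latencies_ms, targets_ms, w=-0.07):
        """Score a candidate architecture on several devices at once.

        latencies_ms / targets_ms: measured and target latency per device.
        A single model that stays near target on every device scores well,
        which reduces deployment complexity versus one model per device.
        """
        penalties = [
            (lat / tgt) ** w  # <1 when slower than target, >1 when faster
            for lat, tgt in zip(latencies_ms, targets_ms)
        ]
        return accuracy * sum(penalties) / len(penalties)

    # Example: one candidate evaluated on three hypothetical mobile SoCs.
    print(multi_hardware_reward(0.76, [11.0, 9.5, 14.2], [10.0, 10.0, 12.0]))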
4 code implementations • CVPR 2021 • Yunyang Xiong, Hanxiao Liu, Suyog Gupta, Berkin Akin, Gabriel Bender, Yongzhe Wang, Pieter-Jan Kindermans, Mingxing Tan, Vikas Singh, Bo Chen
By incorporating regular convolutions in the search space and directly optimizing the network architectures for object detection, we obtain a family of object detection models, MobileDets, that achieve state-of-the-art results across mobile accelerators.
no code implementations • 5 Mar 2020 • Suyog Gupta, Berkin Akin
While neural network hardware accelerators provide a substantial amount of raw compute throughput, the models deployed on them must be co-designed for the underlying hardware architecture to obtain optimal system performance.
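One common way to make a search accelerator-aware is to estimate a candidate's end-to-end latency from per-op measurements taken on the target hardware, so the search does not have to run every candidate on device. The sketch below illustrates that idea with a lookup table; the op encoding and all timings are invented for illustration, not measurements from this work.

    # Latency estimation from a pre-measured per-op lookup table.
    # (op, kernel, filters) -> measured ms; all values are hypothetical.
    OP_LATENCY_MS = {
        ("conv", 3, 32): 0.41,
        ("conv", 3, 64): 0.78,
        ("dwconv", 3, 64): 0.12,  # depthwise convs can underutilize wide MACC arrays
        ("conv", 1, 128): 0.25,
    }

    def estimated_latency_ms(architecture):
        """Sum pre-measured per-op latencies for a candidate architecture."""
        return sum(OP_LATENCY_MS[op] for op in architecture)

    candidate = [("conv", 3, 32), ("dwconv", 3, 64), ("conv", 1, 128)]
    print(estimated_latency_ms(candidate))  # -> 0.78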
2 code implementations • 21 Feb 2019 • Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, Bowen Liang, HyoukJoong Lee, Ciprian Chelba, Sébastien Jean, Bo Li, Melvin Johnson, Rohan Anil, Rajat Tibrewal, Xiaobing Liu, Akiko Eriguchi, Navdeep Jaitly, Naveen Ari, Colin Cherry, Parisa Haghani, Otavio Good, Youlong Cheng, Raziel Alvarez, Isaac Caswell, Wei-Ning Hsu, Zongheng Yang, Kuan-Chieh Wang, Ekaterina Gonina, Katrin Tomanek, Ben Vanik, Zelin Wu, Llion Jones, Mike Schuster, Yanping Huang, Dehao Chen, Kazuki Irie, George Foster, John Richardson, Klaus Macherey, Antoine Bruguier, Heiga Zen, Colin Raffel, Shankar Kumar, Kanishka Rao, David Rybach, Matthew Murray, Vijayaditya Peddinti, Maxim Krikun, Michiel A. U. Bacchiani, Thomas B. Jablin, Rob Suderman, Ian Williams, Benjamin Lee, Deepti Bhatia, Justin Carlson, Semih Yavuz, Yu Zhang, Ian McGraw, Max Galkin, Qi Ge, Golan Pundak, Chad Whipkey, Todd Wang, Uri Alon, Dmitry Lepikhin, Ye Tian, Sara Sabour, William Chan, Shubham Toshniwal, Baohua Liao, Michael Nirschl, Pat Rondon
Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning research, with a particular focus on sequence-to-sequence models.
4 code implementations • ICLR 2018 • Michael Zhu, Suyog Gupta
Model pruning seeks to induce sparsity in a deep neural network's various connection matrices, thereby reducing the number of nonzero-valued parameters in the model.
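As a concrete illustration of inducing sparsity, the sketch below zeroes the smallest-magnitude entries of a weight matrix until a target sparsity is met. This shows the general magnitude-pruning technique the abstract describes; the paper's actual pruning schedule and masking details may differ.

    import numpy as np

    def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
        """Return a copy of `weights` with the smallest |w| set to zero.

        sparsity: fraction of entries to zero, in [0, 1).
        """
        k = int(sparsity * weights.size)
        if k == 0:
            return weights.copy()
        flat = np.abs(weights).ravel()
        threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
        pruned = weights.copy()
        pruned[np.abs(pruned) <= threshold] = 0.0
        return pruned

    w = np.random.randn(4, 4)
    print(np.mean(magnitude_prune(w, 0.5) == 0.0))  # ~0.5 of entries are now zero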
1 code implementation • 18 Nov 2015 • Wei Zhang, Suyog Gupta, Xiangru Lian, Ji Liu
Deep neural networks have been shown to achieve state-of-the-art performance in several machine learning tasks.
1 code implementation • 14 Sep 2015 • Suyog Gupta, Wei zhang, Fei Wang
This paper presents Rudra, a parameter-server-based distributed computing framework tuned for training large-scale deep neural networks.
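The parameter-server pattern Rudra builds on can be sketched in a few lines: workers compute gradients on their data shards, push them to a server that holds the canonical weights, and pull fresh weights before the next step. The single-process code below is a schematic of that protocol, not Rudra's actual API.

    import numpy as np

    class ParameterServer:
        """Holds the canonical weights; workers push gradients and pull weights."""
        def __init__(self, dim, lr=0.1):
            self.weights = np.zeros(dim)
            self.lr = lr

        def push(self, gradient):   # worker -> server: apply an update
            self.weights -= self.lr * gradient

        def pull(self):             # server -> worker: fetch current weights
            return self.weights.copy()

    def worker_step(server, x_shard, y_shard):
        w = server.pull()
        # Gradient of mean squared error for a linear model on this shard.
        grad = 2.0 * x_shard.T @ (x_shard @ w - y_shard) / len(y_shard)
        server.push(grad)

    rng = np.random.default_rng(0)
    X, true_w = rng.normal(size=(64, 3)), np.array([1.0, -2.0, 0.5])
    y = X @ true_w
    server = ParameterServer(dim=3)
    for shard in np.array_split(np.arange(64), 8):  # 8 "workers" take turns
        worker_step(server, X[shard], y[shard])
    print(server.pull())  # approaches true_w as updates accumulate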
2 code implementations • 9 Feb 2015 • Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, Pritish Narayanan
Training of large-scale deep neural networks is often constrained by the available computational resources.
no code implementations • 9 Sep 2014 • Suyog Gupta, Vikas Sindhwani, Kailash Gopalakrishnan
This paper highlights new opportunities for designing large-scale machine learning systems that arise from blurring the traditional boundaries that have allowed algorithm designers and application-level practitioners to remain, for the most part, oblivious to the details of the underlying hardware-level implementations.