no code implementations • 12 Apr 2024 • Je-Yong Lee, DongHyun Lee, Genghan Zhang, Mo Tiwari, Azalia Mirhoseini
We demonstrate that CATS can be applied to various base models, including Mistral-7B and Llama2-7B, and outperforms existing sparsification techniques on downstream tasks.
no code implementations • 15 Jan 2024 • DongHyun Lee, Ruokai Yin, Youngeun Kim, Abhishek Moitra, Yuhang Li, Priyadarshini Panda
Spiking Neural Networks (SNNs) have gained significant attention as a potentially energy-efficient alternative to standard neural networks, owing to their sparse binary activations.
no code implementations • 7 Dec 2023 • Yuhang Li, Youngeun Kim, DongHyun Lee, Souvik Kundu, Priyadarshini Panda
In the realm of deep neural network deployment, low-bit quantization presents a promising avenue for enhancing computational efficiency.
no code implementations • 14 Dec 2022 • Mo Tiwari, Ryan Kang, Je-Yong Lee, DongHyun Lee, Chris Piech, Sebastian Thrun, Ilan Shomorony, Martin Jinye Zhang
We provide theoretical guarantees that BanditMIPS returns the correct answer with high probability, while improving the complexity in $d$ from $O(\sqrt{d})$ to $O(1)$.
no code implementations • 18 Mar 2021 • DongHyun Lee, Minkyoung Cho, Seungwon Lee, Joonho Song, Changkyu Choi
Post-training quantization is a representative technique for compressing neural networks, making them smaller and more efficient for deployment on edge devices.