no code implementations • 8 Feb 2024 • Seyedarmin Azizi, Mahdi Nazemi, Massoud Pedram
This paper addresses the memory limitations of Vision Transformers (ViTs) by introducing an activation-aware model compression methodology that applies selective low-rank approximations to the weight tensors of different layers, reducing the parameter count of ViTs.
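For intuition, here is a minimal PyTorch sketch of one common way activation-aware low-rank compression can work: weight columns are scaled by calibration-activation magnitudes before a truncated SVD, so input channels that carry larger activations are reconstructed more faithfully. The scaling heuristic, the function name activation_aware_low_rank, and the caller-supplied rank are illustrative assumptions, not the paper's exact algorithm.

```python
import torch

def activation_aware_low_rank(W: torch.Tensor,
                              act_samples: torch.Tensor,
                              rank: int):
    """Factor a linear layer's weight W (out_features x in_features) into
    two low-rank matrices, weighting the SVD by per-input-channel
    activation statistics. act_samples: (n_samples x in_features)
    calibration activations. Returns (A, B) with W ~= A @ B."""
    # Per-input-channel scale from calibration activations.
    s = act_samples.abs().mean(dim=0).clamp_min(1e-6)   # (in_features,)
    U, S, Vh = torch.linalg.svd(W * s, full_matrices=False)
    A = U[:, :rank] * S[:rank]        # (out_features, rank)
    B = Vh[:rank, :] / s              # (rank, in_features); undo scaling
    return A, B
```

Replacing an nn.Linear with two smaller linears built from A and B cuts that layer's parameters from out_features * in_features to rank * (out_features + in_features), which is a saving whenever the rank is small relative to the layer dimensions.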
no code implementations • 3 Dec 2023 • Seyedarmin Azizi, Mahdi Nazemi, Mehdi Kamal, Massoud Pedram
This paper presents a mixed-computation neural network processing approach for edge applications that combines low-precision (low bit-width) Posit and fixed-point (FixP) number systems.
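For concreteness, the sketch below shows the FixP half of such a mixed-computation scheme as a quantize-dequantize round trip, plus a toy rule for guessing which number system might suit a layer. The bit-width split, the selection threshold, and both function names are hypothetical illustrations, not taken from the paper.

```python
import torch

def fixp_quantize(x: torch.Tensor, total_bits: int, frac_bits: int) -> torch.Tensor:
    """Symmetric fixed-point (FixP) round trip: quantize to a signed
    total_bits integer with frac_bits fractional bits, then dequantize."""
    scale = 2.0 ** frac_bits
    qmax = 2 ** (total_bits - 1) - 1
    q = torch.clamp(torch.round(x * scale), float(-qmax - 1), float(qmax))
    return q / scale

def pick_number_system(w: torch.Tensor) -> str:
    """Toy per-layer rule (an assumption, not the paper's criterion):
    posits are most accurate near magnitude 1, so route a layer to posit
    when most of its weights fall in [1/4, 4], otherwise keep FixP."""
    frac_near_one = ((w.abs() > 0.25) & (w.abs() < 4.0)).float().mean()
    return "posit" if frac_near_one > 0.9 else "fixp"

w = torch.randn(64, 64) * 0.05
w_q = fixp_quantize(w, total_bits=8, frac_bits=6)   # 8-bit FixP, Q1.6
print(pick_number_system(w), (w - w_q).abs().max())
```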
no code implementations • 12 Aug 2023 • Seyedarmin Azizi, Mahdi Nazemi, Arash Fayyazi, Massoud Pedram
As a result, our proposed method advances neural network design optimization, enabling rapid model design and deployment in resource-constrained settings and making scalable deep learning solutions more practical.
no code implementations • 8 May 2023 • Jung Hwan Heo, Seyedarmin Azizi, Arash Fayyazi, Massoud Pedram
Post-training compression techniques such as pruning and quantization can help lower deployment costs.
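As a generic illustration of these two techniques (not this paper's specific method), the sketch below applies post-training magnitude pruning to the linear layers of a toy model and then PyTorch's built-in dynamic int8 quantization; the 50% sparsity level and the model architecture are arbitrary choices.

```python
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, sparsity: float = 0.5) -> None:
    """Post-training magnitude pruning: zero out the smallest-magnitude
    weights in every Linear layer (a standard baseline)."""
    for m in model.modules():
        if isinstance(m, nn.Linear):
            w = m.weight.data
            k = int(w.numel() * sparsity)
            if k > 0:
                thresh = w.abs().flatten().kthvalue(k).values
                w.mul_((w.abs() > thresh).float())

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
magnitude_prune(model, sparsity=0.5)

# Post-training dynamic int8 quantization via PyTorch's built-in API.
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
```

Both steps run on an already-trained model, which is what makes them attractive for lowering deployment costs: no retraining loop is required, only (optionally) a brief calibration or accuracy check afterward.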