Search Results for author: Seyedarmin Azizi

Found 4 papers, 0 papers with code

Memory-Efficient Vision Transformers: An Activation-Aware Mixed-Rank Compression Strategy

no code implementations • 8 Feb 2024 • Seyedarmin Azizi, Mahdi Nazemi, Massoud Pedram

This paper addresses the memory limitation of Vision Transformers (ViTs) by introducing an activation-aware model compression methodology that uses selective low-rank approximations of the weight tensors of different layers to reduce the parameter count.
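A minimal sketch of the general idea, not the paper's exact algorithm: each linear layer's weight matrix is factored by truncated SVD, and the rank is chosen per layer so that the approximation error measured on calibration activations (hence "activation-aware") stays below a tolerance. The tolerance, the calibration data, and the low-rank toy weight matrix below are all illustrative assumptions.

```python
# Sketch only: activation-aware rank selection for one linear layer.
import numpy as np

def low_rank_factors(W: np.ndarray, X: np.ndarray, tol: float = 0.05):
    """Return (A, B) with W ~ A @ B, rank picked from activation-aware error.

    W: (out_features, in_features) weight matrix
    X: (num_samples, in_features) calibration activations feeding this layer
    tol: allowed relative error of the layer outputs on the calibration set
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    ref = X @ W.T                                   # exact layer outputs
    ref_norm = np.linalg.norm(ref)
    for r in range(1, S.size + 1):
        A = U[:, :r] * S[:r]                        # (out, r)
        B = Vt[:r, :]                               # (r, in)
        err = np.linalg.norm(X @ (A @ B).T - ref) / ref_norm
        if err <= tol:                              # smallest rank meeting tol
            return A, B
    return U * S, Vt                                # fall back to full rank

# Toy usage: an approximately rank-32 layer and random calibration activations.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 32)) @ rng.standard_normal((32, 512)) / np.sqrt(32)
W += 0.01 * rng.standard_normal((256, 512))
X = rng.standard_normal((64, 512))
A, B = low_rank_factors(W, X)
print("rank:", A.shape[1], "params:", A.size + B.size, "vs", W.size)
```

Replacing the single weight matrix by the two factors only saves parameters when the chosen rank r satisfies r < (out × in) / (out + in), which is why the rank is selected per layer rather than globally.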

Model Compression

Low-Precision Mixed-Computation Models for Inference on Edge

no code implementations • 3 Dec 2023 • Seyedarmin Azizi, Mahdi Nazemi, Mehdi Kamal, Massoud Pedram

This paper presents a mixed-computation neural network processing approach for edge applications that incorporates low-precision (low bit-width) Posit and low-precision fixed-point (FixP) number systems.
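A minimal sketch of the mixed-computation idea, under illustrative assumptions rather than the paper's implementation: fake-quantize each layer's weights with both a low-precision fixed-point grid and an 8-bit posit value set, then assign each layer the number system with the lower reconstruction error. The posit decoder parameters, bit-widths, layer names, and error metric are all assumptions.

```python
# Sketch only: per-layer choice between FixP and Posit quantization.
import numpy as np

def fixp_quantize(x, bits=8, frac_bits=6):
    """Symmetric fixed-point: round to a grid of step 2**-frac_bits, then clip."""
    step = 2.0 ** -frac_bits
    lim = (2 ** (bits - 1) - 1) * step
    return np.clip(np.round(x / step) * step, -lim, lim)

def posit_values(nbits=8, es=1):
    """Enumerate all finite values representable by an (nbits, es) posit."""
    vals = {0.0}
    for p in range(1, 1 << nbits):
        if p == 1 << (nbits - 1):
            continue                                   # NaR pattern, skip
        neg = p >= 1 << (nbits - 1)
        if neg:
            p = (1 << nbits) - p                       # two's-complement negate
        bits = [(p >> i) & 1 for i in range(nbits - 2, -1, -1)]
        m = 1
        while m < len(bits) and bits[m] == bits[0]:
            m += 1                                     # regime run length
        k = m - 1 if bits[0] else -m
        rest = bits[m + 1:]                            # skip regime terminator
        e = 0
        for b in rest[:es]:
            e = (e << 1) | b
        e <<= es - len(rest[:es])                      # pad missing exponent bits
        f = 1.0
        for i, b in enumerate(rest[es:]):
            f += b * 2.0 ** -(i + 1)                   # fraction, hidden bit = 1
        v = f * 2.0 ** (k * (1 << es) + e)
        vals.add(-v if neg else v)
    return np.array(sorted(vals))

POSIT8 = posit_values()

def posit_quantize(x):
    """Round each entry to the nearest representable posit(8,1) value."""
    idx = np.searchsorted(POSIT8, x).clip(1, POSIT8.size - 1)
    lo, hi = POSIT8[idx - 1], POSIT8[idx]
    return np.where(np.abs(x - lo) < np.abs(hi - x), lo, hi)

# Per-layer assignment: keep whichever number system reconstructs weights better.
rng = np.random.default_rng(0)
layers = {"conv1": 0.5 * rng.standard_normal(1000), "fc": 2.0 * rng.standard_normal(1000)}
for name, w in layers.items():
    err_fixp = np.mean((w - fixp_quantize(w)) ** 2)
    err_posit = np.mean((w - posit_quantize(w)) ** 2)
    print(name, "-> FixP" if err_fixp <= err_posit else "-> Posit")
```

The toy layers are chosen so the contrast shows up: a narrow weight distribution favors the uniform fixed-point grid, while a wide, heavy-tailed one favors the posit format's tapered precision and larger dynamic range.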

Quantization

Sensitivity-Aware Mixed-Precision Quantization and Width Optimization of Deep Neural Networks Through Cluster-Based Tree-Structured Parzen Estimation

no code implementations • 12 Aug 2023 • Seyedarmin Azizi, Mahdi Nazemi, Arash Fayyazi, Massoud Pedram

As a result, our proposed method advances neural network design optimization, enabling rapid model design and implementation in resource-constrained settings and paving the way for scalable deep learning solutions.
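A minimal sketch of a Tree-structured Parzen Estimation (TPE) search over per-layer bit-widths, here via Optuna's TPESampler. The objective is a toy proxy (quantization MSE plus a bit-budget penalty), not the paper's sensitivity metric, and the clustering and width-optimization components of the paper are not shown; layer names and bit-width choices are illustrative assumptions.

```python
# Sketch only: TPE search over per-layer quantization bit-widths.
import numpy as np
import optuna

rng = np.random.default_rng(0)
LAYERS = {f"layer{i}": s * rng.standard_normal(2000) for i, s in enumerate([0.2, 1.0, 3.0])}

def quant_mse(w, bits):
    """Symmetric uniform fake-quantization error for one weight tensor."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return float(np.mean((w - np.round(w / scale) * scale) ** 2))

def objective(trial):
    total_err, total_bits = 0.0, 0
    for name, w in LAYERS.items():
        bits = trial.suggest_categorical(f"{name}_bits", [2, 4, 8])
        total_err += quant_mse(w, bits)
        total_bits += bits * w.size
    # Trade off reconstruction error against the overall bit budget.
    return total_err + 1e-7 * total_bits

study = optuna.create_study(direction="minimize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=50)
print(study.best_params)
```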

Quantization
