Search Results for author: Seyedarmin Azizi

Found 4 papers, 0 papers with code

Memory-Efficient Vision Transformers: An Activation-Aware Mixed-Rank Compression Strategy

no code implementations • 8 Feb 2024 • Seyedarmin Azizi, Mahdi Nazemi, Massoud Pedram

This paper addresses the memory limitation of Vision Transformers (ViTs) by introducing an activation-aware model compression methodology that uses selective low-rank approximations of the weight tensors of different layers to reduce the parameter count.
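A minimal sketch of the general idea, not the paper's exact algorithm: each linear layer's weight matrix is factored by truncated SVD, and the rank is chosen per layer so that the approximation error measured on calibration activations (hence "activation-aware") stays below a tolerance. The tolerance, the calibration data, and the low-rank toy weight matrix below are all illustrative assumptions.

```python
# Sketch only: activation-aware rank selection for one linear layer.
import numpy as np

def low_rank_factors(W: np.ndarray, X: np.ndarray, tol: float = 0.05):
    """Return (A, B) with W ~ A @ B, rank picked from activation-aware error.

    W: (out_features, in_features) weight matrix
    X: (num_samples, in_features) calibration activations feeding this layer
    tol: allowed relative error of the layer outputs on the calibration set
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    ref = X @ W.T                                   # exact layer outputs
    ref_norm = np.linalg.norm(ref)
    for r in range(1, S.size + 1):
        A = U[:, :r] * S[:r]                        # (out, r)
        B = Vt[:r, :]                               # (r, in)
        err = np.linalg.norm(X @ (A @ B).T - ref) / ref_norm
        if err <= tol:                              # smallest rank meeting tol
            return A, B
    return U * S, Vt                                # fall back to full rank

# Toy usage: an approximately rank-32 layer and random calibration activations.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 32)) @ rng.standard_normal((32, 512)) / np.sqrt(32)
W += 0.01 * rng.standard_normal((256, 512))
X = rng.standard_normal((64, 512))
A, B = low_rank_factors(W, X)
print("rank:", A.shape[1], "params:", A.size + B.size, "vs", W.size)
```

Replacing the single weight matrix by the two factors only saves parameters when the chosen rank r satisfies r < (out × in) / (out + in), which is why the rank is selected per layer rather than globally.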

Model Compression

Low-Precision Mixed-Computation Models for Inference on Edge

no code implementations • 3 Dec 2023 • Seyedarmin Azizi, Mahdi Nazemi, Mehdi Kamal, Massoud Pedram

This paper presents a mixed-computation neural network processing approach for edge applications that incorporates low-precision (low bit-width) Posit and low-precision fixed-point (FixP) number systems.
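A minimal sketch of the mixed-computation idea, under illustrative assumptions rather than the paper's implementation: fake-quantize each layer's weights with both a low-precision fixed-point grid and an 8-bit posit value set, then assign each layer the number system with the lower reconstruction error. The posit decoder parameters, bit-widths, layer names, and error metric are all assumptions.

```python
# Sketch only: per-layer choice between FixP and Posit quantization.
import numpy as np

def fixp_quantize(x, bits=8, frac_bits=6):
    """Symmetric fixed-point: round to a grid of step 2**-frac_bits, then clip."""
    step = 2.0 ** -frac_bits
    lim = (2 ** (bits - 1) - 1) * step
    return np.clip(np.round(x / step) * step, -lim, lim)

def posit_values(nbits=8, es=1):
    """Enumerate all finite values representable by an (nbits, es) posit."""
    vals = {0.0}
    for p in range(1, 1 << nbits):
        if p == 1 << (nbits - 1):
            continue                                   # NaR pattern, skip
        neg = p >= 1 << (nbits - 1)
        if neg:
            p = (1 << nbits) - p                       # two's-complement negate
        bits = [(p >> i) & 1 for i in range(nbits - 2, -1, -1)]
        m = 1
        while m < len(bits) and bits[m] == bits[0]:
            m += 1                                     # regime run length
        k = m - 1 if bits[0] else -m
        rest = bits[m + 1:]                            # skip regime terminator
        e = 0
        for b in rest[:es]:
            e = (e << 1) | b
        e <<= es - len(rest[:es])                      # pad missing exponent bits
        f = 1.0
        for i, b in enumerate(rest[es:]):
            f += b * 2.0 ** -(i + 1)                   # fraction, hidden bit = 1
        v = f * 2.0 ** (k * (1 << es) + e)
        vals.add(-v if neg else v)
    return np.array(sorted(vals))

POSIT8 = posit_values()

def posit_quantize(x):
    """Round each entry to the nearest representable posit(8,1) value."""
    idx = np.searchsorted(POSIT8, x).clip(1, POSIT8.size - 1)
    lo, hi = POSIT8[idx - 1], POSIT8[idx]
    return np.where(np.abs(x - lo) < np.abs(hi - x), lo, hi)

# Per-layer assignment: keep whichever number system reconstructs weights better.
rng = np.random.default_rng(0)
layers = {"conv1": 0.5 * rng.standard_normal(1000), "fc": 2.0 * rng.standard_normal(1000)}
for name, w in layers.items():
    err_fixp = np.mean((w - fixp_quantize(w)) ** 2)
    err_posit = np.mean((w - posit_quantize(w)) ** 2)
    print(name, "-> FixP" if err_fixp <= err_posit else "-> Posit")
```

The toy layers are chosen so the contrast shows up: a narrow weight distribution favors the uniform fixed-point grid, while a wide, heavy-tailed one favors the posit format's tapered precision and larger dynamic range.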

Quantization

Sensitivity-Aware Mixed-Precision Quantization and Width Optimization of Deep Neural Networks Through Cluster-Based Tree-Structured Parzen Estimation

no code implementations • 12 Aug 2023 • Seyedarmin Azizi, Mahdi Nazemi, Arash Fayyazi, Massoud Pedram

As a result, our proposed method advances neural network design optimization, enabling rapid model design and implementation in resource-constrained settings and paving the way for scalable deep learning solutions.
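A minimal sketch of a Tree-structured Parzen Estimation (TPE) search over per-layer bit-widths, here via Optuna's TPESampler. The objective is a toy proxy (quantization MSE plus a bit-budget penalty), not the paper's sensitivity metric, and the clustering and width-optimization components of the paper are not shown; layer names and bit-width choices are illustrative assumptions.

```python
# Sketch only: TPE search over per-layer quantization bit-widths.
import numpy as np
import optuna

rng = np.random.default_rng(0)
LAYERS = {f"layer{i}": s * rng.standard_normal(2000) for i, s in enumerate([0.2, 1.0, 3.0])}

def quant_mse(w, bits):
    """Symmetric uniform fake-quantization error for one weight tensor."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return float(np.mean((w - np.round(w / scale) * scale) ** 2))

def objective(trial):
    total_err, total_bits = 0.0, 0
    for name, w in LAYERS.items():
        bits = trial.suggest_categorical(f"{name}_bits", [2, 4, 8])
        total_err += quant_mse(w, bits)
        total_bits += bits * w.size
    # Trade off reconstruction error against the overall bit budget.
    return total_err + 1e-7 * total_bits

study = optuna.create_study(direction="minimize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=50)
print(study.best_params)
```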

Quantization
