1 code implementation • 17 Apr 2024 • Luca Bompani, Manuele Rusci, Daniele Palossi, Francesco Conti, Luca Benini
This paper introduces Multi-Resolution Rescored Byte-Track (MR2-ByteTrack), a novel video object detection framework for ultra-low-power embedded processors.
1 code implementation • 3 Apr 2024 • Victor J. B. Jung, Alessio Burrello, Moritz Scherer, Francesco Conti, Luca Benini
Moreover, we show that our MHSA depth-first tiling scheme reduces the memory peak by up to 6.19x, while the fused-weight attention can reduce the runtime by 1.53x, and the number of parameters by 25%.
no code implementations • 29 Nov 2023 • Fabrizio Ferrandi, Serena Curzel, Leandro Fiorin, Daniele Ielmini, Cristina Silvano, Francesco Conti, Alessio Burrello, Francesco Barchi, Luca Benini, Luciano Lavagno, Teodoro Urso, Enrico Calore, Sebastiano Fabio Schifano, Cristian Zambelli, Maurizio Palesi, Giuseppe Ascia, Enrico Russo, Nicola Petra, Davide De Caro, Gennaro Di Meo, Valeria Cardellini, Salvatore Filippone, Francesco Lo Presti, Francesco Silvestri, Paolo Palazzari, Stefania Perri
This survey provides a holistic review of the most influential design methodologies and EDA tools proposed in recent years to implement Deep Learning accelerators, offering the reader a wide perspective in this rapidly evolving field.
no code implementations • 26 Sep 2023 • Francesco Conti, Davide Moroni, Maria Antonietta Pascali
In this work, we develop a pipeline that associates Persistence Diagrams to digital data via the most appropriate filtration for the type of data considered.
no code implementations • 7 Sep 2023 • Francesco Conti, Martina Banchelli, Valentina Bessi, Cristina Cecchi, Fabrizio Chiti, Sara Colantonio, Cristiano D'Andrea, Marella de Angelis, Davide Moroni, Benedetta Nacmias, Maria Antonietta Pascali, Sandro Sorbi, Paolo Matteini
The cerebrospinal fluid (CSF) of 19 subjects who received a clinical diagnosis of Alzheimer's disease (AD), as well as of 5 pathological controls, has been collected and analysed by Raman spectroscopy (RS).
no code implementations • 6 Jul 2023 • Georg Rutishauser, Francesco Conti, Luca Benini
Mixed-precision quantization, where a deep neural network's layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved with homogeneous-bit-width quantization.
no code implementations • 27 Jun 2023 • Cristina Silvano, Daniele Ielmini, Fabrizio Ferrandi, Leandro Fiorin, Serena Curzel, Luca Benini, Francesco Conti, Angelo Garofalo, Cristian Zambelli, Enrico Calore, Sebastiano Fabio Schifano, Maurizio Palesi, Giuseppe Ascia, Davide Patti, Stefania Perri, Nicola Petra, Davide De Caro, Luciano Lavagno, Teodoro Urso, Valeria Cardellini, Gian Carlo Cardarilli, Robert Birke
Recent trends in deep learning (DL) imposed hardware accelerators as the most viable solution for several classes of high-performance computing (HPC) applications such as image classification, computer vision, and speech recognition.
1 code implementation • 30 May 2023 • Davide Nadalini, Manuele Rusci, Luca Benini, Francesco Conti
Enabling On-Device Learning (ODL) for Ultra-Low-Power Micro-Controller Units (MCUs) is a key step for post-deployment adaptation and fine-tuning of Deep Neural Network (DNN) models in future TinyML applications.
1 code implementation • 15 May 2023 • Francesco Conti, Gianna Paulin, Angelo Garofalo, Davide Rossi, Alfio Di Mauro, Georg Rutishauser, Gianmarco Ottavi, Manuel Eggimann, Hayate Okuhara, Luca Benini
We present Marsellus, an all-digital heterogeneous SoC for AI-IoT end-nodes fabricated in GlobalFoundries 22nm FDX that combines 1) a general-purpose cluster of 16 RISC-V Digital Signal Processing (DSP) cores attuned for the execution of a diverse range of workloads exploiting 4-bit and 2-bit arithmetic extensions (XpulpNN), combined with fused MAC&LOAD operations and floating-point support; 2) a 2-8bit Reconfigurable Binary Engine (RBE) to accelerate 3x3 and 1x1 (pointwise) convolutions in DNNs; 3) a set of On-Chip Monitoring (OCM) blocks connected to an Adaptive Body Biasing (ABB) generator and a hardware control loop, enabling on-the-fly adaptation of transistor threshold voltages.
no code implementations • 15 Mar 2023 • Michael Rogenmoser, Yvan Tortorella, Davide Rossi, Francesco Conti, Luca Benini
To mitigate the overheads of traditional radiation hardening and modular redundancy approaches, we present a novel Hybrid Modular Redundancy (HMR) approach, a redundancy scheme that features a cluster of RISC-V processors with a flexible on-demand dual-core and triple-core lockstep grouping of computing cores with runtime split-lock capabilities.
1 code implementation • 24 Jan 2023 • Matteo Risso, Alessio Burrello, Francesco Conti, Lorenzo Lamberti, Yukai Chen, Luca Benini, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari
Neural Architecture Search (NAS) is quickly becoming the go-to approach to optimize the structure of Deep Learning (DL) models for complex tasks such as Image Classification or Object Detection.
1 code implementation • 10 Jan 2023 • Yvan Tortorella, Luca Bertaccini, Luca Benini, Davide Rossi, Francesco Conti
The increasing interest in TinyML, i.e., near-sensor machine learning on power budgets of a few tens of mW, is currently pushing toward enabling TinyML-class training as opposed to inference only.
1 code implementation • 28 Mar 2022 • Matteo Risso, Alessio Burrello, Daniele Jahier Pagliari, Francesco Conti, Lorenzo Lamberti, Enrico Macii, Luca Benini, Massimo Poncino
Temporal Convolutional Networks (TCNs) are promising Deep Learning models for time-series processing tasks.
no code implementations • 24 Mar 2022 • Alessio Burrello, Alberto Dequino, Daniele Jahier Pagliari, Francesco Conti, Marcello Zanghieri, Enrico Macii, Luca Benini, Massimo Poncino
Temporal Convolutional Networks (TCNs) are emerging lightweight Deep Learning models for Time Series analysis.
no code implementations • 14 Feb 2022 • Gianna Paulin, Francesco Conti, Lukas Cavigelli, Luca Benini
For quantifying the overall system power, including I/O power, we built Vau da Muntanialas, to the best of our knowledge, the first demonstration of a systolic multi-chip-on-PCB array of RNN accelerators.
1 code implementation • 20 Jan 2022 • Nazareno Bruschi, Germain Haugou, Giuseppe Tagliavini, Francesco Conti, Luca Benini, Davide Rossi
The last few years have seen the emergence of IoT processors: ultra-low power systems-on-chips (SoCs) combining lightweight and flexible micro-controller units (MCUs), often based on open-ISA RISC-V cores, with application-specific accelerators to maximize performance and energy efficiency.
no code implementations • 4 Jan 2022 • Angelo Garofalo, Gianmarco Ottavi, Francesco Conti, Geethan Karunaratne, Irem Boybat, Luca Benini, Davide Rossi
Furthermore, we explore the requirements for end-to-end inference of a full mobile-grade DNN (MobileNetV2) in terms of IMC array resources, by scaling up our heterogeneous architecture to a multi-array accelerator.
no code implementations • 20 Oct 2021 • Leonardo Ravaglia, Manuele Rusci, Davide Nadalini, Alessandro Capotondi, Francesco Conti, Luca Benini
In this work, we introduce a HW/SW platform for end-to-end CL based on a 10-core FP32-enabled parallel ultra-low-power (PULP) processor.
no code implementations • 18 Oct 2021 • Davide Rossi, Francesco Conti, Manuel Eggimann, Alfio Di Mauro, Giuseppe Tagliavini, Stefan Mach, Marco Guermandi, Antonio Pullini, Igor Loi, Jie Chen, Eric Flamand, Luca Benini
Vega achieves SoA-leading efficiency of 615 GOPS/W on 8-bit INT computation (boosted to 1.3 TOPS/W for 8-bit DNN inference with hardware acceleration).
no code implementations • 1 Feb 2021 • Francesco Conti, Gaetano Scarano, Stefania Colonnese
This paper introduces Multiscale Anisotropic Harmonic Filters (MAHFs) aimed at extracting signal variations over non-Euclidean domains, namely 2D-Manifolds and their discrete representations, such as meshes and 3D Point Clouds as well as graphs.
1 code implementation • 17 Aug 2020 • Alessio Burrello, Angelo Garofalo, Nazareno Bruschi, Giuseppe Tagliavini, Davide Rossi, Francesco Conti
In this work, we propose DORY (Deployment Oriented to memoRY) - an automatic tool to deploy DNNs on low cost MCUs with typically less than 1MB of on-chip SRAM memory.
no code implementations • 17 Jul 2020 • Alfio Di Mauro, Francesco Conti, Pasquale Davide Schiavone, Davide Rossi, Luca Benini
On a prototype in 22nm FDX technology, we demonstrate that both the logic and SRAM voltage can be dropped to 0.5V without any accuracy penalty on a BNN trained for the CIFAR-10 dataset, improving energy efficiency by 2.2X w.r.t.
2 code implementations • 15 Jul 2020 • Nazareno Bruschi, Angelo Garofalo, Francesco Conti, Giuseppe Tagliavini, Davide Rossi
The deployment of Quantized Neural Networks (QNN) on advanced microcontrollers requires optimized software to exploit digital signal processing (DSP) extensions of modern instruction set architectures (ISA).
Hardware Architecture Image and Video Processing
2 code implementations • 13 Apr 2020 • Francesco Conti
This technical report aims at defining a formal framework for Deep Neural Network (DNN) layer-wise quantization, focusing in particular on the problems related to the final deployment.
1 code implementation • 29 Aug 2019 • Angelo Garofalo, Manuele Rusci, Francesco Conti, Davide Rossi, Luca Benini
We present PULP-NN, an optimized computing library for a parallel ultra-low-power tightly coupled cluster of RISC-V processors.
2 code implementations • 10 May 2019 • Daniele Palossi, Francesco Conti, Luca Benini
Nano-size unmanned aerial vehicles (UAVs), with few centimeters of diameter and sub-10 Watts of total power budget, have so far been considered incapable of running sophisticated visual-based autonomous navigation software without external aid from base-stations, ad-hoc local positioning infrastructure, and powerful external computation servers.
no code implementations • 4 Feb 2019 • Arthur Stoutchinin, Francesco Conti, Luca Benini
Embedded inference engines for convolutional networks must be parsimonious in memory bandwidth and buffer sizing to meet power and cost constraints.
1 code implementation • 9 Jul 2018 • Francesco Conti, Pasquale Davide Schiavone, Luca Benini
Binary Neural Networks (BNNs) are promising to deliver accuracy comparable to conventional deep neural networks at a fraction of the cost in terms of memory and energy.
3 code implementations • 4 May 2018 • Daniele Palossi, Antonio Loquercio, Francesco Conti, Eric Flamand, Davide Scaramuzza, Luca Benini
As part of our general methodology we discuss the software mapping techniques that enable the state-of-the-art deep convolutional neural network presented in [1] to be fully executed on-board within a strict 6 fps real-time constraint with no compromise in terms of flight results, while all processing is done with only 64 mW on average.
no code implementations • 4 Dec 2017 • Paolo Meloni, Alessandro Capotondi, Gianfranco Deriu, Michele Brian, Francesco Conti, Davide Rossi, Luigi Raffo, Luca Benini
Deep convolutional neural networks (CNNs) obtain outstanding results in tasks that require human-level understanding of data, like image or speech recognition.
no code implementations • 15 Nov 2017 • Francesco Conti, Lukas Cavigelli, Gianna Paulin, Igor Susmelj, Luca Benini
Recurrent neural networks (RNNs) are state-of-the-art in voice awareness/understanding and speech recognition.
4 code implementations • 18 Dec 2016 • Francesco Conti, Robert Schilling, Pasquale Davide Schiavone, Antonio Pullini, Davide Rossi, Frank Kagan Gürkaynak, Michael Muehlberghuber, Michael Gautschi, Igor Loi, Germain Haugou, Stefan Mangard, Luca Benini
Near-sensor data analytics is a promising direction for IoT endpoints, as it minimizes energy spent on communication and reduces network load - but it also poses security concerns, as valuable data is stored or sent over the network at various stages of the analytics pipeline.