Search Results for author: Francesco Conti

Found 32 papers, 16 papers with code

Multi-resolution Rescored ByteTrack for Video Object Detection on Ultra-low-power Embedded Systems

1 code implementation • 17 Apr 2024 • Luca Bompani, Manuele Rusci, Daniele Palossi, Francesco Conti, Luca Benini

This paper introduces Multi-Resolution Rescored Byte-Track (MR2-ByteTrack), a novel video object detection framework for ultra-low-power embedded processors.

Object object-detection +1

Optimizing the Deployment of Tiny Transformers on Low-Power MCUs

1 code implementation • 3 Apr 2024 • Victor J. B. Jung, Alessio Burrello, Moritz Scherer, Francesco Conti, Luca Benini

Moreover, we show that our MHSA depth-first tiling scheme reduces the memory peak by up to 6.19x, while the fused-weight attention can reduce the runtime by 1.53x, and the number of parameters by 25%.

Hand Gesture Recognition Hand-Gesture Recognition

A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures

no code implementations • 29 Nov 2023 • Fabrizio Ferrandi, Serena Curzel, Leandro Fiorin, Daniele Ielmini, Cristina Silvano, Francesco Conti, Alessio Burrello, Francesco Barchi, Luca Benini, Luciano Lavagno, Teodoro Urso, Enrico Calore, Sebastiano Fabio Schifano, Cristian Zambelli, Maurizio Palesi, Giuseppe Ascia, Enrico Russo, Nicola Petra, Davide De Caro, Gennaro Di Meo, Valeria Cardellini, Salvatore Filippone, Francesco Lo Presti, Francesco Silvestri, Paolo Palazzari, Stefania Perri

This survey provides a holistic review of the most influential design methodologies and EDA tools proposed in recent years to implement Deep Learning accelerators, offering the reader a wide perspective in this rapidly evolving field.

A Topological Machine Learning Pipeline for Classification

no code implementations • 26 Sep 2023 • Francesco Conti, Davide Moroni, Maria Antonietta Pascali

In this work, we develop a pipeline that associates Persistence Diagrams to digital data via the most appropriate filtration for the type of data considered.

Classification

Alzheimer Disease Detection from Raman Spectroscopy of the Cerebrospinal Fluid via Topological Machine Learning

no code implementations • 7 Sep 2023 • Francesco Conti, Martina Banchelli, Valentina Bessi, Cristina Cecchi, Fabrizio Chiti, Sara Colantonio, Cristiano D'Andrea, Marella de Angelis, Davide Moroni, Benedetta Nacmias, Maria Antonietta Pascali, Sandro Sorbi, Paolo Matteini

The cerebrospinal fluid (CSF) of 19 subjects who received a clinical diagnosis of Alzheimer's disease (AD) as well as of 5 pathological controls have been collected and analysed by Raman spectroscopy (RS).

Topological Data Analysis

Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge

no code implementations • 6 Jul 2023 • Georg Rutishauser, Francesco Conti, Luca Benini

Mixed-precision quantization, where a deep neural network's layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved with homogeneous-bit-width quantization.

Navigate Quantization

A Survey on Deep Learning Hardware Accelerators for Heterogeneous HPC Platforms

no code implementations • 27 Jun 2023 • Cristina Silvano, Daniele Ielmini, Fabrizio Ferrandi, Leandro Fiorin, Serena Curzel, Luca Benini, Francesco Conti, Angelo Garofalo, Cristian Zambelli, Enrico Calore, Sebastiano Fabio Schifano, Maurizio Palesi, Giuseppe Ascia, Davide Patti, Stefania Perri, Nicola Petra, Davide De Caro, Luciano Lavagno, Teodoro Urso, Valeria Cardellini, Gian Carlo Cardarilli, Robert Birke

Recent trends in deep learning (DL) imposed hardware accelerators as the most viable solution for several classes of high-performance computing (HPC) applications such as image classification, computer vision, and speech recognition.

Image Classification speech-recognition +1

Reduced Precision Floating-Point Optimization for Deep Neural Network On-Device Learning on MicroControllers

1 code implementation • 30 May 2023 • Davide Nadalini, Manuele Rusci, Luca Benini, Francesco Conti

Enabling On-Device Learning (ODL) for Ultra-Low-Power Micro-Controller Units (MCUs) is a key step for post-deployment adaptation and fine-tuning of Deep Neural Network (DNN) models in future TinyML applications.

Continual Learning Image Classification +1

Marsellus: A Heterogeneous RISC-V AI-IoT End-Node SoC with 2-to-8b DNN Acceleration and 30%-Boost Adaptive Body Biasing

1 code implementation • 15 May 2023 • Francesco Conti, Gianna Paulin, Angelo Garofalo, Davide Rossi, Alfio Di Mauro, Georg Rutishauser, Gianmarco Ottavi, Manuel Eggimann, Hayate Okuhara, Luca Benini

We present Marsellus, an all-digital heterogeneous SoC for AI-IoT end-nodes fabricated in GlobalFoundries 22nm FDX that combines 1) a general-purpose cluster of 16 RISC-V Digital Signal Processing (DSP) cores attuned for the execution of a diverse range of workloads exploiting 4-bit and 2-bit arithmetic extensions (XpulpNN), combined with fused MAC&LOAD operations and floating-point support; 2) a 2-8bit Reconfigurable Binary Engine (RBE) to accelerate 3x3 and 1x1 (pointwise) convolutions in DNNs; 3) a set of On-Chip Monitoring (OCM) blocks connected to an Adaptive Body Biasing (ABB) generator and a hardware control loop, enabling on-the-fly adaptation of transistor threshold voltages.

Hybrid Modular Redundancy: Exploring Modular Redundancy Approaches in RISC-V Multi-Core Computing Clusters for Reliable Processing in Space

no code implementations • 15 Mar 2023 • Michael Rogenmoser, Yvan Tortorella, Davide Rossi, Francesco Conti, Luca Benini

To mitigate the overheads of traditional radiation hardening and modular redundancy approaches, we present a novel Hybrid Modular Redundancy (HMR) approach, a redundancy scheme that features a cluster of RISC-V processors with a flexible on-demand dual-core and triple-core lockstep grouping of computing cores with runtime split-lock capabilities.

Lightweight Neural Architecture Search for Temporal Convolutional Networks at the Edge

1 code implementation • 24 Jan 2023 • Matteo Risso, Alessio Burrello, Francesco Conti, Lorenzo Lamberti, Yukai Chen, Luca Benini, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari

Neural Architecture Search (NAS) is quickly becoming the go-to approach to optimize the structure of Deep Learning (DL) models for complex tasks such as Image Classification or Object Detection.

Image Classification Neural Architecture Search +4

RedMule: A Mixed-Precision Matrix-Matrix Operation Engine for Flexible and Energy-Efficient On-Chip Linear Algebra and TinyML Training Acceleration

1 code implementation • 10 Jan 2023 • Yvan Tortorella, Luca Bertaccini, Luca Benini, Davide Rossi, Francesco Conti

The increasing interest in TinyML, i.e., near-sensor machine learning on power budgets of a few tens of mW, is currently pushing toward enabling TinyML-class training as opposed to inference only.

Pruning In Time (PIT): A Lightweight Network Architecture Optimizer for Temporal Convolutional Networks

1 code implementation • 28 Mar 2022 • Matteo Risso, Alessio Burrello, Daniele Jahier Pagliari, Francesco Conti, Lorenzo Lamberti, Enrico Macii, Luca Benini, Massimo Poncino

Temporal Convolutional Networks (TCNs) are promising Deep Learning models for time-series processing tasks.

Time Series Time Series Analysis

TCN Mapping Optimization for Ultra-Low Power Time-Series Edge Inference

no code implementations • 24 Mar 2022 • Alessio Burrello, Alberto Dequino, Daniele Jahier Pagliari, Francesco Conti, Marcello Zanghieri, Enrico Macii, Luca Benini, Massimo Poncino

Temporal Convolutional Networks (TCNs) are emerging lightweight Deep Learning models for Time Series analysis.

Time Series Time Series Analysis

Vau da muntanialas: Energy-efficient multi-die scalable acceleration of RNN inference

no code implementations • 14 Feb 2022 • Gianna Paulin, Francesco Conti, Lukas Cavigelli, Luca Benini

To quantify the overall system power, including I/O power, we built Vau da Muntanialas, to the best of our knowledge the first demonstration of a systolic multi-chip-on-PCB array of RNN accelerators.

Quantization speech-recognition +2

GVSoC: A Highly Configurable, Fast and Accurate Full-Platform Simulator for RISC-V based IoT Processors

1 code implementation • 20 Jan 2022 • Nazareno Bruschi, Germain Haugou, Giuseppe Tagliavini, Francesco Conti, Luca Benini, Davide Rossi

The last few years have seen the emergence of IoT processors: ultra-low power systems-on-chips (SoCs) combining lightweight and flexible micro-controller units (MCUs), often based on open-ISA RISC-V cores, with application-specific accelerators to maximize performance and energy efficiency.

A Heterogeneous In-Memory Computing Cluster For Flexible End-to-End Inference of Real-World Deep Neural Networks

no code implementations • 4 Jan 2022 • Angelo Garofalo, Gianmarco Ottavi, Francesco Conti, Geethan Karunaratne, Irem Boybat, Luca Benini, Davide Rossi

Furthermore, we explore the requirements for end-to-end inference of a full mobile-grade DNN (MobileNetV2) in terms of IMC array resources, by scaling up our heterogeneous architecture to a multi-array accelerator.

A TinyML Platform for On-Device Continual Learning with Quantized Latent Replays

no code implementations • 20 Oct 2021 • Leonardo Ravaglia, Manuele Rusci, Davide Nadalini, Alessandro Capotondi, Francesco Conti, Luca Benini

In this work, we introduce a HW/SW platform for end-to-end CL based on a 10-core FP32-enabled parallel ultra-low-power (PULP) processor.

Continual Learning Quantization

Vega: A 10-Core SoC for IoT End-Nodes with DNN Acceleration and Cognitive Wake-Up From MRAM-Based State-Retentive Sleep Mode

no code implementations • 18 Oct 2021 • Davide Rossi, Francesco Conti, Manuel Eggimann, Alfio Di Mauro, Giuseppe Tagliavini, Stefan Mach, Marco Guermandi, Antonio Pullini, Igor Loi, Jie Chen, Eric Flamand, Luca Benini

Vega achieves SoA-leading efficiency of 615 GOPS/W on 8-bit INT computation (boosted to 1.3 TOPS/W for 8-bit DNN inference with hardware acceleration).

Management

Multiscale Anisotropic Harmonic Filters on non Euclidean domains

no code implementations • 1 Feb 2021 • Francesco Conti, Gaetano Scarano, Stefania Colonnese

This paper introduces Multiscale Anisotropic Harmonic Filters (MAHFs) aimed at extracting signal variations over non-Euclidean domains, namely 2D-Manifolds and their discrete representations, such as meshes and 3D Point Clouds as well as graphs.

DORY: Automatic End-to-End Deployment of Real-World DNNs on Low-Cost IoT MCUs

1 code implementation • 17 Aug 2020 • Alessio Burrello, Angelo Garofalo, Nazareno Bruschi, Giuseppe Tagliavini, Davide Rossi, Francesco Conti

In this work, we propose DORY (Deployment Oriented to memoRY) - an automatic tool to deploy DNNs on low cost MCUs with typically less than 1MB of on-chip SRAM memory.

C++ code Tiling & Deployment

Always-On 674uW @ 4GOP/s Error Resilient Binary Neural Networks with Aggressive SRAM Voltage Scaling on a 22nm IoT End-Node

no code implementations • 17 Jul 2020 • Alfio Di Mauro, Francesco Conti, Pasquale Davide Schiavone, Davide Rossi, Luca Benini

On a prototype in 22nm FDX technology, we demonstrate that both the logic and SRAM voltage can be dropped to 0.5 V without any accuracy penalty on a BNN trained for the CIFAR-10 dataset, improving energy efficiency by 2.2X w.r.t.

PICO

Enabling Mixed-Precision Quantized Neural Networks in Extreme-Edge Devices

2 code implementations • 15 Jul 2020 • Nazareno Bruschi, Angelo Garofalo, Francesco Conti, Giuseppe Tagliavini, Davide Rossi

The deployment of Quantized Neural Networks (QNN) on advanced microcontrollers requires optimized software to exploit digital signal processing (DSP) extensions of modern instruction set architectures (ISA).

Hardware Architecture Image and Video Processing

Technical Report: NEMO DNN Quantization for Deployment Model

2 code implementations • 13 Apr 2020 • Francesco Conti

This technical report aims at defining a formal framework for Deep Neural Network (DNN) layer-wise quantization, focusing in particular on the problems related to the final deployment.

Quantization

PULP-NN: Accelerating Quantized Neural Networks on Parallel Ultra-Low-Power RISC-V Processors

1 code implementation • 29 Aug 2019 • Angelo Garofalo, Manuele Rusci, Francesco Conti, Davide Rossi, Luca Benini

We present PULP-NN, an optimized computing library for a parallel ultra-low-power tightly coupled cluster of RISC-V processors.

Quantization

An Open Source and Open Hardware Deep Learning-powered Visual Navigation Engine for Autonomous Nano-UAVs

2 code implementations • 10 May 2019 • Daniele Palossi, Francesco Conti, Luca Benini

Nano-size unmanned aerial vehicles (UAVs), with few centimeters of diameter and sub-10 Watts of total power budget, have so far been considered incapable of running sophisticated visual-based autonomous navigation software without external aid from base-stations, ad-hoc local positioning infrastructure, and powerful external computation servers.

Autonomous Navigation Visual Navigation

Optimally Scheduling CNN Convolutions for Efficient Memory Access

no code implementations • 4 Feb 2019 • Arthur Stoutchinin, Francesco Conti, Luca Benini

Embedded inference engines for convolutional networks must be parsimonious in memory bandwidth and buffer sizing to meet power and cost constraints.

Scheduling

XNOR Neural Engine: a Hardware Accelerator IP for 21.6 fJ/op Binary Neural Network Inference

1 code implementation • 9 Jul 2018 • Francesco Conti, Pasquale Davide Schiavone, Luca Benini

Binary Neural Networks (BNNs) are promising to deliver accuracy comparable to conventional deep neural networks at a fraction of the cost in terms of memory and energy.

A 64mW DNN-based Visual Navigation Engine for Autonomous Nano-Drones

3 code implementations • 4 May 2018 • Daniele Palossi, Antonio Loquercio, Francesco Conti, Eric Flamand, Davide Scaramuzza, Luca Benini

As part of our general methodology we discuss the software mapping techniques that enable the state-of-the-art deep convolutional neural network presented in [1] to be fully executed on-board within a strict 6 fps real-time constraint with no compromise in terms of flight results, while all processing is done with only 64 mW on average.

Autonomous Navigation Visual Navigation

NEURAghe: Exploiting CPU-FPGA Synergies for Efficient and Flexible CNN Inference Acceleration on Zynq SoCs

no code implementations • 4 Dec 2017 • Paolo Meloni, Alessandro Capotondi, Gianfranco Deriu, Michele Brian, Francesco Conti, Davide Rossi, Luigi Raffo, Luca Benini

Deep convolutional neural networks (CNNs) obtain outstanding results in tasks that require human-level understanding of data, like image or speech recognition.

speech-recognition Speech Recognition

Chipmunk: A Systolically Scalable 0.9 mm$^2$, 3.08 Gop/s/mW @ 1.2 mW Accelerator for Near-Sensor Recurrent Neural Network Inference

no code implementations • 15 Nov 2017 • Francesco Conti, Lukas Cavigelli, Gianna Paulin, Igor Susmelj, Luca Benini

Recurrent neural networks (RNNs) are state-of-the-art in voice awareness/understanding and speech recognition.

speech-recognition Speech Recognition

An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics

4 code implementations • 18 Dec 2016 • Francesco Conti, Robert Schilling, Pasquale Davide Schiavone, Antonio Pullini, Davide Rossi, Frank Kagan Gürkaynak, Michael Muehlberghuber, Michael Gautschi, Igor Loi, Germain Haugou, Stefan Mangard, Luca Benini

Near-sensor data analytics is a promising direction for IoT endpoints, as it minimizes energy spent on communication and reduces network load - but it also poses security concerns, as valuable data is stored or sent over the network at various stages of the analytics pipeline.

EEG Face Detection +1
