Search Results for author: Francesco Conti

Found 32 papers, 16 papers with code

Multi-resolution Rescored ByteTrack for Video Object Detection on Ultra-low-power Embedded Systems

1 code implementation • 17 Apr 2024 • Luca Bompani, Manuele Rusci, Daniele Palossi, Francesco Conti, Luca Benini

This paper introduces Multi-Resolution Rescored Byte-Track (MR2-ByteTrack), a novel video object detection framework for ultra-low-power embedded processors.

Object Detection +1

Optimizing the Deployment of Tiny Transformers on Low-Power MCUs

1 code implementation • 3 Apr 2024 • Victor J. B. Jung, Alessio Burrello, Moritz Scherer, Francesco Conti, Luca Benini

Moreover, we show that our MHSA depth-first tiling scheme reduces the memory peak by up to 6.19x, while the fused-weight attention can reduce the runtime by 1.53x and the number of parameters by 25%.

Hand Gesture Recognition

A Topological Machine Learning Pipeline for Classification

no code implementations • 26 Sep 2023 • Francesco Conti, Davide Moroni, Maria Antonietta Pascali

In this work, we develop a pipeline that associates Persistence Diagrams to digital data via the most appropriate filtration for the type of data considered.

Classification

Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge

no code implementations • 6 Jul 2023 • Georg Rutishauser, Francesco Conti, Luca Benini

Mixed-precision quantization, where a deep neural network's layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved with homogeneous-bit-width quantization.

Navigate Quantization

Reduced Precision Floating-Point Optimization for Deep Neural Network On-Device Learning on MicroControllers

1 code implementation • 30 May 2023 • Davide Nadalini, Manuele Rusci, Luca Benini, Francesco Conti

Enabling On-Device Learning (ODL) for Ultra-Low-Power Micro-Controller Units (MCUs) is a key step for post-deployment adaptation and fine-tuning of Deep Neural Network (DNN) models in future TinyML applications.

Continual Learning Image Classification +1

Marsellus: A Heterogeneous RISC-V AI-IoT End-Node SoC with 2-to-8b DNN Acceleration and 30%-Boost Adaptive Body Biasing

1 code implementation • 15 May 2023 • Francesco Conti, Gianna Paulin, Angelo Garofalo, Davide Rossi, Alfio Di Mauro, Georg Rutishauser, Gianmarco Ottavi, Manuel Eggimann, Hayate Okuhara, Luca Benini

We present Marsellus, an all-digital heterogeneous SoC for AI-IoT end-nodes fabricated in GlobalFoundries 22nm FDX that combines 1) a general-purpose cluster of 16 RISC-V Digital Signal Processing (DSP) cores attuned for the execution of a diverse range of workloads exploiting 4-bit and 2-bit arithmetic extensions (XpulpNN), combined with fused MAC&LOAD operations and floating-point support; 2) a 2-8bit Reconfigurable Binary Engine (RBE) to accelerate 3x3 and 1x1 (pointwise) convolutions in DNNs; 3) a set of On-Chip Monitoring (OCM) blocks connected to an Adaptive Body Biasing (ABB) generator and a hardware control loop, enabling on-the-fly adaptation of transistor threshold voltages.

Hybrid Modular Redundancy: Exploring Modular Redundancy Approaches in RISC-V Multi-Core Computing Clusters for Reliable Processing in Space

no code implementations • 15 Mar 2023 • Michael Rogenmoser, Yvan Tortorella, Davide Rossi, Francesco Conti, Luca Benini

To mitigate the overheads of traditional radiation hardening and modular redundancy approaches, we present a novel Hybrid Modular Redundancy (HMR) approach, a redundancy scheme that features a cluster of RISC-V processors with a flexible on-demand dual-core and triple-core lockstep grouping of computing cores with runtime split-lock capabilities.

Lightweight Neural Architecture Search for Temporal Convolutional Networks at the Edge

1 code implementation • 24 Jan 2023 • Matteo Risso, Alessio Burrello, Francesco Conti, Lorenzo Lamberti, Yukai Chen, Luca Benini, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari

Neural Architecture Search (NAS) is quickly becoming the go-to approach to optimize the structure of Deep Learning (DL) models for complex tasks such as Image Classification or Object Detection.

Image Classification Neural Architecture Search +4

RedMule: A Mixed-Precision Matrix-Matrix Operation Engine for Flexible and Energy-Efficient On-Chip Linear Algebra and TinyML Training Acceleration

1 code implementation • 10 Jan 2023 • Yvan Tortorella, Luca Bertaccini, Luca Benini, Davide Rossi, Francesco Conti

The increasing interest in TinyML, i.e., near-sensor machine learning on power budgets of a few tens of mW, is currently pushing toward enabling TinyML-class training as opposed to inference only.

Vau da muntanialas: Energy-efficient multi-die scalable acceleration of RNN inference

no code implementations • 14 Feb 2022 • Gianna Paulin, Francesco Conti, Lukas Cavigelli, Luca Benini

To quantify the overall system power, including I/O power, we built Vau da Muntanialas, to the best of our knowledge the first demonstration of a systolic multi-chip-on-PCB array of RNN accelerators.

Quantization speech-recognition +2

GVSoC: A Highly Configurable, Fast and Accurate Full-Platform Simulator for RISC-V based IoT Processors

1 code implementation • 20 Jan 2022 • Nazareno Bruschi, Germain Haugou, Giuseppe Tagliavini, Francesco Conti, Luca Benini, Davide Rossi

The last few years have seen the emergence of IoT processors: ultra-low power systems-on-chips (SoCs) combining lightweight and flexible micro-controller units (MCUs), often based on open-ISA RISC-V cores, with application-specific accelerators to maximize performance and energy efficiency.

A Heterogeneous In-Memory Computing Cluster For Flexible End-to-End Inference of Real-World Deep Neural Networks

no code implementations • 4 Jan 2022 • Angelo Garofalo, Gianmarco Ottavi, Francesco Conti, Geethan Karunaratne, Irem Boybat, Luca Benini, Davide Rossi

Furthermore, we explore the requirements for end-to-end inference of a full mobile-grade DNN (MobileNetV2) in terms of IMC array resources, by scaling up our heterogeneous architecture to a multi-array accelerator.

A TinyML Platform for On-Device Continual Learning with Quantized Latent Replays

no code implementations • 20 Oct 2021 • Leonardo Ravaglia, Manuele Rusci, Davide Nadalini, Alessandro Capotondi, Francesco Conti, Luca Benini

In this work, we introduce a HW/SW platform for end-to-end CL based on a 10-core FP32-enabled parallel ultra-low-power (PULP) processor.

Continual Learning Quantization

Multiscale Anisotropic Harmonic Filters on non Euclidean domains

no code implementations • 1 Feb 2021 • Francesco Conti, Gaetano Scarano, Stefania Colonnese

This paper introduces Multiscale Anisotropic Harmonic Filters (MAHFs) aimed at extracting signal variations over non-Euclidean domains, namely 2D-Manifolds and their discrete representations, such as meshes and 3D Point Clouds as well as graphs.

DORY: Automatic End-to-End Deployment of Real-World DNNs on Low-Cost IoT MCUs

1 code implementation • 17 Aug 2020 • Alessio Burrello, Angelo Garofalo, Nazareno Bruschi, Giuseppe Tagliavini, Davide Rossi, Francesco Conti

In this work, we propose DORY (Deployment Oriented to memoRY), an automatic tool to deploy DNNs on low-cost MCUs with typically less than 1 MB of on-chip SRAM.

C++ code Tiling & Deployment

Always-On 674uW @ 4GOP/s Error Resilient Binary Neural Networks with Aggressive SRAM Voltage Scaling on a 22nm IoT End-Node

no code implementations • 17 Jul 2020 • Alfio Di Mauro, Francesco Conti, Pasquale Davide Schiavone, Davide Rossi, Luca Benini

On a prototype in 22nm FDX technology, we demonstrate that both the logic and SRAM voltage can be dropped to 0.5V without any accuracy penalty on a BNN trained for the CIFAR-10 dataset, improving energy efficiency by 2.2x w.r.t.

PICO

Enabling Mixed-Precision Quantized Neural Networks in Extreme-Edge Devices

2 code implementations • 15 Jul 2020 • Nazareno Bruschi, Angelo Garofalo, Francesco Conti, Giuseppe Tagliavini, Davide Rossi

The deployment of Quantized Neural Networks (QNN) on advanced microcontrollers requires optimized software to exploit digital signal processing (DSP) extensions of modern instruction set architectures (ISA).

Hardware Architecture Image and Video Processing

Technical Report: NEMO DNN Quantization for Deployment Model

2 code implementations • 13 Apr 2020 • Francesco Conti

This technical report aims at defining a formal framework for Deep Neural Network (DNN) layer-wise quantization, focusing in particular on the problems related to the final deployment.

Quantization

PULP-NN: Accelerating Quantized Neural Networks on Parallel Ultra-Low-Power RISC-V Processors

1 code implementation • 29 Aug 2019 • Angelo Garofalo, Manuele Rusci, Francesco Conti, Davide Rossi, Luca Benini

We present PULP-NN, an optimized computing library for a parallel ultra-low-power tightly coupled cluster of RISC-V processors.

Quantization

An Open Source and Open Hardware Deep Learning-powered Visual Navigation Engine for Autonomous Nano-UAVs

2 code implementations • 10 May 2019 • Daniele Palossi, Francesco Conti, Luca Benini

Nano-size unmanned aerial vehicles (UAVs), a few centimeters in diameter and with a total power budget below 10 Watts, have so far been considered incapable of running sophisticated visual-based autonomous navigation software without external aid from base-stations, ad-hoc local positioning infrastructure, and powerful external computation servers.

Autonomous Navigation Visual Navigation

Optimally Scheduling CNN Convolutions for Efficient Memory Access

no code implementations • 4 Feb 2019 • Arthur Stoutchinin, Francesco Conti, Luca Benini

Embedded inference engines for convolutional networks must be parsimonious in memory bandwidth and buffer sizing to meet power and cost constraints.

Scheduling

XNOR Neural Engine: a Hardware Accelerator IP for 21.6 fJ/op Binary Neural Network Inference

1 code implementation • 9 Jul 2018 • Francesco Conti, Pasquale Davide Schiavone, Luca Benini

Binary Neural Networks (BNNs) are promising to deliver accuracy comparable to conventional deep neural networks at a fraction of the cost in terms of memory and energy.

A 64mW DNN-based Visual Navigation Engine for Autonomous Nano-Drones

3 code implementations • 4 May 2018 • Daniele Palossi, Antonio Loquercio, Francesco Conti, Eric Flamand, Davide Scaramuzza, Luca Benini

As part of our general methodology, we discuss the software mapping techniques that enable the state-of-the-art deep convolutional neural network presented in [1] to be fully executed on-board within a strict 6 fps real-time constraint with no compromise in terms of flight results, while all processing is done with only 64 mW on average.

Autonomous Navigation Visual Navigation

An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics

4 code implementations • 18 Dec 2016 • Francesco Conti, Robert Schilling, Pasquale Davide Schiavone, Antonio Pullini, Davide Rossi, Frank Kagan Gürkaynak, Michael Muehlberghuber, Michael Gautschi, Igor Loi, Germain Haugou, Stefan Mangard, Luca Benini

Near-sensor data analytics is a promising direction for IoT endpoints, as it minimizes energy spent on communication and reduces network load - but it also poses security concerns, as valuable data is stored or sent over the network at various stages of the analytics pipeline.

EEG Face Detection +1
