Flexible Computing Systems For AI Acceleration At The Extreme Edge Of The IoT

Garofalo, Angelo (2022) Flexible Computing Systems For AI Acceleration At The Extreme Edge Of The IoT, [Dissertation thesis], Alma Mater Studiorum Università di Bologna. PhD programme in Electronics, Telecommunications and Information Technologies Engineering, 34th cycle. DOI 10.48676/unibo/amsdottorato/10288.
Full text available: PDF (English), 7 MB, under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 (CC BY-NC-ND 4.0) license.

Abstract

Embedding intelligence in extreme edge devices allows distilling raw data acquired from sensors into actionable information directly on IoT end-nodes. This computing paradigm, in which end-nodes no longer depend entirely on the Cloud, offers undeniable benefits and drives a broad research area, TinyML, aimed at deploying leading Machine Learning (ML) algorithms on microcontroller-class devices. To fit the limited memory of these tiny platforms, full-precision Deep Neural Networks (DNNs) are compressed by representing their weights and activations in byte and sub-byte integer formats, yielding Quantized Neural Networks (QNNs). However, the current generation of microcontroller systems can barely cope with the computing requirements of QNNs. This thesis tackles the challenge from multiple perspectives, presenting solutions at both the software and hardware levels and exploiting parallelism, heterogeneity, and software programmability to guarantee high flexibility and high energy-performance proportionality. The first contribution, PULP-NN, is an optimized software computing library for QNN inference on parallel ultra-low-power (PULP) clusters of RISC-V processors; it shows one order of magnitude improvement in performance and energy efficiency over current State-of-the-Art (SoA) STM32 microcontrollers (MCUs) based on ARM Cortex-M cores. The second contribution is XpulpNN, a set of RISC-V domain-specific instruction set architecture (ISA) extensions for sub-byte integer arithmetic. The solution, comprising the ISA extensions and the micro-architecture that supports them, achieves energy efficiency comparable with dedicated DNN accelerators and surpasses the efficiency of SoA ARM Cortex-M based MCUs, such as low-end Cortex-M4 devices and the high-end STM32H7, by up to three orders of magnitude. Finally, to overcome the von Neumann bottleneck while preserving full flexibility, the last contribution integrates an Analog In-Memory Computing accelerator into the PULP cluster, creating a fully programmable heterogeneous fabric that demonstrates end-to-end inference of SoA MobileNetV2 models with two orders of magnitude performance improvement over current SoA analog/digital solutions.
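To make the abstract's quantization claims concrete, the following sketch (a hypothetical illustration, not code from the thesis or the PULP-NN library) shows in plain portable C the core computation of a byte and sub-byte QNN layer: an int8 dot product with int32 accumulation and fixed-point requantization, and a variant with 4-bit weights packed two per byte. The software nibble unpacking in the 4-bit path is the kind of work that XpulpNN's ISA extensions perform directly in hardware. All identifiers (clamp_s8, dot_s8, nibble_s4, dot_s8_w4) and the packing convention are assumptions made for this example.

    /* Illustrative QNN kernel sketch in plain C; names and packing are
     * assumptions for this example, not the thesis implementation. */
    #include <stdint.h>
    #include <stdio.h>

    /* Saturate an int32 accumulator into the signed 8-bit range. */
    static int8_t clamp_s8(int32_t v) {
        if (v > 127)  return 127;
        if (v < -128) return -128;
        return (int8_t)v;
    }

    /* 8-bit dot product with fixed-point requantization:
     * out = sat8((acc * mult) >> shift). */
    static int8_t dot_s8(const int8_t *x, const int8_t *w, int n,
                         int32_t mult, int shift) {
        int32_t acc = 0;
        for (int i = 0; i < n; i++)
            acc += (int32_t)x[i] * (int32_t)w[i];
        return clamp_s8((acc * mult) >> shift);
    }

    /* Sign-extend the low (hi == 0) or high (hi == 1) 4-bit nibble. */
    static int32_t nibble_s4(uint8_t b, int hi) {
        int32_t v = hi ? (b >> 4) : (b & 0x0F);
        return (v & 0x8) ? v - 16 : v;  /* two's-complement sign extension */
    }

    /* Same dot product, but with 4-bit weights packed two per byte;
     * the unpacking done here in software is what XpulpNN-style ISA
     * extensions fold into the multiply-accumulate datapath. */
    static int8_t dot_s8_w4(const int8_t *x, const uint8_t *w_packed, int n,
                            int32_t mult, int shift) {
        int32_t acc = 0;
        for (int i = 0; i < n; i++)
            acc += (int32_t)x[i] * nibble_s4(w_packed[i / 2], i & 1);
        return clamp_s8((acc * mult) >> shift);
    }

    int main(void) {
        int8_t  x[4]  = { 10, -3, 7, 1 };
        int8_t  w8[4] = { 2, 5, -1, 4 };
        uint8_t w4[2] = { 0x52, 0x4F };  /* packed nibbles: 2, 5, -1, 4 */

        printf("8-bit weights: %d\n", dot_s8(x, w8, 4, 1, 0));
        printf("4-bit weights: %d\n", dot_s8_w4(x, w4, 4, 1, 0));
        return 0;
    }

Compiled with any C compiler, both calls print 2: the 4-bit encoding halves weight storage while reproducing the 8-bit arithmetic exactly for weights that fit in four bits, which is what makes sub-byte formats attractive on memory-constrained MCUs.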

Document type: Doctoral thesis
Author: Garofalo, Angelo
Cycle: 34
Keywords: Embedded Systems, Heterogeneous Architectures, TinyML, Quantized Neural Networks, Ultra-Low-Power Systems, In-Memory Computing, System on Chip, Parallel Computing Architecture, AI Acceleration, Deep Learning Acceleration, Internet-of-Things Multi-Core Systems, Extreme-Edge AI Acceleration
DOI: 10.48676/unibo/amsdottorato/10288
Date of defence: 12 July 2022