Full-text documents available:

      Abstract
Embedding intelligence in extreme edge devices makes it possible to distill raw data acquired from sensors into actionable information, directly on IoT end-nodes. This computing paradigm, in which end-nodes no longer depend entirely on the Cloud, offers undeniable benefits and drives a large research area, TinyML, aimed at deploying leading Machine Learning (ML) algorithms on microcontroller-class devices. To fit the limited memory of these tiny platforms, full-precision Deep Neural Networks (DNNs) are compressed by representing their data in byte and sub-byte integer formats, yielding Quantized Neural Networks (QNNs). However, the current generation of microcontroller systems can barely cope with the computing requirements of QNNs.

This thesis tackles the challenge from multiple perspectives, presenting solutions at both the software and hardware levels and exploiting parallelism, heterogeneity, and software programmability to guarantee high flexibility and high energy-performance proportionality. The first contribution, PULP-NN, is an optimized software computing library for QNN inference on parallel ultra-low-power (PULP) clusters of RISC-V processors; it improves performance and energy efficiency by one order of magnitude over current state-of-the-art (SoA) STM32 microcontroller units (MCUs) based on ARM Cortex-M cores. The second contribution is XpulpNN, a set of domain-specific RISC-V instruction set architecture (ISA) extensions for sub-byte integer arithmetic. The solution, comprising the ISA extensions and the micro-architecture that supports them, achieves energy efficiency comparable with dedicated DNN accelerators and surpasses the efficiency of SoA ARM Cortex-M based MCUs, such as the low-end STM32L4 and the high-end STM32H7 devices, by up to three orders of magnitude. Finally, to overcome the von Neumann bottleneck while guaranteeing the highest flexibility, the last contribution integrates an Analog In-Memory Computing accelerator into the PULP cluster, creating a fully programmable heterogeneous fabric that demonstrates end-to-end inference of SoA MobileNetV2 models and delivers two orders of magnitude higher performance than current SoA analog/digital solutions.
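As context for the byte and sub-byte integer formats mentioned above, the following is a minimal C sketch of uniform affine quantization, the standard scheme such formats rely on; the function names, rounding policy, and saturation bounds are illustrative assumptions, not the thesis' actual code.

```c
#include <math.h>
#include <stdint.h>

/* Minimal sketch of uniform affine int8 quantization, the standard
 * scheme behind byte-level DNN compression. Names and rounding policy
 * are illustrative assumptions, not the thesis' implementation. */
static inline int8_t quantize_int8(float x, float scale, int32_t zero_point)
{
    int32_t q = (int32_t)lroundf(x / scale) + zero_point;
    if (q < INT8_MIN) q = INT8_MIN;   /* saturate to the int8 range */
    if (q > INT8_MAX) q = INT8_MAX;
    return (int8_t)q;
}

/* Inverse mapping: recover an approximate real value from its code. */
static inline float dequantize_int8(int8_t q, float scale, int32_t zero_point)
{
    return scale * (float)((int32_t)q - zero_point);
}
```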
     
    
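Similarly, the core operation that XpulpNN-style sub-byte ISA extensions accelerate is a multiply-accumulate over packed low-precision operands. Below is a plain-C emulation of a dot product over eight signed 4-bit values packed into each 32-bit word; dedicated SIMD hardware of this kind executes the whole loop as a single instruction. The packing order and helper names are assumptions for illustration.

```c
#include <stdint.h>

/* Sign-extend a 4-bit two's-complement nibble to 32 bits. */
static inline int32_t sext4(uint32_t nibble)
{
    return (int32_t)(nibble << 28) >> 28;
}

/* Emulate a SIMD sum-of-dot-products over eight signed int4 lanes:
 * each 32-bit word packs eight 4-bit values, lane i in bits [4i+3:4i].
 * XpulpNN-class hardware fuses this loop into one instruction. */
int32_t dotp_int4(uint32_t a, uint32_t b, int32_t acc)
{
    for (int i = 0; i < 8; i++) {
        acc += sext4((a >> (4 * i)) & 0xFu) * sext4((b >> (4 * i)) & 0xFu);
    }
    return acc;
}
```

On a scalar MCU, each word pair costs eight unpack/multiply/accumulate sequences; removing exactly this overhead is what yields the efficiency gains reported in the abstract.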
  
  
    
    
Document type: PhD thesis (Tesi di dottorato)
Author: Garofalo, Angelo
Supervisor:
PhD programme:
Cycle: 34
Coordinator:
Disciplinary sector:
Competition sector:
Keywords: Embedded Systems, Heterogeneous Architectures, TinyML, Quantized Neural Networks, Ultra-Low-Power Systems, In-Memory Computing, System on Chip, Parallel Computing Architecture, AI Acceleration, Deep Learning Acceleration, Internet-of-Things Multi-Core Systems, Extreme-Edge AI Acceleration
URN:NBN:
DOI: 10.48676/unibo/amsdottorato/10288
Defense date: 12 July 2022
URI:
   
  
  
  
  
  
  
    