Scaling performance at the end of Moore's law: a programmer's perspective

Ficarelli, Federico (2025) Scaling performance at the end of Moore's law: a programmer's perspective, [Dissertation thesis], Alma Mater Studiorum Università di Bologna. PhD programme in Data science and computation, 36th cycle. DOI 10.48676/unibo/amsdottorato/11919.
Full-text documents available:
ficarelli.pdf - PDF document (English), 12MB
Available under license: Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 (CC BY-NC-ND 4.0).

Abstract

Computer architectures face a fundamental shift as Moore's law and Dennard scaling reach their technological limits. This evolution sparked a Cambrian explosion of specialized hardware designs: computing is now a power-bound challenge. While applications still struggle to scale on exascale systems, future HPC systems must integrate an increasingly diverse spectrum of accelerators, while software stacks must adapt to heterogeneous platforms. The need for domain-specific features is driving the advent of the RISC-V architecture: its flexible ISA could be the answer to the evolutionary challenges faced by computing. The adoption of RISC-V in HPC is still uncharted territory, bringing new challenges for system integration and software stacks. This thesis focuses on three ideas. Embarrassingly parallel, task-based workloads must explore throughput-optimized GPU kernel designs to unlock drug discovery campaigns on current TOP500 systems. HPC systems must overcome design and integration challenges to prepare for increasingly diverse post-exascale clusters, where RISC-V could be an answer. The hardware/software interface must adapt: target-specific components of the compilation stack must evolve to sustain domain-specific code generation. The first part of this thesis involves implementing and scaling drug discovery simulations on GPU-accelerated TOP500 systems, focusing on efficient acceleration of task-based workloads that scale to trillions of molecules, enabling the largest drug discovery simulation for SARS-CoV-2 ever performed. The second part centers on designing, building, and evaluating Monte Cimone, the world's first RISC-V HPC production cluster: its successful deployment proves the production readiness of RISC-V for HPC, paving the way for future RISC-V supercomputers. The third part focuses on the collective endeavor of developing an MLIR-based compiler backend for Snitch, a novel RISC-V streaming accelerator for machine learning, applying a progressive lowering approach to the compiler backend and enabling efficient micro-kernel code generation for application-specific RISC-V accelerators.

Document type
Doctoral thesis
Author
Ficarelli, Federico
Supervisor
Co-supervisor
PhD programme
Cycle
36
Coordinator
Academic discipline
Academic recruitment field
Keywords
HPC, GPU, drug discovery, exascale, compilers, MLIR, RISC-V, CUDA, benchmarking, machine learning accelerators
DOI
10.48676/unibo/amsdottorato/11919
Defence date
26 March 2025
URI
