Antici, Francesco
(2025)
Job-level online predictive modelling for sustainable HPC systems, [Dissertation thesis], Alma Mater Studiorum Università di Bologna.
PhD programme in
Computer science and engineering, 37th cycle.
Full-text documents available:
PDF document (English)
Available under licence: Unless the author has granted broader permissions, the thesis may be freely consulted, and one copy may be saved and printed strictly for personal purposes of study, research and teaching; any directly or indirectly commercial use is expressly prohibited. All other rights to the material are reserved.
Abstract
High-Performance Computing (HPC) systems are pivotal in addressing complex computational challenges across scientific and industrial domains. However, their significant energy consumption and environmental impact pose critical sustainability challenges. One possible way to tackle these challenges is to develop job-level predictive models, with the aim of optimizing system throughput while minimizing environmental impact. While promising, such models are currently not employed in HPC systems because of several important limitations that make them impractical in production environments. This research makes several contributions towards addressing these limitations and making job-level predictive modelling a practical route to more sustainable and efficient HPC environments. We first create two comprehensive datasets, PM100 and F-DATA, to overcome the scarcity of publicly available, fine-grained job-level data. These datasets enable in-depth analysis of job execution characteristics, such as power consumption and resource allocation, and serve as fundamental tools for job-level predictive modelling. We then develop online ML-based algorithms that predict key job execution characteristics, including failure, power consumption, and memory/compute-bound nature. These models operate online and rely only on submission-time features, so that their predictions can feed into job scheduling and resource allocation decisions. We integrate our predictive models into frameworks, e.g. MCBound and UoPC, that are suitable for deployment in production environments. Such frameworks enable both system-level optimizations and end-user awareness, fostering improved end-user experience, performance and sustainability.
This work emphasizes the importance of job-level predictive modelling for sustainable HPC workload management. By addressing the limitations that currently keep such models out of production, our research contributes to the broader mission of sustainable computing, setting the stage for more environmentally conscious HPC systems.
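As an illustration only, the idea of an online model that relies on submission-time features can be sketched as follows. This is not the method used in the thesis: the class name `OnlineJobPowerPredictor`, the feature names (`user`, `job_name`, `num_nodes`), the windowed-average rule, and the power values are all hypothetical assumptions chosen to keep the sketch minimal.

```python
from collections import defaultdict, deque


class OnlineJobPowerPredictor:
    """Hypothetical sketch: predict a job's average power (assumed in watts)
    from the most recent completed jobs with the same submission-time features."""

    def __init__(self, window=5, default_power=0.0):
        self.window = window
        self.default_power = default_power  # fallback when no history exists
        # One bounded history of measured power values per feature key.
        self.history = defaultdict(lambda: deque(maxlen=window))

    @staticmethod
    def _key(job):
        # Only information known at submission time is used (assumed fields).
        return (job["user"], job["job_name"], job["num_nodes"])

    def predict(self, job):
        """Return the mean power of recent similar jobs, or the default."""
        past = self.history.get(self._key(job))
        if not past:
            return self.default_power
        return sum(past) / len(past)

    def update(self, job, measured_power):
        """Record the measured power once the job has finished (online update)."""
        self.history[self._key(job)].append(measured_power)


predictor = OnlineJobPowerPredictor(window=2, default_power=100.0)
job = {"user": "alice", "job_name": "cfd_run", "num_nodes": 4}

first = predictor.predict(job)    # no history yet, falls back to the default
predictor.update(job, 400.0)
predictor.update(job, 500.0)
second = predictor.predict(job)   # mean of the two recorded values
predictor.update(job, 600.0)      # the oldest value leaves the window of 2
third = predictor.predict(job)
```

The predictor is queried when a job is submitted and updated when the job completes, so it never needs an offline training phase; a production system would presumably replace the windowed average with a trained ML model.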
Document type
Doctoral thesis
Author
Antici, Francesco
Supervisor
PhD programme
Cycle
37
Coordinator
Academic discipline
Academic recruitment field
Keywords
Job-level predictive modelling; Machine Learning; Sustainable HPC systems; HPC workload
Defence date
3 June 2025
URI