Giommi, Luca
(2023)
Machine learning as a service for high energy physics (MLaaS4HEP): a service for ML-based data analyses, [Dissertation thesis], Alma Mater Studiorum Università di Bologna.
Dottorato di ricerca in
Data science and computation, 34 Ciclo. DOI 10.48676/unibo/amsdottorato/10586.
Documenti full-text disponibili:
|
Documento PDF (English)
- Richiede un lettore di PDF come Xpdf o Adobe Acrobat Reader
Disponibile con Licenza: Salvo eventuali più ampie autorizzazioni dell'autore, la tesi può essere liberamente consultata e può essere effettuato il salvataggio e la stampa di una copia per fini strettamente personali di studio, di ricerca e di insegnamento, con espresso divieto di qualunque utilizzo direttamente o indirettamente commerciale. Ogni altro diritto sul materiale è riservato.
Download (18MB)
|
Abstract
With the CERN LHC program underway, there has been an acceleration of data growth in the High Energy Physics (HEP) field and the usage of Machine Learning (ML) in HEP will be critical during the HL-LHC program when the data that will be produced will reach the exascale. ML techniques have been successfully used in many areas of HEP nevertheless, the development of a ML project and its implementation for production use is a highly time-consuming task and requires specific skills. Complicating this scenario is the fact that HEP data is stored in ROOT data format, which is mostly unknown outside of the HEP community.
The work presented in this thesis is focused on the development of a ML as a Service (MLaaS) solution for HEP, aiming to provide a cloud service that allows HEP users to run ML pipelines via HTTP calls. These pipelines are executed by using the MLaaS4HEP framework, which allows reading data, processing data, and training ML models directly using ROOT files of arbitrary size from local or distributed data sources. Such a solution provides HEP users non-expert in ML with a tool that allows them to apply ML techniques in their analyses in a streamlined manner.
Over the years the MLaaS4HEP framework has been developed, validated, and tested and new features have been added. A first MLaaS solution has been developed by automatizing the deployment of a platform equipped with the MLaaS4HEP framework. Then, a service with APIs has been developed, so that a user after being authenticated and authorized can submit MLaaS4HEP workflows producing trained ML models ready for the inference phase. A working prototype of this service is currently running on a virtual machine of INFN-Cloud and is compliant to be added to the INFN Cloud portfolio of services.
Abstract
With the CERN LHC program underway, there has been an acceleration of data growth in the High Energy Physics (HEP) field and the usage of Machine Learning (ML) in HEP will be critical during the HL-LHC program when the data that will be produced will reach the exascale. ML techniques have been successfully used in many areas of HEP nevertheless, the development of a ML project and its implementation for production use is a highly time-consuming task and requires specific skills. Complicating this scenario is the fact that HEP data is stored in ROOT data format, which is mostly unknown outside of the HEP community.
The work presented in this thesis is focused on the development of a ML as a Service (MLaaS) solution for HEP, aiming to provide a cloud service that allows HEP users to run ML pipelines via HTTP calls. These pipelines are executed by using the MLaaS4HEP framework, which allows reading data, processing data, and training ML models directly using ROOT files of arbitrary size from local or distributed data sources. Such a solution provides HEP users non-expert in ML with a tool that allows them to apply ML techniques in their analyses in a streamlined manner.
Over the years the MLaaS4HEP framework has been developed, validated, and tested and new features have been added. A first MLaaS solution has been developed by automatizing the deployment of a platform equipped with the MLaaS4HEP framework. Then, a service with APIs has been developed, so that a user after being authenticated and authorized can submit MLaaS4HEP workflows producing trained ML models ready for the inference phase. A working prototype of this service is currently running on a virtual machine of INFN-Cloud and is compliant to be added to the INFN Cloud portfolio of services.
Tipologia del documento
Tesi di dottorato
Autore
Giommi, Luca
Supervisore
Co-supervisore
Dottorato di ricerca
Ciclo
34
Coordinatore
Settore disciplinare
Settore concorsuale
Parole chiave
Machine Learning, ML, cloud computing, High Energy Physics, HEP, MLaaS, Machine Learning as a Service, cloud, data science, data analysis
URN:NBN
DOI
10.48676/unibo/amsdottorato/10586
Data di discussione
29 Marzo 2023
URI
Altri metadati
Tipologia del documento
Tesi di dottorato
Autore
Giommi, Luca
Supervisore
Co-supervisore
Dottorato di ricerca
Ciclo
34
Coordinatore
Settore disciplinare
Settore concorsuale
Parole chiave
Machine Learning, ML, cloud computing, High Energy Physics, HEP, MLaaS, Machine Learning as a Service, cloud, data science, data analysis
URN:NBN
DOI
10.48676/unibo/amsdottorato/10586
Data di discussione
29 Marzo 2023
URI
Statistica sui download
Gestione del documento: