Bridi, Thomas
(2018)
Scalable optimization-based Scheduling approaches for HPC facilities, [Dissertation thesis], Alma Mater Studiorum Università di Bologna.
PhD programme in Computer science and engineering, 30th cycle. DOI 10.6092/unibo/amsdottorato/8436.
Abstract
This thesis addresses the problem of scheduling applications on High-Performance Computing (HPC) machines. The goal is to create a scheduler that improves on state-of-the-art solutions under different metrics. However, improving solution quality is not enough: a scheduler for future HPC machines must also take overheads and scalability into account. In this thesis we present a comprehensive, scalable scheduling approach that features both an off-line and an on-line component. The off-line component is based on Constraint Programming (CP), an optimization technique that is well suited to scheduling problems and allows for great flexibility. We leverage this flexibility to first present an optimization method designed to optimize job waiting times, which is then extended via heuristics and search strategies to deal with more complex objective functions.
Unfortunately, such a complex objective function cannot be handled by a solver in an acceptable amount of time for on-line operation on an HPC machine in production. We address this difficulty with a second, distributed, on-line scheduler, designed to dramatically decrease the computational overhead and achieve scalability adequate for future ExaFlops HPC machines.
The distributed scheduler is proactive: it makes decisions so as to follow a desirable, pre-specified utilization profile. This feature makes it possible to connect the two schedulers into a hybrid system: the CP component computes a schedule for a trace of forecasted jobs one day ahead, machine learning techniques extract a near-optimal, desirable utilization profile from that solution, and the on-line scheduler takes care of the actual scheduling decisions in a scalable fashion.
The resulting architecture improves the HPC machine profit by 8.6% on average while decreasing the computational overhead and, under normal conditions, without any side effects.
Document type
Doctoral thesis
Author
Bridi, Thomas
Supervisor
PhD programme
Cycle
30
Coordinator
Disciplinary sector
Competition sector
Keywords
HPC, Scheduling, Optimization, Constraint Programming, Supercomputing
URN:NBN
DOI
10.6092/unibo/amsdottorato/8436
Defence date
20 April 2018
URI