Scalable optimization-based Scheduling approaches for HPC facilities

Bridi, Thomas (2018) Scalable optimization-based Scheduling approaches for HPC facilities, [Dissertation thesis], Alma Mater Studiorum Università di Bologna. Dottorato di ricerca in Computer science and engineering, 30 Ciclo. DOI 10.6092/unibo/amsdottorato/8436.

Salva citazione

Citato da

Documenti full-text disponibili:

Anteprima

Documento PDF (English) - Richiede un lettore di PDF come Xpdf o Adobe Acrobat Reader
Disponibile con Licenza: Creative Commons Attribution Non-commercial Share Alike 3.0 (CC BY-NC-SA 3.0) .
Download (3MB) | Anteprima

Abstract

This Thesis deals with the problem of scheduling applications on High-Performance Computing (HPC) machines. The goal is to create a scheduler that can improve the solutions w.r.t. the state-of-the-art under different metrics. However, improving the solution quality is not enough: creating a scheduler for future HPC machines requires to take into account also overheads and scalability. In this thesis we present a comprehensive, scalable, scheduling approach that features both an off-line and an on-line component. The off-line component is based on Constraint Programming (CP), an optimization technique that is well-suited for scheduling problems and allows for great flexibility. We leverage this flexibility to present first a optimization method designed to optimize the job waiting times, which is then extended via heuristics and search strategies to deal with more complex objective functions. Unfortunately, such a complex objective function cannot be handled by a solver in an acceptable amount of time for online operation on a HPC machine in-production. We deal with this difficulty by making use of a second, distributed, on-line scheduler. This second scheduler is designed to dramatically decrease the computational overhead and achieve a scalability adequate to future ExaFlops HPC machines. The distributed scheduler is proactive, and it takes decisions so as to follow a desirable, pre-specified, utilization profile. This feature makes it possible to connect these two schedulers to create a hybrid system: the CP component computes the scheduling on a trace of forecasted jobs one day ahead, machine learning techniques extract from the solution a near-optimal and desirable utilization profile, and the online scheduler takes care of the actual scheduling decisions in a scalable fashion. The resulting architecture manages to improve the HPC machine profit by an average 8.6%, while decreasing the computational overhead and, under normal conditions, without any side effect.

Abstract

Tipologia del documento

Tesi di dottorato

Autore

Bridi, Thomas

Supervisore

Milano, Michela

Dottorato di ricerca

Computer science and engineering

Ciclo

Coordinatore

Ciaccia, Paolo