Giuliani, Luca
(2025)
Detection and enforcement of non-linear correlations for fair and robust machine learning applications, [Dissertation thesis], Alma Mater Studiorum Università di Bologna.
PhD programme in
Computer Science and Engineering, 37th cycle. DOI 10.48676/unibo/amsdottorato/12027.
Full-text documents available:
Abstract
Detecting correlations is crucial in several Machine Learning tasks, such as identifying patterns or enforcing relational constraints. In the realm of algorithmic fairness, correlations are particularly significant, as fairness indicators typically quantify the degree of dependence between a sensitive input attribute and a target variable. However, traditional measures have focused solely on categorical protected attributes due to technical limitations, thus neglecting continuous sensitive information such as age, income, degree of disability, or other aggregated numerical variables. To overcome these limitations, recent research has suggested using the Hirschfeld–Gebelein–Rényi (HGR) correlation coefficient as a measure of fairness. HGR extends Pearson's coefficient to detect non-linear correlations by employing two mapping functions called copula transformations; in this dissertation, we present a novel computational approach for estimating it by means of user-defined kernel functions parameterized through a vector of mixing coefficients. Our approach is deterministic, more robust, and more interpretable than existing methods, and features other advantageous properties that make it more trustworthy for practical applications. We demonstrate its benefits over other computational techniques on both synthetic data and real-world benchmarks; then, following a minor variation of the HGR semantics, we introduce the Generalized Disparate Impact (GeDI) indicator, which broadens the legal notion of disparate impact to continuous input variables. Empirical findings confirm that this indicator can effectively reduce unfairness across three benchmark datasets, as well as in a practical use case involving long-term fairness in ranking systems; moreover, we show that the two measures fit into a unified framework and are equivalent up to a data-dependent scaling factor.
To conclude, we discuss ongoing and future works regarding both methodological extensions of our Kernel-Based HGR method and potential applications in intersectional fairness and causal discovery.
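The kernel-based estimation strategy described in the abstract can be illustrated with a minimal sketch: each variable is mapped through a user-defined kernel (here a polynomial kernel, purely as an assumption), and the two vectors of mixing coefficients that maximize the Pearson correlation of the mapped values are found in closed form via canonical correlation analysis. The function name, kernel choice, and solver below are illustrative and not the dissertation's actual implementation.

```python
import numpy as np

def kernel_hgr(x, y, degree=3):
    """Hedged sketch of a kernel-based HGR estimate.

    Maps x and y through polynomial kernels of degrees 1..degree and
    returns the maximal Pearson correlation achievable by linearly
    mixing the kernel features, i.e. the top canonical correlation.
    """
    # Build polynomial feature matrices and center each column.
    F = np.column_stack([x ** k for k in range(1, degree + 1)])
    G = np.column_stack([y ** k for k in range(1, degree + 1)])
    F = F - F.mean(axis=0)
    G = G - G.mean(axis=0)
    # Orthonormalize each feature block via QR; the top singular value
    # of Qf.T @ Qg is then the largest correlation over all mixings.
    Qf, _ = np.linalg.qr(F)
    Qg, _ = np.linalg.qr(G)
    s = np.linalg.svd(Qf.T @ Qg, compute_uv=False)
    return float(s[0])
```

On a quadratic relation such as y = x^2 with x symmetric around zero, Pearson's coefficient is near zero while this kernel-based estimate is near one, which is the kind of non-linear dependence the dissertation targets.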
Document type
Doctoral thesis
Author
Giuliani, Luca
Supervisor
PhD programme
Cycle
37
Coordinator
Disciplinary sector
Competition sector
Keywords
Non-Linear Correlations,
Polynomial Regression,
Algorithmic Fairness,
Machine Learning,
Fair Machine Learning,
Robust Machine Learning,
Constrained Machine Learning
DOI
10.48676/unibo/amsdottorato/12027
Defense date
9 April 2025
URI