Beraha, Mario
(2023)
Statistical learning of random probability measures, [Dissertation thesis], Alma Mater Studiorum Università di Bologna.
PhD programme in
Data science and computation, 34th cycle. DOI 10.48676/unibo/amsdottorato/10607.
Abstract
The study of random probability measures is a lively research topic that has
attracted interest from different fields in recent years. In this thesis, we consider
random probability measures in the context of Bayesian nonparametrics,
where the law of a random probability measure is used as a prior distribution,
and in the context of distributional data analysis, where
the goal is to perform inference given a sample from the law of a random probability measure.
The contributions contained in this thesis can be subdivided according to three
different topics: (i) the use of almost surely discrete repulsive random measures
(i.e., whose support points are well separated) for Bayesian model-based
clustering, (ii) the proposal of new laws for collections of random probability
measures for Bayesian density estimation of partially
exchangeable data subdivided into different groups, and (iii) the study
of principal component analysis and regression models for probability distributions
seen as elements of the 2-Wasserstein space. Specifically, for point
(i) above we propose an efficient Markov chain Monte Carlo algorithm for
posterior inference, which sidesteps the need for split-merge reversible jump
moves typically associated with poor performance; we propose a model for
clustering high-dimensional data by introducing a novel class of anisotropic
determinantal point processes; and we study the distributional properties of the
repulsive measures, shedding light on important theoretical results which enable
more principled prior elicitation and more efficient posterior simulation
algorithms. For point (ii) above, we consider several models suitable for clustering
homogeneous populations, inducing spatial dependence across groups of
data, extracting the characteristic traits common to all the data groups, and
propose a novel vector autoregressive model to study the growth
curves of Singaporean children. Finally, for point (iii), we propose a novel class of
projected statistical methods for distributional data analysis for measures
on the real line and on the unit circle.
Document type
Doctoral thesis
Author
Beraha, Mario
Supervisor
PhD programme
Cycle
34
Coordinator
Disciplinary field
Academic recruitment field
Keywords
Bayesian nonparametrics; random measures; hierarchical processes; Wasserstein distance; clustering
URN:NBN
DOI
10.48676/unibo/amsdottorato/10607
Defence date
29 March 2023
URI