Dallari, Silvia
(2025)
Statistical modelling of heterogeneity in the human gut microbiome, [Dissertation thesis], Alma Mater Studiorum Università di Bologna.
PhD programme in
Statistical Sciences, 37th cycle.
Full-text documents available:
PDF document (English)
- Access restricted until 1 March 2027
Available under licence: Unless the author grants broader permissions, the thesis may be freely consulted, and a copy may be saved and printed strictly for personal purposes of study, research and teaching; any directly or indirectly commercial use is expressly forbidden. All other rights to the material are reserved.
Abstract
Research on the gut microbiome is becoming essential for understanding human health and the role of the microbiota in biological systems and disorders, which makes it important to cluster individuals according to their microbiome configuration. This thesis explores different strategies for obtaining such groups. In the first part we focus on clustering methods that exploit individual profile information, proposing two models generally applicable to count data. The first is a Poisson generalized linear latent variable mixture model, which identifies clusters of samples sharing the same features while allowing for correlation between variables. The second proposal aims to extend multivariate Poisson-Lognormal (MPLN) mixtures to the high-dimensional setting. To this end, we start from the general Gaussian mixture model framework and propose a Random Projection Ensemble covariance estimate; we then investigate its application to the latent layer of MPLN mixtures. In the second part the focus moves to diversity-based methods. Here we develop a Bray-Curtis dissimilarity-based mixture model, with the option of including α-diversity as a covariate. In parallel, we propose a two-step algorithm that first clusters samples according to their within-individual diversity and then further divides them based on inter-individual variability. Individual profiles and diversity measures are then combined in a cosine distance-based mixture model with covariates. In the third part, methods from the literature and the new approaches are applied to simulated and real microbiome data. Results on the real data show how the findings depend on the method and distance used. Among the metadata, body mass index, gender and age were often among the most important variables in explaining gut microbiome composition, while type of diet was among the least important, possibly because of an overly simplistic categorization.
However, the metadata analysed appear to explain only a very small portion of the differences found in the microbiome configurations.
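As a reading aid, the three quantities named in the abstract (Bray-Curtis dissimilarity, α-diversity, cosine distance) can be sketched on toy count vectors. This is only an illustrative sketch, not the thesis's implementation: the record does not state which α-diversity index is used, so the Shannon index is an assumption here, and the function names and sample counts are hypothetical.

```python
import math


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two equal-length count vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    # Two empty samples are treated as identical (a convention chosen here).
    return num / den if den > 0 else 0.0


def shannon(x):
    """Shannon index, used here as an assumed example of alpha-diversity."""
    total = sum(x)
    if total == 0:
        return 0.0
    # Zero counts contribute nothing to the sum.
    return -sum((c / total) * math.log(c / total) for c in x if c > 0)


def cosine_distance(x, y):
    """One minus the cosine similarity of two non-zero count vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


# Hypothetical taxa counts for two samples (made-up numbers).
sample_a = [10, 0, 5, 5]
sample_b = [0, 10, 5, 5]

print(bray_curtis(sample_a, sample_b))
print(shannon(sample_a), shannon(sample_b))
print(cosine_distance(sample_a, sample_b))
```

Bray-Curtis and cosine distance compare two samples with each other (between-individual variability), whereas the Shannon index summarises a single sample (within-individual diversity), which is the distinction the two-step algorithm in the abstract relies on.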
Document type
Doctoral thesis
Author
Dallari, Silvia
Supervisor
Co-supervisor
PhD programme
Cycle
37
Coordinator
Disciplinary sector
Academic recruitment field
Keywords
Model-based clustering, Generalized linear latent variable models, Random projections, High-dimensional covariance estimation, Compositional count data, Microbiome data, Diversity-based methods
Defence date
14 April 2025
URI