Stochastic Modeling and Correlation Analysis of Omics Data

Budimir, Iva (2021) Stochastic Modeling and Correlation Analysis of Omics Data, [Dissertation thesis], Alma Mater Studiorum Università di Bologna. PhD programme in Physics, 33rd cycle. DOI 10.48676/unibo/amsdottorato/9792.

Full-text documents available:

PDF document (English) - Requires a PDF reader such as Xpdf or Adobe Acrobat Reader
Available under licence: Unless the author has granted broader permissions, the thesis may be freely consulted, and one copy may be saved and printed strictly for personal study, research and teaching purposes; any directly or indirectly commercial use is expressly prohibited. All other rights to the material are reserved.
Download (25MB)

Abstract

We studied the properties of three types of omics data: protein domains in bacteria, gene length in metazoan genomes, and methylation in humans. Gene elongation and protein domain diversification are among the most important mechanisms in the evolution of functional complexity. Investigating the dynamic processes that led to their current configuration can therefore highlight important aspects of genome and proteome evolution and, consequently, of the evolution of living organisms. The potential of methylation to regulate gene expression is usually attributed to groups of closely spaced CpG sites. We performed a correlation analysis to investigate the collaborative structure of all CpGs on chromosome 21. The long-tailed distributions of gene length and protein domain occurrences were successfully described by a stochastic evolutionary model and fitted with the Poisson Log-Normal distribution. This approach included both demographic and environmental stochasticity as well as Gompertzian density regulation. The parameters of the fitted distributions were compared at the evolutionary scale, which allowed us to define a novel protein-domain-based phylogenetic method for bacteria that performed well at the intraspecies level. In the context of gene length distribution, we derived a new generalized population dynamics model for diverse subcommunities, which allowed us to jointly model coding and non-coding genomic sequences. A possible application of this approach is a method for differentiating between protein-coding genes and pseudogenes based on their length. General properties of the methylation correlation structure were first analyzed in a large data set of healthy controls and then compared with a Down syndrome (DS) data set. The CpGs demonstrated strong group behaviour even across large genomic distances. The differences detected in DS were surprisingly small, possibly because the small DS sample size reduced the power of the statistical analysis.
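
As an illustration of the Poisson Log-Normal distribution named in the abstract, the following minimal Python sketch evaluates its probability mass function numerically. It assumes the standard definition (a Poisson count whose rate is log-normally distributed). The function name pln_pmf, the parameter values and the trapezoidal integration scheme are illustrative choices, not taken from the thesis.

```python
import math


def pln_pmf(n, mu, sigma, z_max=8.0, steps=800):
    """Approximate P(N = n) when N ~ Poisson(lam) and lam = exp(mu + sigma * Z),
    with Z a standard normal variable (assumed parametrization).

    The integral over Z is evaluated with the trapezoidal rule on [-z_max, z_max].
    """
    h = 2.0 * z_max / steps
    log_norm = -0.5 * math.log(2.0 * math.pi)  # log of the normal density constant
    total = 0.0
    for i in range(steps + 1):
        z = -z_max + i * h
        log_lam = mu + sigma * z
        # Poisson log-probability, computed in log space for numerical stability
        log_pois = n * log_lam - math.exp(log_lam) - math.lgamma(n + 1)
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * math.exp(log_norm - 0.5 * z * z + log_pois)
    return total * h


# Hypothetical parameters, chosen only to demonstrate the long right tail.
probs = [pln_pmf(n, mu=1.0, sigma=0.5) for n in range(200)]
mean = sum(n * p for n, p in enumerate(probs))
print(f"total mass = {sum(probs):.6f}, mean = {mean:.4f}")
```

Under this parametrization the mean should be close to exp(mu + sigma^2 / 2), which gives a quick sanity check on the numerical integration.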

Document type

Doctoral thesis

Author

Budimir, Iva

Supervisor

Castellani, Gastone

PhD programme

Physics

Cycle

33

Coordinator

Cicoli, Michele

Scientific disciplinary sector

Area 02 - Physical sciences > FIS/07 Applied physics (cultural heritage, environment, biology and medicine)

Competition sector

Area 02 - Physical sciences > 02/D - Applied physics, physics education and history of physics > 02/D1 - Applied physics, physics education and history of physics

Keywords

population dynamics; evolutionary model; species abundance distribution; long-tailed distribution; Poisson Log-Normal distribution; protein domain; bacteria; phylogeny; gene length; multimodal relative species abundance; DNA methylation; methylation correlation structure; Down syndrome

URN:NBN

urn:nbn:it:unibo-27737

DOI

10.48676/unibo/amsdottorato/9792

Defence date

14 May 2021

URI

https://amsdottorato.unibo.it/id/eprint/9792