Statistical modelling of spatio-temporal dependencies in NGS data

Ranciati, Saverio (2016) Statistical modelling of spatio-temporal dependencies in NGS data, [Dissertation thesis], Alma Mater Studiorum Università di Bologna. PhD programme in Statistical Sciences, 28th cycle. DOI 10.6092/unibo/amsdottorato/7680.

Next-generation sequencing (NGS) has rapidly become the current standard in genetics-related analysis. This switch from microarrays to NGS has required new statistical strategies to address the research questions inherent to the phenomena under study. First and foremost, NGS datasets usually consist of discrete observations characterized by overdispersion (that is, a discrepancy between expected and observed variability) and an abundance of zeros, measured across a huge number of regions of the genome. For chromatin immunoprecipitation sequencing (ChIP-Seq), a class of NGS data, the primary focus is to discover the underlying (unobserved) pattern of 'enrichment': in particular, there is interest in the interactions between genes (or broader regions of the genome) and proteins, as they describe the mechanism of regulation under different conditions, such as healthy or damaged tissue. Another interesting research question involves clustering these observations into groups that have practical relevance and interpretability, bearing in mind that a single unit could be allocated to more than one cluster, since it is reasonable to assume that its participation is not exclusive to a single biological function or mechanism. Many of these complex processes can also be described by sets of ordinary differential equations (ODEs), which are mathematical representations of how a system changes through time, following dynamics governed by the parameters of interest. In this thesis, we address these tasks and research questions using different statistical strategies, such as model-based clustering, graphical models, penalized smoothing and regression.
We propose extensions of existing approaches to better fit the problem at hand, and we develop the methodology in a Bayesian framework, with a focus on incorporating the structural dependencies, both spatial and temporal, of the data at our disposal.

Document type: Doctoral thesis
Author: Ranciati, Saverio
PhD programme: Statistical Sciences
Doctoral school: Economic and Statistical Sciences
Keywords: model-based clustering, mixture model, graphical model, Markov random field, next-generation sequencing, ordinary differential equations, spline smoothing, penalized regression, multiple allocations, overdispersion, negative binomial
Defence date: 10 June 2016
