Characterization of DNA sequence properties through network and statistical approaches

Merlotti, Alessandra (2021) Characterization of DNA sequence properties through network and statistical approaches, [Dissertation thesis], Alma Mater Studiorum Università di Bologna. PhD programme in Physics, 33rd cycle. DOI 10.48676/unibo/amsdottorato/9848.

Full-text documents available:

PDF document (English), 17 MB
Available under licence: Except where the author has granted broader permissions, the thesis may be freely consulted, and one copy may be saved and printed strictly for personal study, research and teaching purposes; any direct or indirect commercial use is expressly prohibited. All other rights to the material are reserved.

Abstract

In this thesis we will see that the DNA sequence is constantly shaped by its interactions with the environment at multiple levels, showing footprints of DNA methylation, of its 3D organization and, in the case of bacteria, of the interaction with host organisms. In the first chapter, we will see that by analyzing the distribution of distances between consecutive dinucleotides of the same type along the sequence, we can detect epigenetic and structural footprints. In particular, we will see that the CG distance distribution makes it possible to distinguish among organisms of different biological complexity, depending on how many CG sites are involved in DNA methylation. Moreover, we will see that the CG and TA distance distributions can be described by the same fitting function, suggesting a relationship between the two. We will also provide an interpretation of the observed trend by simulating a positioning process with and without memory. Finally, we will focus on the TA distance distribution, characterizing its deviations from the trend predicted by the best-fitting function and identifying specific patterns that might be related to peculiar mechanical properties of the DNA, as well as to epigenetic and structural processes.

In the second chapter, we will see how the 3D structure of the DNA can be mapped onto its sequence. In particular, we devised a network-based algorithm that produces a genome assembly from the genome's 3D configuration, using Hi-C contact maps as input. Specifically, we will see how the different chromosomes can be identified and their sequences reconstructed by exploiting the spectral properties of the Laplacian operator of a network.

In the third chapter, we will present a novel network-based method for source clustering and source attribution, which makes it possible to identify host-bacteria interactions starting from the Single-Nucleotide Polymorphisms (SNPs) detected along bacterial genome sequences.
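As a rough illustration of the quantity studied in the first chapter, the Python sketch below computes the distribution of distances between consecutive occurrences of the same dinucleotide along a sequence. It assumes that a distance is the difference between the 0-based start positions of two consecutive occurrences and that overlapping occurrences (e.g. AA inside AAA) are counted; the thesis's exact conventions, normalization and fitting functions are not reproduced here, and the function names are hypothetical.

```python
from collections import Counter


def dinucleotide_positions(seq: str, dinuc: str) -> list[int]:
    """Return the 0-based start positions of every occurrence of `dinuc`.

    Assumption: overlapping occurrences are counted, and the comparison
    is case-insensitive (soft-masked lowercase bases are uppercased).
    """
    seq, dinuc = seq.upper(), dinuc.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc]


def distance_distribution(seq: str, dinuc: str) -> Counter:
    """Histogram of distances between consecutive occurrences of `dinuc`.

    Assumption: distance = difference of start positions of two
    consecutive occurrences (hypothetical convention for illustration).
    """
    pos = dinucleotide_positions(seq, dinuc)
    # Pair each occurrence with the next one and count the gaps.
    return Counter(b - a for a, b in zip(pos, pos[1:]))


if __name__ == "__main__":
    toy = "ACGTTCGACGCG"  # toy sequence, not real genomic data
    print(dinucleotide_positions(toy, "CG"))
    print(distance_distribution(toy, "CG"))
```

For the toy sequence ACGTTCGACGCG, CG starts at positions 1, 5, 8 and 10, so the histogram contains one distance each of 4, 3 and 2. On a real genome one would typically normalize these counts into frequencies before comparing organisms or fitting a curve.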

Document type

PhD thesis

Author

Merlotti, Alessandra

Supervisor

Remondini, Daniel

PhD programme

Physics

Cycle

33

Coordinator

Cicoli, Michele

Scientific-disciplinary sector

Area 02 - Physical sciences > FIS/07 Applied physics (to cultural heritage, environment, biology and medicine)

Academic recruitment field

Area 02 - Physical sciences > 02/D - Applied physics, physics education and history of physics > 02/D1 - Applied physics, physics education and history of physics

Keywords

dinucleotide distance distributions; DNA sequence; DNA methylation; genome assembly; structural genomics; Hi-C data; Laplacian operator of a network; source clustering; source attribution; systems biology; network-based analysis

URN:NBN

urn:nbn:it:unibo-27757

DOI

10.48676/unibo/amsdottorato/9848

Defence date

14 May 2021

URI

http://amsdottorato.unibo.it/id/eprint/9848