Merlotti, Alessandra
(2021)
Characterization of DNA sequence properties through network and statistical approaches, [Dissertation thesis], Alma Mater Studiorum Università di Bologna.
PhD programme in Physics, 33rd cycle. DOI 10.48676/unibo/amsdottorato/9848.
Full-text documents available:
PDF document (English)
Available under licence: unless the author grants broader permissions, the thesis may be freely consulted, and one copy may be saved and printed for strictly personal study, research and teaching purposes. Any direct or indirect commercial use is expressly prohibited. All other rights to the material are reserved.
Abstract
In this thesis we will see that the DNA sequence is constantly shaped by its interactions with the environment at multiple levels. As a result, it carries footprints of DNA methylation and of its 3D organization. In the case of bacteria, it also carries footprints of the interaction with host organisms.
In the first chapter, we will see that epigenetic and structural footprints can be detected by analyzing the distribution of distances between consecutive dinucleotides of the same type along the sequence. In particular, the CG distance distribution makes it possible to distinguish among organisms of different biological complexity, depending on the extent to which CG sites are involved in DNA methylation. Moreover, the CG and TA distance distributions can be described by the same fitting function, suggesting a relationship between the two. We will also provide an interpretation of the observed trend by simulating a positioning process with and without memory. Finally, we will focus on the TA distance distribution. We will characterize its deviations from the trend predicted by the best fitting function and identify specific patterns that might be related to particular mechanical properties of DNA, as well as to epigenetic and structural processes.
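The following Python sketch is a rough illustration of the quantity analyzed in the first chapter. It computes the distances between consecutive occurrences of a given dinucleotide (e.g. CG or TA) and their normalized empirical distribution.

It rests on two assumptions. First, distance is measured between the start positions of consecutive occurrences. Second, overlapping occurrences are counted. The thesis may use a different convention. The function names are hypothetical, and the fitting function and memory-based positioning simulation mentioned above are not implemented here.

```python
from __future__ import annotations

from collections import Counter


def dinucleotide_positions(seq: str, dinuc: str) -> list[int]:
    """0-based start indices of every occurrence of `dinuc` in `seq` (overlaps included)."""
    seq, dinuc = seq.upper(), dinuc.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc]


def consecutive_distances(seq: str, dinuc: str) -> list[int]:
    """Distances between start positions of consecutive occurrences.

    Assumed convention: distance = difference of start indices.
    """
    pos = dinucleotide_positions(seq, dinuc)
    return [b - a for a, b in zip(pos, pos[1:])]


def distance_distribution(seq: str, dinuc: str) -> dict[int, float]:
    """Normalized empirical distribution of consecutive-occurrence distances."""
    dists = consecutive_distances(seq, dinuc)
    if not dists:
        return {}
    counts = Counter(dists)
    total = len(dists)
    # Sort by distance so the result can be plotted directly as a histogram.
    return {d: counts[d] / total for d in sorted(counts)}
```

For example, in the toy sequence `ACGTTACGCGTA` the CG sites start at positions 1, 6 and 8. This gives the distances `[5, 2]`, each with relative frequency 0.5.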
In the second chapter, we will see how the 3D structure of DNA can be mapped onto its sequence. In particular, we devised a network-based algorithm that produces a genome assembly from the genome's 3D configuration, using Hi-C contact maps as input. Specifically, we will see how the different chromosomes can be identified and their sequences reconstructed by exploiting the spectral properties of the Laplacian operator of a network.
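The following NumPy sketch illustrates the spectral idea. It assumes an idealized, symmetric contact map treated as the weighted adjacency matrix of a network, and it relies on two textbook properties of the unnormalized Laplacian L = D - A:

- The multiplicity of the zero eigenvalue equals the number of connected components. This is a proxy for chromosomes when inter-chromosomal contacts are absent.
- Sorting the loci of one component by the Fiedler vector (spectral seriation) yields a linear order.

This is a generic illustration, not the algorithm devised in the thesis. The function names and the tolerance `tol` are hypothetical. Real Hi-C maps are noisy, so chromosomes usually show up as near-zero rather than exactly zero eigenvalues.

```python
from __future__ import annotations

import numpy as np


def laplacian(contacts: np.ndarray) -> np.ndarray:
    """Unnormalized graph Laplacian L = D - A of a non-negative contact map."""
    a = np.asarray(contacts, dtype=float)
    a = (a + a.T) / 2.0        # enforce symmetry (new array, input untouched)
    np.fill_diagonal(a, 0.0)   # self-contacts do not enter the Laplacian
    return np.diag(a.sum(axis=1)) - a


def count_components(contacts: np.ndarray, tol: float = 1e-8) -> int:
    """Number of (numerically) zero eigenvalues of L = number of connected components."""
    eigvals = np.linalg.eigh(laplacian(contacts))[0]
    return int(np.sum(eigvals < tol))


def fiedler_order(contacts: np.ndarray) -> list[int]:
    """Order the loci of ONE connected component by the Fiedler vector,
    i.e. the eigenvector of the second-smallest eigenvalue of L."""
    _, eigvecs = np.linalg.eigh(laplacian(contacts))  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]
    return [int(i) for i in np.argsort(fiedler)]
```

Consider a toy chain of six loci with nearest-neighbour contacts whose rows and columns have been shuffled. On it, `fiedler_order` recovers the chain order up to reversal, because the sign of an eigenvector is arbitrary. Two such chains with no contacts between them give `count_components == 2`.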
In the third chapter, we will present a novel network-based method for source clustering and source attribution. It makes it possible to identify host-bacteria interactions starting from the detection of Single-Nucleotide Polymorphisms (SNPs) along bacterial genome sequences.
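The abstract does not detail how the SNP network is built, so the following is only a generic sketch under stated assumptions:

- Isolates are represented by aligned SNP profiles.
- Two isolates are linked when their Hamming distance is at most a hypothetical `threshold`.
- Clusters are the connected components of the resulting graph.
- An unlabeled isolate is attributed to the majority source among the labeled isolates in its cluster.

All names and the majority-vote rule are illustrative assumptions, not the method proposed in the thesis.

```python
from __future__ import annotations

from collections import Counter


def snp_distance(a: str, b: str) -> int:
    """Hamming distance between two aligned SNP profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def snp_clusters(profiles: dict[str, str], threshold: int) -> list[set[str]]:
    """Connected components of the graph linking isolates within `threshold` SNPs
    (hypothetical single-linkage criterion)."""
    names = list(profiles)
    adj: dict[str, set[str]] = {name: set() for name in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if snp_distance(profiles[u], profiles[v]) <= threshold:
                adj[u].add(v)
                adj[v].add(u)
    # Iterative depth-first search to collect the components.
    seen: set[str] = set()
    clusters: list[set[str]] = []
    for start in names:
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(comp)
    return clusters


def attribute_source(isolate: str, clusters: list[set[str]],
                     sources: dict[str, str]) -> str | None:
    """Majority source among labeled isolates in `isolate`'s cluster (None if no label)."""
    for comp in clusters:
        if isolate in comp:
            labels = [sources[m] for m in comp if m in sources and m != isolate]
            return Counter(labels).most_common(1)[0][0] if labels else None
    return None
```

Single-linkage components like these can chain distant isolates through intermediates. In the toy data used to check this sketch, two isolates differing at two SNPs end up in one cluster through a third isolate one SNP away from each. This is one reason a real method would need a more careful choice of threshold or clustering criterion.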
Document type: PhD thesis
Author: Merlotti, Alessandra
PhD programme: Physics
Cycle: 33
Keywords: dinucleotide distance distributions; DNA sequence; DNA methylation; genome assembly; structural genomics; Hi-C data; Laplacian operator of a network; source clustering; source attribution; systems biology; network-based analysis
DOI: 10.48676/unibo/amsdottorato/9848
Defence date: 14 May 2021