Properzi, Enrico (2013) Genome characterization through a mathematical model of the genetic code: an analysis of the whole chromosome 1 of A. thaliana, [Dissertation thesis], Alma Mater Studiorum Università di Bologna. PhD programme in Statistical Methodology for Scientific Research (Metodologia statistica per la ricerca scientifica), 25th cycle. DOI 10.6092/unibo/amsdottorato/5164.
Abstract
The objective of this work is to characterize chromosome 1 of the genome of A. thaliana, a small flowering plant used as a model organism in biology and genetics, on the basis of a recent mathematical model of the genetic code.
I analyze and compare different portions of the genome: genes, exons, coding sequences (CDS), introns, long introns, intergenes, untranslated regions (UTR) and regulatory sequences. To do so, I transformed the nucleotide sequences into binary sequences based on the definitions of three different dichotomic classes.
The descriptive analysis of the binary strings indicates the presence of regularities in each portion of the genome considered. In particular, there are remarkable differences between coding sequences (CDS and exons) and non-coding sequences, suggesting that the frame is important only for coding sequences and that dichotomic classes can be useful for recognizing them.
Then, I assessed the existence of short-range dependence between binary sequences computed on the basis of the different dichotomic classes.
I used three different measures of dependence: the well-known chi-squared test and two indices derived from the concept of entropy, namely Mutual Information (MI) and Sρ, a normalized version of the Bhattacharya-Hellinger-Matusita distance.
The results show that a significant short-range dependence structure exists only for coding sequences, which hints at an underlying error detection and correction mechanism.
Further studies are needed to assess how the information carried by dichotomic classes could discriminate between coding and non-coding sequences and, therefore, contribute to unveiling the role of the mathematical structure in error detection and correction mechanisms. Still, I have shown the potential of the approach presented for understanding the management of genetic information.
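As an illustration of the three dependence measures named in the abstract, the following is a minimal sketch for two binary sequences compared at a given lag. It is not the thesis's implementation: the function names (`joint_table`, `dependence_measures`) are hypothetical, the p-value uses the asymptotic chi-squared distribution with one degree of freedom, and Sρ is taken in its usual discrete form, half the sum of squared differences between the square roots of the joint frequencies and of the products of the marginals, which is an assumption about the estimator used in the thesis.

```python
import math
from collections import Counter


def joint_table(x, y, lag=0):
    """Relative frequencies of the pairs (x[t], y[t + lag]) for 0/1 sequences."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    pairs = list(zip(x, y[lag:]))  # zip stops at the shorter sequence
    if not pairs:
        raise ValueError("sequences are too short for this lag")
    n = len(pairs)
    counts = Counter(pairs)
    joint = {(a, b): counts[(a, b)] / n for a in (0, 1) for b in (0, 1)}
    return joint, n


def dependence_measures(x, y, lag=0):
    """Chi-squared statistic, mutual information (in nats) and S_rho."""
    joint, n = joint_table(x, y, lag)
    # Marginal frequencies of each sequence.
    px = {a: joint[(a, 0)] + joint[(a, 1)] for a in (0, 1)}
    py = {b: joint[(0, b)] + joint[(1, b)] for b in (0, 1)}

    chi2 = mi = s_rho = 0.0
    for (a, b), p in joint.items():
        q = px[a] * py[b]  # frequency expected under independence
        if q > 0:
            chi2 += n * (p - q) ** 2 / q
        if p > 0:  # p > 0 implies q > 0, so the ratio is defined
            mi += p * math.log(p / q)
        s_rho += 0.5 * (math.sqrt(p) - math.sqrt(q)) ** 2

    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {"chi2": chi2, "p_value": p_value, "mi": mi, "s_rho": s_rho}


if __name__ == "__main__":
    # Toy binary strings, not data from the thesis.
    a = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0]
    b = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0]
    for k in range(3):
        print(k, dependence_measures(a, b, lag=k))
```

All three quantities are zero when the joint frequencies equal the product of the marginals. Scanning small lags, as in the usage loop, is one simple way to look for short-range dependence between two binary strings.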
Document type
Doctoral thesis
Author
Properzi, Enrico
Supervisor
Co-supervisor
PhD programme
Doctoral school
Economic and Statistical Sciences
Cycle
25
Coordinator
Disciplinary sector
Competition sector
Keywords
Arabidopsis thaliana, dichotomic classes, dependence structure, genome, mathematical model, entropy measures
URN:NBN
DOI
10.6092/unibo/amsdottorato/5164
Defence date
18 February 2013
URI