Shukla, Priyank (2010) Machine learning methods for prediction of disulphide bonding states of cysteine residues in proteins, [Dissertation thesis], Alma Mater Studiorum Università di Bologna. PhD programme in Informatica, 22nd cycle. DOI 10.6092/unibo/amsdottorato/2588.
  
 
  
  
        
        
        
  
  
  
  
  
  
  
    
  
    
      
    
  
  
    
      Abstract
The goal of this thesis is to develop a computational method, based on machine learning techniques, for predicting the disulfide-bonding states of cysteine residues in proteins. This is a sub-problem of the larger and still unsolved problem of protein structure prediction. Better prediction of disulfide-bonding states places constraints on the three-dimensional (3D) structure of the protein in question and should therefore help in predicting that structure. The results have direct implications for site-directed mutagenesis studies, protein engineering, and the protein folding problem.
Our prediction method uses a combination of an Artificial Neural Network (ANN) and a Hidden Markov Model (HMM), the so-called Hidden Neural Network (HNN). Using several global and local protein features as inputs (specifically profiles, parity of cysteine residues, average cysteine conservation, correlated mutation, sub-cellular localization, and signal peptide), and treating Eukaryotes and Prokaryotes separately, we reach an accuracy of 94% on a per-cysteine basis for both the Eukaryotic and the Prokaryotic datasets, and an accuracy of 90% and 93% on a per-protein basis for the Eukaryotic and Prokaryotic datasets, respectively. These accuracies are the highest reported so far by any existing prediction method; our method thus outperforms the previously developed approaches and is therefore more reliable.
The most interesting finding of this thesis is the difference in prediction performance between Eukaryotes and Prokaryotes at the basic level of input coding, when only 'profile' information is given as input to the prediction method. One reason we identify for this is a difference in the amino acid composition of the local environment of bonded and free cysteine residues in Eukaryotes and Prokaryotes: Eukaryotic bonded cysteine examples have a 'symmetric-cysteine-rich' environment, whereas Prokaryotic bonded examples lack it.
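The abstract reports accuracy both per cysteine and per protein without defining the two measures. The following is a minimal sketch of one common reading, under the assumption that a protein counts as correctly predicted only when every one of its cysteines is assigned the right state. The function name, the data layout, and the toy labels are hypothetical illustrations and are not taken from the thesis.

```python
from typing import Dict, List, Tuple

# Hypothetical toy encoding: 1 = disulfide-bonded cysteine, 0 = free cysteine.
Labels = Dict[str, List[int]]


def accuracies(true: Labels, pred: Labels) -> Tuple[float, float]:
    """Return (per-cysteine accuracy, per-protein accuracy)."""
    correct_cys = 0
    total_cys = 0
    correct_prot = 0
    for protein_id, true_states in true.items():
        pred_states = pred[protein_id]
        if len(pred_states) != len(true_states):
            raise ValueError(f"length mismatch for {protein_id}")
        hits = sum(t == p for t, p in zip(true_states, pred_states))
        correct_cys += hits
        total_cys += len(true_states)
        # Assumption: a protein is correct only if all its cysteines are correct.
        if hits == len(true_states):
            correct_prot += 1
    return correct_cys / total_cys, correct_prot / len(true)


# Made-up labels for three proteins; one cysteine in protA is mispredicted.
true = {"protA": [1, 1, 0, 0], "protB": [0, 0], "protC": [1, 1]}
pred = {"protA": [1, 1, 0, 1], "protB": [0, 0], "protC": [1, 1]}
per_cys, per_prot = accuracies(true, pred)
```

Under this reading a single wrong cysteine makes the whole protein count as wrong, so the per-protein figure can never exceed the per-cysteine figure on the same predictions.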
     
    
     
  
  
    
    
      Document type: Doctoral thesis
      Author: Shukla, Priyank
      PhD programme: Informatica
      Doctoral school: Scienze e ingegneria dell'informazione
      Cycle: 22
      Keywords: Bioinformatics, Machine Learning, Proteins, Disulfide bonds.
      DOI: 10.6092/unibo/amsdottorato/2588
      Defence date: 5 May 2010
      
      
     
   
  