Dimartino, Paola (2022)
A machine learning based method to detect genomic imbalances exploiting X chromosome exome reads, [Dissertation thesis], Alma Mater Studiorum Università di Bologna.
PhD programme in Data science and computation, 33rd cycle. DOI 10.48676/unibo/amsdottorato/10374.
  
 
  
  
        
        
        
  
  
  
  
  
  
  
    
  
    
      Abstract
      Whole Exome Sequencing (WES) is rapidly becoming the first-tier test in clinics, thanks both to its
declining costs and to the development of new platforms that help clinicians analyse and
interpret SNVs and InDels. However, we still know very little about how CNV detection could
increase the diagnostic yield of WES. A plethora of exome CNV callers have been published over the years,
all showing good performance for specific CNV classes and sizes, which suggests that a
combination of multiple tools is needed to obtain good overall detection performance. Here we
present TrainX, a machine learning (ML)-based method for calling heterozygous CNVs in WES data using
EXCAVATOR2 Normalized Read Counts. We select non-pseudo-autosomal chromosome X alignments from
males and females to construct our dataset and train our model, make predictions on autosomal target
regions, and use a hidden Markov model (HMM) to call CNVs. We compared TrainX against a set of CNV tools
that differ in detection method (GATK4 gCNV, ExomeDepth, DECoN, CNVkit and EXCAVATOR2) and found that
our algorithm outperformed them in terms of stability: it identified both deletions and
duplications with good scores (F1-scores of 0.87 and 0.82, respectively), down to the
minimum resolution of 2 target regions. We also evaluated the method's robustness using a set of
WES and SNP array data (n=251) from the Italian cohort of the Epi25 Collaborative, and were able to
retrieve all clinical CNVs previously identified by the SNP array. TrainX showed good accuracy in
detecting heterozygous CNVs of different sizes, making it a promising tool for use in a diagnostic
setting.
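
      The abstract outlines a pipeline (train on chromosome X, predict per target region, then segment with an HMM) without giving implementation details. The following is only a minimal sketch of that general idea, not TrainX itself. It assumes that non-pseudo-autosomal X regions serve as a training signal because males carry one copy and females two, so a male-versus-female comparison resembles a heterozygous deletion and the reverse resembles a duplication. The three-state model, the `stay` transition probability, and the function names `expected_log2_ratio`, `viterbi`, and `call_segments` are all hypothetical and chosen for illustration.

```python
import math

# Hypothetical three-state model: 0 = deletion, 1 = normal, 2 = duplication.
STATES = (0, 1, 2)


def expected_log2_ratio(test_copies, control_copies):
    """Expected log2 read-count ratio for a test/control pair of copy numbers."""
    return math.log2(test_copies / control_copies)


def viterbi(emission_probs, stay=0.98):
    """Most likely state path given per-target class probabilities.

    emission_probs: one [p_del, p_normal, p_dup] triple per target region,
    for example the output of a classifier. `stay` is an assumed
    self-transition probability; the rest is split evenly between the
    other two states.
    """
    switch = (1.0 - stay) / (len(STATES) - 1)
    log_trans = [[math.log(stay if i == j else switch) for j in STATES]
                 for i in STATES]
    eps = 1e-12  # avoids log(0) for zero probabilities
    score = [math.log(1.0 / len(STATES)) + math.log(p + eps)
             for p in emission_probs[0]]
    back = []
    for probs in emission_probs[1:]:
        new_score, pointers = [], []
        for j in STATES:
            best_i = max(STATES, key=lambda i: score[i] + log_trans[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + log_trans[best_i][j]
                             + math.log(probs[j] + eps))
        score = new_score
        back.append(pointers)
    # Backtrack from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def call_segments(path, min_targets=2):
    """Collapse runs of non-normal states into (start, end, state) calls."""
    calls, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            if path[start] != 1 and i - start >= min_targets:
                calls.append((start, i - 1, path[start]))
            start = i
    return calls
```

      In this sketch the HMM step smooths isolated, weakly supported predictions while keeping runs of consistent evidence, and `min_targets=2` mirrors the minimum resolution of 2 target regions mentioned in the abstract.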
     
    
     
  
  
    
    
      Document type
      Doctoral thesis

      Author
      Dimartino, Paola

      PhD programme
      Data science and computation

      Cycle
      33

      Keywords
      CNVs, WES, NGS, Machine Learning, EXCAVATOR2, Benchmark

      DOI
      10.48676/unibo/amsdottorato/10374

      Defence date
      16 June 2022
   
  
   
  
  
  
  
  
    