A machine learning based method to detect genomic imbalances exploiting X chromosome exome reads

Dimartino, Paola (2022) A machine learning based method to detect genomic imbalances exploiting X chromosome exome reads, [Dissertation thesis], Alma Mater Studiorum Università di Bologna. Dottorato di ricerca in Data science and computation, 33 Ciclo. DOI 10.48676/unibo/amsdottorato/10374.
Documenti full-text disponibili:
[img] Documento PDF (English) - Richiede un lettore di PDF come Xpdf o Adobe Acrobat Reader
Disponibile con Licenza: Creative Commons Attribution Non-commercial No Derivatives 4.0 (CC BY-NC-ND 4.0) .
Download (8MB)


Whole Exome Sequencing (WES) is rapidly becoming the first-tier test in clinics, both thanks to its declining costs and the development of new platforms that help clinicians in the analysis and interpretation of SNV and InDels. However, we still know very little on how CNV detection could increase WES diagnostic yield. A plethora of exome CNV callers have been published over the years, all showing good performances towards specific CNV classes and sizes, suggesting that the combination of multiple tools is needed to obtain an overall good detection performance. Here we present TrainX, a ML-based method for calling heterozygous CNVs in WES data using EXCAVATOR2 Normalized Read Counts. We select males and females’ non pseudo-autosomal chromosome X alignments to construct our dataset and train our model, make predictions on autosomes target regions and use HMM to call CNVs. We compared TrainX against a set of CNV tools differing for the detection method (GATK4 gCNV, ExomeDepth, DECoN, CNVkit and EXCAVATOR2) and found that our algorithm outperformed them in terms of stability, as we identified both deletions and duplications with good scores (0.87 and 0.82 F1-scores respectively) and for sizes reaching the minimum resolution of 2 target regions. We also evaluated the method robustness using a set of WES and SNP array data (n=251), part of the Italian cohort of Epi25 collaborative, and were able to retrieve all clinical CNVs previously identified by the SNP array. TrainX showed good accuracy in detecting heterozygous CNVs of different sizes, making it a promising tool to use in a diagnostic setting.

Tipologia del documento
Tesi di dottorato
Dimartino, Paola
Dottorato di ricerca
Settore disciplinare
Settore concorsuale
Parole chiave
CNVs, WES, NGS, Machine Learning, EXCAVATOR2, Benchmark
Data di discussione
16 Giugno 2022

Altri metadati

Statistica sui download

Gestione del documento: Visualizza la tesi