Dimartino, Paola (2022)
A machine learning based method to detect genomic imbalances exploiting X chromosome exome reads, [Dissertation thesis], Alma Mater Studiorum Università di Bologna.
PhD programme in Data science and computation, 33rd cycle. DOI 10.48676/unibo/amsdottorato/10374.
Abstract
Whole Exome Sequencing (WES) is rapidly becoming the first-tier test in clinics, thanks both to its
declining costs and to the development of new platforms that help clinicians analyse and
interpret SNVs and InDels. However, we still know very little about how CNV detection could
increase the diagnostic yield of WES. A plethora of exome CNV callers have been published over the
years, each performing well on specific CNV classes and sizes, which suggests that multiple tools
must be combined to obtain good overall detection performance. Here we
present TrainX, an ML-based method for calling heterozygous CNVs in WES data using EXCAVATOR2
Normalized Read Counts. We select non-pseudo-autosomal chromosome X alignments from males and
females to construct our dataset and train our model, make predictions on autosomal target
regions, and use a Hidden Markov Model (HMM) to call CNVs. We compared TrainX against a set of CNV
tools that differ in detection method (GATK4 gCNV, ExomeDepth, DECoN, CNVkit and EXCAVATOR2) and
found that our algorithm outperformed them in terms of stability: it identified both deletions and
duplications with good scores (F1-scores of 0.87 and 0.82, respectively), down to the
minimum resolution of 2 target regions. We also evaluated the robustness of the method using a set
of WES and SNP array data (n=251) from the Italian cohort of the Epi25 Collaborative, and were able
to retrieve all clinical CNVs previously identified by SNP array. TrainX showed good accuracy in
detecting heterozygous CNVs of different sizes, making it a promising tool for use in a diagnostic
setting.
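The final step described in the abstract (per-target predictions followed by an HMM to call CNVs) can be illustrated with a minimal sketch. This is not the TrainX implementation, whose details are not given in this record. It assumes, purely for illustration, that a classifier has already produced a probability for each of three hypothetical states ("del", "normal", "dup") at every target region, that these probabilities are used directly as HMM emission scores, and that the HMM has a single hypothetical `stay` parameter favouring runs of the same state. The function names and parameter values are invented for the example.

```python
import math

# Hypothetical copy-number states for a heterozygous CNV caller.
STATES = ("del", "normal", "dup")


def viterbi(probs, stay=0.99):
    """Return the most likely state path for a list of per-target
    probability dicts, e.g. {"del": 0.01, "normal": 0.98, "dup": 0.01}.

    Assumption: classifier probabilities are used as emission scores.
    """
    if not probs:
        return []
    switch = (1.0 - stay) / (len(STATES) - 1)
    eps = 1e-12  # avoids log(0) for hard 0/1 predictions

    def log_trans(a, b):
        return math.log(stay if a == b else switch)

    # score[s] = best log-probability of any path ending in state s
    score = {s: math.log(1.0 / len(STATES)) + math.log(probs[0][s] + eps)
             for s in STATES}
    back = []  # back[t][s] = best previous state for s at target t + 1
    for p in probs[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: score[r] + log_trans(r, s))
            new_score[s] = score[prev] + log_trans(prev, s) + math.log(p[s] + eps)
            pointers[s] = prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def call_segments(path, min_targets=2):
    """Merge consecutive non-normal targets into (first, last, state) calls,
    keeping only calls that span at least `min_targets` target regions."""
    calls = []
    start = 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            if path[start] != "normal" and i - start >= min_targets:
                calls.append((start, i - 1, path[start]))
            start = i
    return calls


# Toy input: two normal targets, three deleted targets, two normal targets.
normal = {"del": 0.01, "normal": 0.98, "dup": 0.01}
deleted = {"del": 0.98, "normal": 0.01, "dup": 0.01}
toy = [normal, normal, deleted, deleted, deleted, normal, normal]
print(call_segments(viterbi(toy)))  # [(2, 4, 'del')]
```

With a high `stay` value, a single ambiguous target is absorbed into the surrounding normal state, while a run of consistent predictions is reported as one call. The `min_targets=2` default mirrors the minimum resolution of 2 target regions mentioned in the abstract; everything else is an illustrative choice.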
Document type
Doctoral thesis
Author
Dimartino, Paola
Supervisor
Co-supervisor
PhD programme
Cycle
33
Coordinator
Scientific disciplinary sector
Competition sector
Keywords
CNVs, WES, NGS, Machine Learning, EXCAVATOR2, Benchmark
URN:NBN
DOI
10.48676/unibo/amsdottorato/10374
Defence date
16 June 2022
URI