Automated copy number variation concordance analysis

Show full item record



Permalink

http://urn.fi/URN:NBN:fi:hulib-202107263427
Title: Automated copy number variation concordance analysis
Author: Thapa Magar, Purushottam
Contributor: University of Helsinki, Faculty of Science
Publisher: Helsingin yliopisto
Date: 2021
Language: eng
URI: http://urn.fi/URN:NBN:fi:hulib-202107263427
http://hdl.handle.net/10138/332598
Thesis level: master's thesis
Degree program: Tietojenkäsittelytieteen maisteriohjelma
Master's Programme in Computer Science
Magisterprogrammet i datavetenskap
Specialisation: Algoritmit
Algorithms
Algoritmer
Abstract: Rapid growth and advancement of next generation sequencing (NGS) technologies have changed the landscape of genomic medicine. Today, clinical laboratories perform DNA sequencing on a regular basis, which is an error prone process. Erroneous data affects downstream analysis and produces fallacious result. Therefore, external quality assessment (EQA) of laboratories working with NGS data is crucial. Validation of variations such as single nucleotide polymor- phism (SNP) and InDels (<50 bp) is fairly accurate these days. However, detection and quality assessment of large changes such as the copy number variation (CNV) continues to be a concern. In this work, we aimed to study the feasibility of an automated CNV concordance analysis for the laboratory EQA services. We benchmarked variants reported by 25 laboratories against the highly curated gold standard for the son (HG002/NA24385) of the askenazim trio from the Personal Genome Project published by the Genome in a Bottle Consortium (GIAB). We employed two methods to conduct concordance of CNVs, the sequence based comparison with Truvari and the in-house exome-based comparison. For deletion calls of two whole genome sequencing (WGS) submissions, Truvari gained a value greater than 88% and 68% for precision and recall respectively. Conversely, the in-house method’s precision and recall score peaked at 39% and 7.9% respectively for one WGS submission for both deletion and duplication calls. The results indicate that automated CNV concordance analysis of the deletion calls for the WGS-based callset might be feasible with Truvari. On the other hand, results for panel-based targeted sequencing for the deletion calls showed precision and recall rates ranging from 0-80% and 0-5.6% respectively with Truvari. The result suggests that automated concordance analysis of CNVs for targeted sequencing remains a challenge. In conclusion, CNV concordance analysis depends on how the sequence data is generated.
Subject: next generation sequencing
copy number variation
structural variation
benchmarking NGS data


Files in this item

Total number of downloads: Loading...

Files Size Format View
Thapa_Magar_Purushottam_Thesis_2021.pdf 1.054Mb PDF View/Open

This item appears in the following Collection(s)

Show full item record