Towards pan-genome read alignment to improve variation calling

dc.contributor.author Valenzuela, Daniel
dc.contributor.author Norri, Tuukka
dc.contributor.author Välimäki, Niko
dc.contributor.author Pitkänen, Esa
dc.contributor.author Mäkinen, Veli
dc.date.accessioned 2018-05-31T11:19:01Z
dc.date.available 2018-05-31T11:19:01Z
dc.date.issued 2018-05-09
dc.identifier.citation Valenzuela, D, Norri, T, Välimäki, N, Pitkänen, E & Mäkinen, V 2018, 'Towards pan-genome read alignment to improve variation calling', BMC Genomics, vol. 19, 87. https://doi.org/10.1186/s12864-018-4465-8
dc.identifier.other PURE: 107167840
dc.identifier.other PURE UUID: ef3c415e-3438-4132-bf7e-0ec94213baa4
dc.identifier.other WOS: 000431831100011
dc.identifier.other Scopus: 85046624769
dc.identifier.other ORCID: /0000-0003-4454-1493/work/45438699
dc.identifier.other ORCID: /0000-0002-8276-0585/work/105287114
dc.identifier.uri http://hdl.handle.net/10138/235453
dc.description.abstract Background: A typical human genome differs from the reference genome at 4-5 million sites. This diversity is increasingly catalogued in repositories such as ExAC/gnomAD, consisting of >15,000 whole genomes and >126,000 exome sequences from different individuals. Despite this enormous diversity, resequencing data workflows are still based on a single human reference genome. Identification and genotyping of genetic variants are typically carried out on short-read data aligned to a single reference, disregarding the underlying variation. Results: We propose a new unified framework for variant calling with short-read data utilizing a representation of human genetic variation - a pan-genomic reference. We provide a modular pipeline that can be seamlessly incorporated into existing sequencing data analysis workflows. Our tool is open source and available online: https://gitlab.com/dvalenzu/PanVC. Conclusions: Our experiments show that by replacing a standard human reference with a pan-genomic one we achieve an improvement in single-nucleotide variant calling accuracy and in short indel calling accuracy over the widely adopted Genome Analysis Toolkit (GATK) in difficult genomic regions. en
dc.format.extent 8
dc.language.iso eng
dc.relation.ispartof BMC Genomics
dc.rights cc_by
dc.rights.uri info:eu-repo/semantics/openAccess
dc.subject Pan-genome reference
dc.subject Variation calling
dc.subject Read alignment
dc.subject BURROWS-WHEELER TRANSFORM
dc.subject GENETIC-VARIATION
dc.subject COLLECTIONS
dc.subject INFERENCE
dc.subject PROJECT
dc.subject 3111 Biomedicine
dc.subject 113 Computer and information sciences
dc.title Towards pan-genome read alignment to improve variation calling en
dc.type Article
dc.contributor.organization Helsinki Institute for Information Technology
dc.contributor.organization Genome-scale Algorithmics research group / Veli Mäkinen
dc.contributor.organization Department of Computer Science
dc.contributor.organization Research Programs Unit
dc.contributor.organization Lauri Antti Aaltonen / Principal Investigator
dc.contributor.organization Genome-Scale Biology (GSB) Research Program
dc.contributor.organization Medicum
dc.contributor.organization Department of Medical and Clinical Genetics
dc.contributor.organization Algorithmic Bioinformatics
dc.description.reviewstatus Peer reviewed
dc.relation.doi https://doi.org/10.1186/s12864-018-4465-8
dc.relation.issn 1471-2164
dc.rights.accesslevel openAccess
dc.type.version publishedVersion

Files in this item

Files Size Format View
s12864_018_4465_8.pdf 751.6Kb PDF View/Open
