Towards pan-genome read alignment to improve variation calling

Permanent link

http://hdl.handle.net/10138/235453

Citation

Valenzuela, D., Norri, T., Välimäki, N., Pitkänen, E. & Mäkinen, V. 2018, 'Towards pan-genome read alignment to improve variation calling', BMC Genomics, vol. 19, 87. https://doi.org/10.1186/s12864-018-4465-8

Title: Towards pan-genome read alignment to improve variation calling
Authors: Valenzuela, Daniel; Norri, Tuukka; Välimäki, Niko; Pitkänen, Esa; Mäkinen, Veli
Author organizations: Helsinki Institute for Information Technology
Genome-scale Algorithmics research group / Veli Mäkinen
Department of Computer Science
Research Programs Unit
Lauri Antti Aaltonen / Principal Investigator
Genome-Scale Biology (GSB) Research Program
Medicum
Department of Medical and Clinical Genetics
Algorithmic Bioinformatics
Date: 2018-05-09
Language: English
Number of pages: 8
Part of series: BMC Genomics
ISSN: 1471-2164
DOI: https://doi.org/10.1186/s12864-018-4465-8
URI: http://hdl.handle.net/10138/235453
Abstract: Background: A typical human genome differs from the reference genome at 4-5 million sites. This diversity is increasingly catalogued in repositories such as ExAC/gnomAD, consisting of >15,000 whole genomes and >126,000 exome sequences from different individuals. Despite this enormous diversity, resequencing data workflows are still based on a single human reference genome. Identification and genotyping of genetic variants are typically carried out on short-read data aligned to a single reference, disregarding the underlying variation.
Results: We propose a new unified framework for variant calling with short-read data that utilizes a representation of human genetic variation: a pan-genomic reference. We provide a modular pipeline that can be seamlessly incorporated into existing sequencing data analysis workflows. Our tool is open source and available online: https://gitlab.com/dvalenzu/PanVC.
Conclusions: Our experiments show that by replacing a standard human reference with a pan-genomic one we achieve an improvement in single-nucleotide variant calling accuracy and in short indel calling accuracy over the widely adopted Genome Analysis Toolkit (GATK) in difficult genomic regions.
Keywords: Pan-genome reference
Variation calling
Read alignment
Burrows-Wheeler transform
Genetic variation
Collections
Inference
Project
3111 Biomedicine
113 Computer and information sciences
Peer reviewed: Yes
Rights: CC BY
Access: Open access
Self-archived version: Published version


Files

s12864_018_4465_8.pdf (751.6 KB, PDF)