Assembly and Analysis of Unmapped Genome Sequence Reads Reveal Novel Sequence and Variation in Dogs

Holden, L A, Arumilli, M, Hytonen, M K, Hundi, S, Salojärvi, J, Brown, K H & Lohi, H 2018, 'Assembly and Analysis of Unmapped Genome Sequence Reads Reveal Novel Sequence and Variation in Dogs', Scientific Reports, vol. 8, 10862.

Title: Assembly and Analysis of Unmapped Genome Sequence Reads Reveal Novel Sequence and Variation in Dogs
Author: Holden, Lindsay A.; Arumilli, Meharji; Hytonen, Marjo K.; Hundi, Sruthi; Salojärvi, Jarkko; Brown, Kim H.; Lohi, Hannes
Contributor organization: Research Programs Unit
Hannes Tapani Lohi / Principal Investigator
Veterinary Biosciences
Research Programme for Molecular Neurology
Veterinary Genetics
Organismal and Evolutionary Biology Research Programme
Viikki Plant Science Centre (ViPS)
Bioinformatics for Molecular Biology and Genomics (BMBG)
Date: 2018-07-18
Language: eng
Number of pages: 11
Belongs to series: Scientific Reports
ISSN: 2045-2322
Abstract: Dogs are excellent animal models for human disease. They have extensive veterinary histories, pedigrees, and a unique genetic system due to breeding practices. Despite these advantages, one factor limiting their usefulness is the canine genome reference (CGR), which was assembled using a single purebred Boxer. Although a common practice, this results in many high-quality reads remaining unmapped. To address this, whole-genome sequence data from three breeds, Border Collie (n = 26), Bearded Collie (n = 7), and Entlebucher Sennenhund (n = 8), were analyzed to identify novel, non-CGR genomic contigs using the previously validated pseudo-de novo assembly pipeline. We identified 256,957 novel contigs, and paired-end relationships together with BLAT scores provided 126,555 (49%) high-quality contigs with genomic coordinates containing 4.6 Mb of novel sequence absent from the CGR. These contigs close 12,503 known gaps, including 2.4 Mb containing partially missing sequences for 11.5% of Ensembl, 16.4% of RefSeq and 12.2% of canFam3.1+ CGR annotated genes and 1,748 unmapped contigs containing 2,366 novel gene variants. Examples for six disease-associated genes (SCARF2, RD3, COL9A3, FAM161A, RASGRP1 and DLX6) containing gaps or alternate splice variants missing from the CGR are also presented. These findings from non-reference breeds support the need for improvement of the current Boxer-only CGR to avoid missing important biological information. The inclusion of the missing gene sequences into the CGR will facilitate identification of putative disease mutations across diverse breeds and phenotypes.
Description: Correction: Volume 8, Article Number 11853, DOI: 10.1038/s41598-018-30169-3, Published: Aug 2, 2018
Subject: 1184 Genetics, developmental biology, physiology
413 Veterinary science
Peer reviewed: Yes
Rights: cc_by
Usage restriction: openAccess
Self-archived version: publishedVersion

Files in this item

Files                     Size      Format   View
s41598_018_29190_3.pdf    2.102Mb   PDF      View/Open
s41598_018_30169_3.pdf    668.2Kb   PDF      View/Open
