Unbiased probabilistic taxonomic classification for DNA barcoding

Somervuo, P, Koskela, S, Pennanen, J, Nilsson, R H & Ovaskainen, O 2016, 'Unbiased probabilistic taxonomic classification for DNA barcoding', Bioinformatics, vol. 32, no. 19, pp. 2920-2927. https://doi.org/10.1093/bioinformatics/btw346

Title: Unbiased probabilistic taxonomic classification for DNA barcoding
Author: Somervuo, Panu; Koskela, Sonja; Pennanen, Juho; Nilsson, R. Henrik; Ovaskainen, Otso
Contributor organization: Biosciences; Centre of Excellence in Metapopulation Research; Otso Ovaskainen / Principal Investigator
Date: 2016-10-01
Language: eng
Number of pages: 8
Belongs to series: Bioinformatics
ISSN: 1367-4803
DOI: https://doi.org/10.1093/bioinformatics/btw346
URI: http://hdl.handle.net/10138/201578
Abstract: Motivation: When targeted to a barcoding region, high-throughput sequencing can be used to identify species or operational taxonomic units from environmental samples, and thus to study the diversity and structure of species communities. Although many methods provide confidence scores for assigning taxonomic affiliations, it is not straightforward to translate these values into unbiased probabilities. We present a probabilistic method for taxonomical classification (PROTAX) of DNA sequences. Given a pre-defined taxonomical tree structure that is partially populated by reference sequences, PROTAX distributes a total probability of one across the set of all possible outcomes. PROTAX accounts for species that are present in the taxonomy but do not have reference sequences, for the possibility of unknown taxonomical units, and for mislabeled reference sequences. PROTAX is based on a statistical multinomial regression model, and it can use any kind of sequence similarity measure or the outputs of other classifiers as predictors. Results: We demonstrate the performance of PROTAX by using as predictors the output from BLAST, the phylogenetic classification software TIPP, and the RDP classifier. We show that PROTAX improves the predictions of the baseline implementations of the TIPP and RDP classifiers, and that it is able to combine complementary information provided by BLAST and TIPP, resulting in accurate and unbiased classifications even in very challenging cases such as 50% mislabeling of reference sequences.
Subject: 1182 Biochemistry, cell and molecular biology
Peer reviewed: Yes
Usage restriction: openAccess
Self-archived version: acceptedVersion

Files in this item

Files Size Format View
manuscript2016.pdf 727.7Kb PDF View/Open
