LAF: Logic Alignment Free and its application to bacterial genomes classification

Permalink

http://hdl.handle.net/10138/161919

Citation

Weitschek, E, Cunial, F & Felici, G 2015, 'LAF: Logic Alignment Free and its application to bacterial genomes classification' BioData mining, vol. 8, 39. DOI: 10.1186/s13040-015-0073-1

Title: LAF: Logic Alignment Free and its application to bacterial genomes classification
Author: Weitschek, Emanuel; Cunial, Fabio; Felici, Giovanni
Contributor: University of Helsinki, Department of Computer Science
Date: 2015-12-08
Language: eng
Number of pages: 13
Belongs to series: BioData mining
ISSN: 1756-0381
URI: http://hdl.handle.net/10138/161919
Abstract: Alignment-free algorithms can be used to estimate the similarity of biological sequences and hence are often applied to the phylogenetic reconstruction of genomes. Most of these algorithms rely on comparing the frequency of all the distinct substrings of fixed length (k-mers) that occur in the analyzed sequences. In this paper, we present Logic Alignment Free (LAF), a method that combines alignment-free techniques and rule-based classification algorithms in order to assign biological samples to their taxa. This method searches for a minimal subset of k-mers whose relative frequencies are used to build classification models as disjunctive-normal-form logic formulas (if-then rules). We apply LAF successfully to the classification of bacterial genomes into their corresponding taxa. In particular, we succeed in obtaining reliable classification at different taxonomic levels by extracting a handful of rules, each one based on the frequency of just a few k-mers. State-of-the-art methods that adjust the frequency of k-mers to the character distribution of the underlying genomes have negligible impact on classification performance, suggesting that the signal of each class is strong and that LAF is effective in identifying it.
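
The two ingredients named in the abstract, relative k-mer frequencies and if-then rules in disjunctive normal form, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the rule encoding, and every threshold below are hypothetical and chosen only for illustration, and the sketch evaluates a hand-written rule rather than learning one or searching for a minimal subset of k-mers.

```python
from collections import Counter
import operator

# Comparison operators allowed in a rule condition (an assumption of this sketch).
OPERATORS = {">=": operator.ge, "<": operator.lt}


def kmer_relative_frequencies(sequence, k):
    """Return the relative frequency of each k-mer found by a sliding window."""
    seq = sequence.upper()
    total = len(seq) - k + 1  # number of (overlapping) k-mer positions
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}


def matches_dnf(freqs, rule):
    """Evaluate a DNF rule: an OR over clauses, each an AND over conditions.

    Each condition is a (kmer, op, threshold) triple; a k-mer that does not
    occur in the sequence is treated as having frequency 0.
    """
    def holds(kmer, op, threshold):
        return OPERATORS[op](freqs.get(kmer, 0.0), threshold)

    return any(all(holds(*cond) for cond in clause) for clause in rule)


# Hypothetical rule for an invented class; the thresholds are not from the paper.
toy_rule = [
    [("AC", ">=", 0.3), ("GG", "<", 0.1)],  # clause 1: both conditions must hold
    [("TT", ">=", 0.5)],                    # clause 2: alternative condition
]

freqs = kmer_relative_frequencies("ACGTACGTAC", 2)
is_member = matches_dnf(freqs, toy_rule)
```

Here a sample is assigned to the class as soon as any one clause is fully satisfied, which is what makes the formula readable as a set of alternative if-then rules.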
Subject: Supervised classification
Alignment-free sequence comparison
Bacterial taxonomy
MULTIPLE SEQUENCE ALIGNMENT
PROTEIN SEQUENCES
EVOLUTIONARY IMPLICATIONS
DEOXYRIBONUCLEIC-ACID
ENZYMATIC-SYNTHESIS
FREQUENCY-ANALYSIS
BINNING ALGORITHM
WHOLE GENOMES
K-MERS
DNA
113 Computer and information sciences

Files in this item

Files Size Format View
art_3A10.1186_2Fs13040_015_0073_1.pdf 679.8Kb PDF View/Open
