Pattern Discovery in Colored Strings


Permalink

http://hdl.handle.net/10138/340721

Citation

Lipták, Z, Puglisi, S J & Rossi, M 2020, Pattern Discovery in Colored Strings. In S Faro & D Cantone (eds), 18th International Symposium on Experimental Algorithms (SEA 2020), 12, Leibniz International Proceedings in Informatics, LIPIcs, vol. 160, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, Dagstuhl, International Symposium on Experimental Algorithms, Catania, Italy, 16/06/2020. https://doi.org/10.4230/LIPIcs.SEA.2020.12

Title: Pattern Discovery in Colored Strings
Author: Lipták, Zsuzsanna; Puglisi, Simon J.; Rossi, Massimiliano
Other contributor: Faro, Simone; Cantone, Domenico
Contributor organization: Helsinki Institute for Information Technology; Department of Computer Science; Bioinformatics; Algorithmic Bioinformatics
Publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Date: 2020-06-01
Language: eng
Number of pages: 14
Belongs to series: 18th International Symposium on Experimental Algorithms (SEA 2020)
Belongs to series: Leibniz International Proceedings in Informatics, LIPIcs
ISBN: 978-3-95977-148-1
ISSN: 1868-8969
DOI: https://doi.org/10.4230/LIPIcs.SEA.2020.12
URI: http://hdl.handle.net/10138/340721
Abstract: We consider the problem of identifying patterns of interest in colored strings. A colored string is a string in which each position is colored with one of a finite set of colors. Our task is to find substrings that always occur followed by the same color at the same distance. The problem is motivated by applications in embedded systems verification, in particular, assertion mining. The goal there is to automatically infer properties of the embedded system from the analysis of its simulation traces. We show that the number of interesting patterns is upper-bounded by O(n^2), where n is the length of the string. We introduce a baseline algorithm with O(n^2) running time which identifies all interesting patterns for all colors in the string satisfying certain minimality conditions. When one is interested in patterns related to only one color, we provide an algorithm that identifies patterns in O(n^2 log n) time, but is faster than the first algorithm in practice, both on simulated and on real-world patterns.
2012 ACM Subject Classification: Theory of computation → Design and analysis of algorithms
Subject: Pattern mining; Property testing; Suffix tree; 113 Computer and information sciences
Peer reviewed: Yes
Rights: cc_by
Usage restriction: openAccess
Self-archived version: publishedVersion
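
The problem stated in the abstract (substrings that are always followed by the same color at the same distance) can be illustrated with a naive brute-force sketch. This is not the paper's algorithm, which the record's subject terms suggest is based on suffix trees; it only enumerates every substring and checks the condition directly. The function name `find_unique_patterns`, the `max_delay` parameter, the convention that the distance is counted from the last character of an occurrence, and the rule that an occurrence whose target position falls outside the string disqualifies the pattern are all assumptions made for illustration.

```python
from collections import defaultdict


def find_unique_patterns(text, colors, max_delay):
    """Return sorted (substring, delay, color) triples such that every
    occurrence of the substring is followed by that color exactly
    `delay` positions after its last character.

    Naive illustration only; far slower than the O(n^2) bound
    mentioned in the abstract.
    """
    if len(text) != len(colors):
        raise ValueError("text and colors must have the same length")
    n = len(text)

    # Map each distinct substring to the end positions of all its occurrences.
    ends = defaultdict(list)
    for start in range(n):
        for end in range(start, n):
            ends[text[start:end + 1]].append(end)

    result = []
    for substring, positions in ends.items():
        for delay in range(1, max_delay + 1):
            # Assumption: if any occurrence points past the end of the
            # string, the pair (substring, delay) is rejected.
            if any(p + delay >= n for p in positions):
                continue
            seen = {colors[p + delay] for p in positions}
            if len(seen) == 1:
                result.append((substring, delay, seen.pop()))
    return sorted(result)
```

For the text `"aab"` with colors `"rgb"` and `max_delay=2`, the single letter `a` is followed by `g` after one occurrence and by `b` after the other, so it is rejected, while `aa` occurs once and is followed by `b` at distance 1; the call returns `[("aa", 1, "b")]`.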


Files in this item

File: LIPIcs_SEA_2020_12.pdf
Size: 2.589Mb
Format: PDF
