Title: | Chaining with Maximal Exact Matches for Fast and Accurate Approximation of Edit Distance |
Author: | Porttinen, Peter |
Other contributor: |
Helsingin yliopisto, Matemaattis-luonnontieteellinen tiedekunta
University of Helsinki, Faculty of Science Helsingfors universitet, Matematisk-naturvetenskapliga fakulteten |
Publisher: | Helsingin yliopisto |
Date: | 2020 |
Language: | eng |
URI: |
http://urn.fi/URN:NBN:fi:hulib-202101191297
http://hdl.handle.net/10138/324896 |
Thesis level: | master's thesis |
Degree program: |
Tietojenkäsittelytieteen maisteriohjelma
Master's Programme in Computer Science Magisterprogrammet i datavetenskap |
Specialisation: |
Algoritmit
Algorithms Algoritmer |
Discipline: | none |
Abstract: | Computing an edit distance between strings is one of the central problems in both string processing and bioinformatics. Optimal solutions to edit distance are quadratic to the lengths of the input strings. The goal of this thesis is to study a new approach to approximate edit distance. We use a chaining algorithm presented by Mäkinen and Sahlin in "Chaining with overlaps revisited" CPM 2020 implemented verbatim. Building on the chaining algorithm, our focus is on efficiently finding a good set of anchors for the chaining algorithm. We present three approaches to computing the anchors as maximal exact matches: Bi-Directional Burrows-Wheeler Transform, Minimizers, and lastly, a hybrid implementation of the two. Using the maximal exact matches as anchors, we can efficiently compute an optimal chaining alignment for the strings. The chaining alignment further allows us to determine all such intervals where mismatches occur by looking at which sequences are not in the chain. Using these smaller intervals lets us approximate edit distance with a high degree of accuracy and a significant speed improvement. The methods described present a way to approximate edit distance in time complexity bounded by the number of maximal exact matches. |
Subject: |
Edit Distance
Maximal Exact Matches Bidirectional Burrows-Wheeler Transform K-mer Chaining |
Total number of downloads: Loading...
Files | Size | Format | View |
---|---|---|---|
Porttinen_Peter_tutkielma_2020.pdf | 448.1Kb |
View/ |