Chaining with Maximal Exact Matches for Fast and Accurate Approximation of Edit Distance

Show full item record



Permalink

http://urn.fi/URN:NBN:fi:hulib-202101191297
Title: Chaining with Maximal Exact Matches for Fast and Accurate Approximation of Edit Distance
Author: Porttinen, Peter
Contributor: University of Helsinki, Faculty of Science
Publisher: Helsingin yliopisto
Date: 2020
Language: eng
URI: http://urn.fi/URN:NBN:fi:hulib-202101191297
http://hdl.handle.net/10138/324896
Thesis level: master's thesis
Degree program: Tietojenkäsittelytieteen maisteriohjelma
Master's Programme in Computer Science
Magisterprogrammet i datavetenskap
Specialisation: Algoritmit
Algorithms
Algoritmer
Discipline: none
Abstract: Computing an edit distance between strings is one of the central problems in both string processing and bioinformatics. Optimal solutions to edit distance are quadratic to the lengths of the input strings. The goal of this thesis is to study a new approach to approximate edit distance. We use a chaining algorithm presented by Mäkinen and Sahlin in "Chaining with overlaps revisited" CPM 2020 implemented verbatim. Building on the chaining algorithm, our focus is on efficiently finding a good set of anchors for the chaining algorithm. We present three approaches to computing the anchors as maximal exact matches: Bi-Directional Burrows-Wheeler Transform, Minimizers, and lastly, a hybrid implementation of the two. Using the maximal exact matches as anchors, we can efficiently compute an optimal chaining alignment for the strings. The chaining alignment further allows us to determine all such intervals where mismatches occur by looking at which sequences are not in the chain. Using these smaller intervals lets us approximate edit distance with a high degree of accuracy and a significant speed improvement. The methods described present a way to approximate edit distance in time complexity bounded by the number of maximal exact matches.
Subject: Edit Distance
Maximal Exact Matches
Bidirectional Burrows-Wheeler Transform
K-mer
Chaining


Files in this item

Total number of downloads: Loading...

Files Size Format View
Porttinen_Peter_tutkielma_2020.pdf 448.1Kb PDF View/Open

This item appears in the following Collection(s)

Show full item record