Analysis and experimental evaluation of an approximation algorithm for the length of an optimal Lempel-Ziv parsing

Show simple item record

dc.contributor Helsingin yliopisto, Matemaattis-luonnontieteellinen tiedekunta fi
dc.contributor University of Helsinki, Faculty of Science en
dc.contributor Helsingfors universitet, Matematisk-naturvetenskapliga fakulteten sv
dc.contributor.author Nietosvaara, Joonas
dc.date.issued 2019
dc.identifier.uri URN:NBN:fi:hulib-201908133201
dc.identifier.uri http://hdl.handle.net/10138/304699
dc.description.abstract We examine a previously known sublinear-time algorithm for approximating the length of a string’s optimal (i.e. shortest) Lempel-Ziv parsing (a.k.a. LZ77 factorization). This length is a measure of compressibility under the LZ77 compression algorithm, so the algorithm also estimates a string’s compressibility. The algorithm’s approximation approach is based on a connection between optimal Lempel-Ziv parsing length and the number of distinct substrings of different lengths in a string. Some aspects of the algorithm are described more explicitly than in earlier work, including the constraints on its input and how to distinguish between strings with short vs. long optimal parsings in sublinear time; several proofs (and pseudocode listings) are also more detailed than in earlier work. An implementation of the algorithm is provided. We experimentally investigate the algorithm’s practical usefulness for estimating the compressibility of large collections of data. The algorithm is run on real-world data under a wide range of approximation parameter settings. The accuracy of the resulting estimates is evaluated. The estimates turn out to be consistently highly inaccurate, albeit always inside the stated probabilistic error bounds. We conclude that the algorithm is not promising as a practical tool for estimating compressibility. We also examine the empirical connection between optimal parsing length and the number of distinct substrings of different lengths. The latter turns out to be a suprisingly accurate predictor of the former within our test data, which suggests avenues for future work. en
dc.language.iso eng
dc.publisher Helsingin yliopisto fi
dc.publisher University of Helsinki en
dc.publisher Helsingfors universitet sv
dc.title Analysis and experimental evaluation of an approximation algorithm for the length of an optimal Lempel-Ziv parsing en
dc.type.ontasot pro gradu -tutkielmat fi
dc.type.ontasot master's thesis en
dc.type.ontasot pro gradu-avhandlingar sv
dc.subject.discipline Tietojenkäsittelytiede und
dct.identifier.urn URN:NBN:fi:hulib-201908133201

Files in this item

Total number of downloads: Loading...

Files Size Format View
Nietosvaara_Joonas_Pro_gradu_2019.pdf 760.2Kb PDF View/Open

This item appears in the following Collection(s)

Show simple item record