Case study on the compression techniques of a column oriented database

dc.contributor Helsingin yliopisto, Matemaattis-luonnontieteellinen tiedekunta fi
dc.contributor University of Helsinki, Faculty of Science en
dc.contributor Helsingfors universitet, Matematisk-naturvetenskapliga fakulteten sv
dc.contributor.author Karikoski, Antti
dc.date.issued 2019
dc.identifier.uri URN:NBN:fi:hulib-201908133218
dc.identifier.uri http://hdl.handle.net/10138/304695
dc.description.abstract Data compression is one way to gain better performance from a database. Compression is typically achieved with a compression algorithm, an encoding, or both. Effective compression directly lowers physical storage requirements, which translates to reduced storage costs. Additionally, when data transfer is the bottleneck and the CPU is starved of data, compression can improve query performance through increased effective transfer bandwidth and better CPU utilization. However, obtaining better query performance is not trivial, since many factors affect the viability of compression. Compression has been found especially successful in column-oriented databases, where similar data is stored close together on the physical medium. This thesis studies the effect of compression on the columnar storage format Apache Parquet through a microbenchmark based on the TPC-H benchmark. Compression is found to have positive effects on simple queries. However, with complex queries, where data scanning is a relatively small portion of the query, no performance gains were observed. Furthermore, this thesis examines the decoding performance of the encoding layer of a case database, Fastorm. The goal is to determine how efficient it is compared with other encodings and whether it could be improved. Fastorm's encoding is compared against various encodings of Apache Parquet using data from a real-world business. Fastorm's encoding is deemed to perform well enough, and there is strong evidence that delta encoding is worth considering as an addition to its repertoire of encoding techniques. en
dc.language.iso eng
dc.publisher Helsingin yliopisto fi
dc.publisher University of Helsinki en
dc.publisher Helsingfors universitet sv
dc.title Case study on the compression techniques of a column oriented database en
dc.type.ontasot pro gradu -tutkielmat fi
dc.type.ontasot master's thesis en
dc.type.ontasot pro gradu-avhandlingar sv
dc.subject.discipline Algorithms and Machine Learning und
dct.identifier.urn URN:NBN:fi:hulib-201908133218

Files in this item

Files Size Format
Karikoski_Antti_Pro_gradu_2019.pdf 989.3Kb application/pdf
