Case study on the compression techniques of a column oriented database

Show full item record

Title: Case study on the compression techniques of a column oriented database
Author: Karikoski, Antti
Contributor: University of Helsinki, Faculty of Science
Publisher: Helsingin yliopisto
Date: 2019
Thesis level: master's thesis
Abstract: Data compression is one way to gain better performance from a database. Compression is typically achieved with a compression algorithm, an encoding or both. Effective compression directly lowers the physical storage requirements translating to reduced storage costs. Additionally, in case of a data transfer bottleneck where CPU is data starved, compression can yield improved query performance through increased transfer bandwidth and better CPU utilization. However, obtaining better query performance is not trivial since many factors affect the viability of compression. Compression has been found especially successful in column oriented databases where similar data is stored closely in physical media. This thesis studies the effect of compression on a columnar storage format Apache Parquet through a micro benchmark that is based on the TPC-H benchmark. Compression is found to have positive effects on simple queries. However, with complex queries, where data scanning is relatively small portion of the query, no performance gains were observed. Furthermore, this thesis examines the decoding performance of the encoding layer that belongs to a case database, Fastorm. The goal is to determine its efficiency among other encodings and whether it could be improved upon. Fastorm's encoding is compared against various encodings of Apache Parquet in a setting where data is from a real world business. Fastorm's encoding is deemed to perform well enough coupled with strong evidence to consider adding delta encoding to its repertoire of encoding techniques.
Discipline: Algorithms and Machine Learning

Files in this item

Files Size Format View

There are no files associated with this item.

This item appears in the following Collection(s)

Show full item record