Case study on the compression techniques of a column oriented database

Permalink

http://urn.fi/URN:NBN:fi:hulib-201908133218
Title: Case study on the compression techniques of a column oriented database
Author: Karikoski, Antti
Contributor: Helsingfors universitet, Matematisk-naturvetenskapliga fakulteten
Publisher: Helsingin yliopisto
Date: 2019
Language: eng
Permanent link (URI): http://urn.fi/URN:NBN:fi:hulib-201908133218
http://hdl.handle.net/10138/304695
Level: Master's thesis
Subject: Algorithms and Machine Learning
Abstract: Data compression is one way to gain better performance from a database. Compression is typically achieved with a compression algorithm, an encoding, or both. Effective compression directly lowers physical storage requirements, which translates to reduced storage costs. Additionally, when data transfer is the bottleneck and the CPU is data-starved, compression can yield improved query performance through increased transfer bandwidth and better CPU utilization. However, obtaining better query performance is not trivial, since many factors affect the viability of compression. Compression has been found especially successful in column-oriented databases, where similar data is stored close together on physical media. This thesis studies the effect of compression on the columnar storage format Apache Parquet through a microbenchmark based on the TPC-H benchmark. Compression is found to have positive effects on simple queries. However, with complex queries, where data scanning is a relatively small portion of the query, no performance gains were observed. Furthermore, this thesis examines the decoding performance of the encoding layer of a case database, Fastorm. The goal is to determine its efficiency relative to other encodings and whether it could be improved upon. Fastorm's encoding is compared against various encodings of Apache Parquet using data from a real-world business. Fastorm's encoding is deemed to perform well enough, and there is strong evidence in favor of adding delta encoding to its repertoire of encoding techniques.

