Case study on the compression techniques of a column oriented database

Näytä kaikki kuvailutiedot

Julkaisun nimi: Case study on the compression techniques of a column oriented database
Tekijä: Karikoski, Antti
Muu tekijä: Helsingin yliopisto, Matemaattis-luonnontieteellinen tiedekunta
Julkaisija: Helsingin yliopisto
Päiväys: 2019
Kieli: eng
Opinnäytteen taso: pro gradu -tutkielmat
Oppiaine: Algorithms and Machine Learning
Tiivistelmä: Data compression is one way to gain better performance from a database. Compression is typically achieved with a compression algorithm, an encoding or both. Effective compression directly lowers the physical storage requirements translating to reduced storage costs. Additionally, in case of a data transfer bottleneck where CPU is data starved, compression can yield improved query performance through increased transfer bandwidth and better CPU utilization. However, obtaining better query performance is not trivial since many factors affect the viability of compression. Compression has been found especially successful in column oriented databases where similar data is stored closely in physical media. This thesis studies the effect of compression on a columnar storage format Apache Parquet through a micro benchmark that is based on the TPC-H benchmark. Compression is found to have positive effects on simple queries. However, with complex queries, where data scanning is relatively small portion of the query, no performance gains were observed. Furthermore, this thesis examines the decoding performance of the encoding layer that belongs to a case database, Fastorm. The goal is to determine its efficiency among other encodings and whether it could be improved upon. Fastorm's encoding is compared against various encodings of Apache Parquet in a setting where data is from a real world business. Fastorm's encoding is deemed to perform well enough coupled with strong evidence to consider adding delta encoding to its repertoire of encoding techniques.


Tiedosto(t) Koko Formaatti Näytä

Tähän julkaisuun ei ole liitetty tiedostoja

Viite kuuluu kokoelmiin:

Näytä kaikki kuvailutiedot