Geometric Approaches to Big Data Modeling and Performance Prediction

Title: Geometric Approaches to Big Data Modeling and Performance Prediction
Author: Goetsch, Peter
Contributor: University of Helsinki, Faculty of Science, Department of Computer Science
Publisher: Helsingin yliopisto
Date: 2018
Language: eng
Permanent link (URI):
Level: Master's thesis (pro gradu)
Abstract: Big Data frameworks (e.g., Spark) have many configuration parameters, such as memory size, CPU allocation, and the number of nodes (parallelism). Regular users and even expert administrators struggle to understand the relationship between different parameter configurations and the overall performance of the system. In this work, we address this challenge by proposing a performance prediction framework that builds performance models over varied configurable parameters on Spark. We take inspiration from the field of Computational Geometry to construct a d-dimensional mesh using Delaunay Triangulation over a selected set of features. From this mesh, we predict execution time for unknown feature configurations. To minimize the time and resources spent in building a model, we propose an adaptive sampling technique that lets us collect only as many training points as required. Our evaluation on a cluster of computers using several workloads shows that our prediction error is lower than that of state-of-the-art methods while requiring fewer training samples.
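
The mesh-based prediction described in the abstract can be illustrated with a minimal 2-D sketch. It is not the thesis implementation. It assumes two features (executor memory in GB and node count), uses made-up runtimes, builds the Delaunay mesh by brute force, and assumes that prediction means linear (barycentric) interpolation inside the enclosing triangle, which the abstract does not state. All function and variable names are hypothetical, and adaptive sampling is not covered.

```python
from itertools import combinations

def orientation(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circumcircle of triangle abc."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # The determinant's sign depends on the triangle's orientation.
    return det * orientation(a, b, c) > 1e-9

def delaunay(points):
    """Brute-force 2-D Delaunay mesh: keep every non-degenerate triple
    whose circumcircle contains no other sample point (small inputs only)."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        a, b, c = points[i], points[j], points[k]
        if abs(orientation(a, b, c)) < 1e-12:
            continue  # collinear points do not form a triangle
        others = (points[m] for m in range(len(points)) if m not in (i, j, k))
        if not any(in_circumcircle(a, b, c, p) for p in others):
            triangles.append((i, j, k))
    return triangles

def predict(points, runtimes, triangles, query):
    """Interpolate the runtime at `query` with barycentric weights of the
    enclosing triangle; return None outside the sampled region."""
    for i, j, k in triangles:
        a, b, c = points[i], points[j], points[k]
        d = orientation(a, b, c)
        w_a = orientation(query, b, c) / d
        w_b = orientation(a, query, c) / d
        w_c = 1.0 - w_a - w_b
        if min(w_a, w_b, w_c) >= -1e-9:
            return w_a * runtimes[i] + w_b * runtimes[j] + w_c * runtimes[k]
    return None

# Hypothetical training samples: (memory in GB, node count) -> runtime in seconds.
samples = [(2, 2), (8, 2), (2, 8), (8, 8), (5, 4)]
runtimes = [120.0, 90.0, 60.0, 40.0, 70.0]

mesh = delaunay(samples)
print(predict(samples, runtimes, mesh, (5, 3)))
```

The brute-force construction is quartic in the number of samples and is only meant to make the empty-circumcircle property explicit. A d-dimensional mesh, as the abstract describes, would in practice call for a library routine.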

Files in this item

File: Peter Goetsch Master's Thesis 6-June-2018.pdf
Size: 923.0Kb
Format: PDF