Geometric Approaches to Big Data Modeling and Performance Prediction

Title: Geometric Approaches to Big Data Modeling and Performance Prediction
Author: Goetsch, Peter
Contributor: University of Helsinki, Faculty of Science, Department of Computer Science
Publisher: Helsingin yliopisto
Date: 2018
Language: eng
Thesis level: master's thesis
Abstract: Big Data frameworks (e.g., Spark) have many configuration parameters, such as memory size, CPU allocation, and the number of nodes (parallelism). Regular users and even expert administrators struggle to understand the relationship between different parameter configurations and the overall performance of the system. In this work, we address this challenge by proposing a performance prediction framework that builds performance models over varied configurable parameters on Spark. We take inspiration from the field of Computational Geometry to construct a d-dimensional mesh using Delaunay Triangulation over a selected set of features. From this mesh, we predict execution time for unknown feature configurations. To minimize the time and resources spent in building a model, we propose an adaptive sampling technique that collects only as many training points as required. Our evaluation on a cluster of computers using several workloads shows that our prediction error is lower than that of state-of-the-art methods while requiring fewer training samples.
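
The mesh-based prediction described in the abstract can be illustrated with a minimal two-dimensional sketch. This is not the thesis implementation: it assumes only two features (hypothetically, executor count and memory in GB), uses a brute-force empty-circumcircle test that is only practical for a handful of points, and interpolates linearly inside the enclosing triangle. All function names, the sample points, and the runtimes below are made up for illustration.

```python
from itertools import combinations

def _orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counterclockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def circumcircle_contains(a, b, c, p):
    """True if p lies strictly inside the circumcircle of triangle abc."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # Multiplying by the orientation makes the test independent of vertex order.
    return det * _orient(a, b, c) > 1e-12

def delaunay_triangles(points):
    """Brute force: keep every triangle whose circumcircle contains no other point."""
    tris = []
    for i, j, k in combinations(range(len(points)), 3):
        a, b, c = points[i], points[j], points[k]
        if abs(_orient(a, b, c)) < 1e-12:
            continue  # skip degenerate (collinear) triples
        others = (points[m] for m in range(len(points)) if m not in (i, j, k))
        if not any(circumcircle_contains(a, b, c, p) for p in others):
            tris.append((i, j, k))
    return tris

def barycentric(a, b, c, p):
    """Barycentric weights of p with respect to triangle abc."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w0 = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w1 = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w0, w1, 1.0 - w0 - w1

def predict(points, times, tris, query):
    """Linearly interpolate runtime inside the triangle that contains query."""
    for i, j, k in tris:
        w = barycentric(points[i], points[j], points[k], query)
        if min(w) >= -1e-9:  # query is inside or on the edge of this triangle
            return w[0] * times[i] + w[1] * times[j] + w[2] * times[k]
    return None  # query lies outside the convex hull of the samples

# Hypothetical training samples: (executors, memory in GB) -> runtime in seconds.
POINTS = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (3.0, 3.0)]
TIMES = [100.0, 80.0, 92.0, 79.0]
TRIANGLES = delaunay_triangles(POINTS)
ESTIMATE = predict(POINTS, TIMES, TRIANGLES, (2.0, 1.0))
```

A query outside the convex hull of the sampled configurations returns `None` in this sketch, which shows why the choice of which configurations to sample matters for such a model.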

Files in this item

File: Peter Goetsch Master's Thesis 6-June-2018.pdf
Size: 923.0 KB
Format: PDF