Cost-effective Resource Provisioning for Spark Workloads




Permalink

http://hdl.handle.net/10138/307832

Citation

Chen, Y, Lu, J, Chen, C, Hoque, MA & Tarkoma, S 2019, Cost-effective Resource Provisioning for Spark Workloads. in CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management. ACM, New York, NY, pp. 2477-2480, ACM International Conference on Information and Knowledge Management, Beijing, China, 03/11/2019. https://doi.org/10.1145/3357384.3358090

Title: Cost-effective Resource Provisioning for Spark Workloads
Authors: Chen, Yuxing; Lu, Jiaheng; Chen, Chen; Hoque, Mohammad Ashraful; Tarkoma, Sasu
Author's organization: Department of Computer Science
Unified DataBase Management System research group / Jiaheng Lu
Helsinki Institute for Information Technology
Content-Centric Structures and Networking research group / Sasu Tarkoma
Publisher: ACM
Date: 2019-11-03
Language: eng
Number of pages: 4
Part of series: CIKM '19
ISBN: 978-1-4503-6976-3
DOI: https://doi.org/10.1145/3357384.3358090
Permanent link (URI): http://hdl.handle.net/10138/307832
Abstract: Spark is one of the prevalent big data analytics platforms. Provisioning the right resources for Spark jobs is challenging but essential for organizations that want to save time, achieve high resource utilization, and remain cost-effective. In this paper, we study how to determine parameter values that meet the performance requirements of workloads while minimizing both resource cost and resource utilization time. We propose a simulation-based cost model that predicts job performance accurately. We keep training cost low by using a simulation framework, namely Monte Carlo (MC) simulation, which uses a small amount of data and resources to make reliable predictions for larger datasets and clusters. The salient feature of our method is that it requires only a low training cost while still producing accurate predictions. Through experiments with six benchmark workloads, we demonstrate that the cost model's average prediction error is below 7% and that its recommendations achieve up to 5x savings in resource cost.
Subject: 113 Computer and information sciences
Peer reviewed: Yes
License: unspecified
Usage restriction: openAccess
Self-archived version: acceptedVersion
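The abstract's core idea, sampling from a small profiling run to predict runtime on a larger configuration and then choosing the cheapest configuration that meets a requirement, can be illustrated with a minimal Monte Carlo sketch. This is not the authors' cost model. It assumes that per-task durations observed in a small run are representative, that tasks are independent, and that a job's cost is executors × runtime × a price. The function names `simulate_job_time`, `estimate_job_time`, and `recommend`, the parameter `price_per_executor_second`, and the sample durations are all hypothetical.

```python
import heapq
import random
from statistics import mean


def simulate_job_time(task_samples, num_tasks, num_executors, rng):
    """One Monte Carlo trial (hypothetical model): draw each task's duration
    from the profiled samples and assign tasks greedily to the executor slot
    that becomes free first."""
    # Heap of times at which each executor slot becomes free.
    slots = [0.0] * num_executors
    heapq.heapify(slots)
    for _ in range(num_tasks):
        start = heapq.heappop(slots)
        heapq.heappush(slots, start + rng.choice(task_samples))
    # The job ends when the last slot finishes its final task.
    return max(slots)


def estimate_job_time(task_samples, num_tasks, num_executors, trials=200, seed=0):
    """Average the simulated job time over many independent trials."""
    rng = random.Random(seed)
    return mean(
        simulate_job_time(task_samples, num_tasks, num_executors, rng)
        for _ in range(trials)
    )


def recommend(task_samples, num_tasks, candidates, deadline, price_per_executor_second):
    """Return (cost, executors, estimated_time) for the cheapest candidate
    whose estimated time meets the deadline, or None if none qualifies.
    Cost is assumed (hypothetically) to be executors * time * price."""
    best = None
    for n in candidates:
        t = estimate_job_time(task_samples, num_tasks, n)
        cost = n * t * price_per_executor_second
        if t <= deadline and (best is None or cost < best[0]):
            best = (cost, n, t)
    return best


if __name__ == "__main__":
    # Hypothetical task durations (seconds) from a small profiling run.
    samples = [1.8, 2.0, 2.1, 2.4, 3.0]
    print(recommend(samples, num_tasks=400, candidates=[4, 8, 16, 32],
                    deadline=100.0, price_per_executor_second=0.001))
```

Drawing durations with `rng.choice` is a simple bootstrap from the profiled samples; a fuller model would likely also account for data growth, shuffle costs, and stragglers, which this sketch omits.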


Files in this item


File  Size  Format
Performance_Pre ... ation_for_Apache_Spark.pdf  643.4 KB  PDF

