Cost-effective Resource Provisioning for Spark Workloads

dc.contributor.author Chen, Yuxing
dc.contributor.author Lu, Jiaheng
dc.contributor.author Chen, Chen
dc.contributor.author Hoque, Mohammad Ashraful
dc.contributor.author Tarkoma, Sasu
dc.date.accessioned 2019-12-03T14:17:01Z
dc.date.available 2019-12-03T14:17:01Z
dc.date.issued 2019-11-03
dc.identifier.citation Chen, Y, Lu, J, Chen, C, Hoque, M A & Tarkoma, S 2019, Cost-effective Resource Provisioning for Spark Workloads. in CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management. ACM, New York, NY, pp. 2477-2480, ACM International Conference on Information and Knowledge Management, Beijing, China, 03/11/2019. https://doi.org/10.1145/3357384.3358090
dc.identifier.citation conference
dc.identifier.other PURE: 127915274
dc.identifier.other PURE UUID: 238f2352-f424-4252-a8e1-783c244f4f63
dc.identifier.other ORCID: /0000-0003-2067-454X/work/65677402
dc.identifier.other ORCID: /0000-0002-6220-2535/work/68616534
dc.identifier.other WOS: 000539898202112
dc.identifier.uri http://hdl.handle.net/10138/307832
dc.description.abstract Spark is one of the most prevalent big data analytics platforms. Provisioning resources properly for Spark jobs is challenging but essential for organizations to save time, achieve high resource utilization, and remain cost-effective. In this paper, we study the challenge of determining parameter values that meet the performance requirements of workloads while minimizing both resource cost and resource utilization time. We propose a simulation-based cost model to predict the performance of jobs accurately. We achieve low-cost training by taking advantage of a simulation framework, namely Monte Carlo (MC) simulation, which uses a small amount of data and resources to make reliable predictions for larger datasets and clusters. The salient feature of our method is that it keeps the training cost low while still producing accurate predictions. Through experiments with six benchmark workloads, we demonstrate that the cost model yields an average prediction error of less than 7% and that its recommendations achieve up to 5x savings in resource cost. en
dc.format.extent 4
dc.language.iso eng
dc.publisher ACM
dc.relation.ispartof CIKM '19
dc.relation.isversionof 978-1-4503-6976-3
dc.rights unspecified
dc.rights.uri info:eu-repo/semantics/openAccess
dc.subject 113 Computer and information sciences
dc.title Cost-effective Resource Provisioning for Spark Workloads en
dc.type Conference contribution
dc.contributor.organization Department of Computer Science
dc.contributor.organization Unified DataBase Management System research group / Jiaheng Lu
dc.contributor.organization Helsinki Institute for Information Technology
dc.contributor.organization Content-Centric Structures and Networking research group / Sasu Tarkoma
dc.description.reviewstatus Peer reviewed
dc.relation.doi https://doi.org/10.1145/3357384.3358090
dc.rights.accesslevel openAccess
dc.type.version acceptedVersion

Files in this item

Files Size Format View
Performance_Pre ... ation_for_Apache_Spark.pdf 643.4Kb PDF View/Open
