Scalable Declarative HEP Analysis Workflows for Containerised Compute Clouds

Citation

Šimko, T., Heinrich, L. A., Lange, C., Lintuluoto, A. E., MacDonell, D. M., Mečionis, A., Rodriguez, D. R. & Shandilya, P. 2021, 'Scalable Declarative HEP Analysis Workflows for Containerised Compute Clouds', Frontiers in Big Data, vol. 4, 661501. https://doi.org/10.3389/fdata.2021.661501

Title: Scalable Declarative HEP Analysis Workflows for Containerised Compute Clouds
Author: Šimko, Tibor; Heinrich, Lukas Alexander; Lange, Clemens; Lintuluoto, Adelina Eleonora; MacDonell, Danika Marina; Mečionis, Audrius; Rodriguez, Diego Rodriguez; Shandilya, Parth
Contributor organization: Department of Physics
Date: 2021-05-07
Language: eng
Number of pages: 12
Belongs to series: Frontiers in Big Data
ISSN: 2624-909X
DOI: https://doi.org/10.3389/fdata.2021.661501
URI: http://hdl.handle.net/10138/332281
Abstract: We describe a novel approach for experimental High-Energy Physics (HEP) data analyses that is centred around the declarative rather than imperative paradigm when describing analysis computational tasks. The analysis process can be structured in the form of a Directed Acyclic Graph (DAG), where each graph vertex represents a unit of computation with its inputs and outputs, and the graph edges describe the interconnection of various computational steps. We have developed REANA, a platform for reproducible data analyses, which supports several such DAG workflow specifications. The REANA platform parses the analysis workflow and dispatches its computational steps to various supported computing backends (Kubernetes, HTCondor, Slurm). The focus on declarative rather than imperative programming enables researchers to concentrate on the problem domain at hand without having to think about implementation details such as scalable job orchestration. The declarative programming approach is further exemplified by a multi-level job cascading paradigm that was implemented in the Yadage workflow specification language. We present two recent LHC particle physics analyses, ATLAS searches for dark matter and CMS jet energy correction pipelines, where the declarative approach was successfully applied. We argue that the declarative approach to data analyses, combined with recent advancements in container technology, facilitates the portability of computational data analyses to various compute backends, enhancing the reproducibility and the knowledge preservation behind particle physics data analyses.
Subject: 114 Physical sciences
computational workflows
reproducibility
scalability
declarative programming
analysis preservation
Peer reviewed: Yes
Rights: cc_by
Usage restriction: openAccess
Self-archived version: publishedVersion
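
Illustrative example (not part of the record): the abstract above describes structuring an analysis as a DAG in which each step declares its inputs and outputs and a workflow engine derives the execution order. The following minimal Python sketch, using only the standard library, illustrates that declarative idea; the step names, commands and file names are hypothetical, and this is not REANA's or Yadage's actual specification format.

    from graphlib import TopologicalSorter  # standard library, Python 3.9+


    def run(command: str) -> None:
        # Stand-in for dispatching a containerised job to a compute backend
        # (REANA would target Kubernetes, HTCondor or Slurm); here we only print.
        print(f"running: {command}")


    # Declarative description: each step states what it consumes and produces,
    # not when or where it runs.
    steps = {
        "skim": {"inputs": [], "outputs": ["skimmed.root"], "command": "./skim.sh"},
        "fit": {"inputs": ["skimmed.root"], "outputs": ["fit.json"], "command": "./fit.sh"},
        "plot": {"inputs": ["fit.json"], "outputs": ["plots.pdf"], "command": "./plot.sh"},
    }

    # Build the DAG: a step depends on whichever step produces each of its inputs.
    producer = {out: name for name, step in steps.items() for out in step["outputs"]}
    dag = {name: {producer[inp] for inp in step["inputs"]} for name, step in steps.items()}

    # The engine, not the researcher, decides the execution order.
    for name in TopologicalSorter(dag).static_order():
        run(steps[name]["command"])

Running the sketch executes the three hypothetical steps in dependency order (skim, then fit, then plot). The point mirrors the paper's argument: because the researcher only declares steps and their data dependencies, a platform such as REANA can parse the specification and decide how to schedule and dispatch each containerised step to a supported backend.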


Files in this item

fdata_04_661501.pdf (2.449 MB, PDF)
