Mäkinen, Sasu
(Helsingin yliopisto, 2021)
Deploying machine learning models has been found to be a major problem in the field. DevOps and
Continuous Integration and Continuous Delivery (CI/CD) have proven to streamline and accelerate deployments in software development. Creating CI/CD pipelines for software
that includes elements of Machine Learning (MLOps) poses unique problems, and trailblazers in
the field solve them with proprietary tooling, often offered by cloud providers.
In this thesis, we describe the elements of MLOps. We study what the requirements are for automating
the CI/CD of Machine Learning systems in the MLOps methodology. We study whether it is feasible
to create a state-of-the-art MLOps pipeline with existing open-source and cloud-native tooling
in a cloud-provider-agnostic way.
We designed an extendable and cloud-native pipeline covering most of the CI/CD needs of
a Machine Learning system. We motivated why Machine Learning systems should be included
in the DevOps methodology. We studied what unique challenges machine learning brings to
CI/CD pipelines, production environments, and monitoring. We analyzed the pipeline’s design,
architecture, and implementation details, as well as its applicability and value to Machine Learning
projects.
We evaluate our solution as a promising MLOps pipeline that manages to solve many of the issues
of automating a reproducible Machine Learning project and its delivery to production. We
designed it as a fully open-source solution that is relatively cloud-provider agnostic. The pipeline
is configured to fit the client's needs with easy-to-use declarative configuration languages (YAML,
JSON) that require minimal learning overhead.