Creek: Leveraging Serverless for Online Machine Learning on Streaming Data

Nazmul Takbir, Tahmeed Tarek, Muhammad Adnan

2024

Abstract

Recently, researchers have seen promising results in using serverless computing for real-time machine learning inference tasks. Several researchers have also used serverless for machine learning training and compared it against virtual machine (VM) based training. However, most of these approaches assumed traditional offline machine learning and did not find serverless to be particularly useful for model training. In our work, we take a different approach: we explore online machine learning. The incremental nature of training online machine learning models allows better utilization of the elastic scaling and consumption-based pricing offered by serverless. Hence, we introduce Creek, a proof-of-concept system for training online machine learning models on streaming data using serverless. We explore architectural variants of Creek on AWS and compare them in terms of monetary cost and training latency. We also compare Creek against VM-based training and identify the factors influencing the choice between a serverless and a VM-based solution. We explore model parallelism and introduce usage-based dynamic memory allocation for serverless functions to reduce costs. Our results indicate that serverless training is cheaper than VM-based training when the streaming rate is sporadic and unpredictable. Furthermore, parallel training using serverless can significantly reduce training latency for models with low communication overhead.
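
The cost argument in the abstract can be illustrated with a rough back-of-the-envelope model. The sketch below is not taken from the paper: the `Pricing` values, the function names, and the assumption of one function invocation per arriving mini-batch are all hypothetical placeholders chosen only to show why consumption-based billing may favour sporadic streams while an always-on VM may win under sustained load.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    # All prices are hypothetical placeholders, not real AWS rates.
    price_per_gb_second: float = 0.0000167   # serverless compute price
    price_per_invocation: float = 0.0000002  # serverless per-request fee
    vm_price_per_hour: float = 0.10          # always-on VM price


def serverless_cost(batches, seconds_per_batch, memory_gb, p):
    """Cost when each arriving mini-batch triggers one function invocation."""
    compute = batches * seconds_per_batch * memory_gb * p.price_per_gb_second
    return compute + batches * p.price_per_invocation


def vm_cost(hours, p):
    """An always-on VM is billed for wall-clock time, busy or idle."""
    return hours * p.vm_price_per_hour


def cheaper_option(batches_per_hour, seconds_per_batch, memory_gb, p):
    """Compare one hour of serverless training against one hour of VM time."""
    s = serverless_cost(batches_per_hour, seconds_per_batch, memory_gb, p)
    v = vm_cost(1.0, p)
    return "serverless" if s < v else "vm"


if __name__ == "__main__":
    p = Pricing()
    # Sporadic stream: a handful of mini-batches in the hour.
    print(cheaper_option(10, 1.0, 2.0, p))
    # Sustained stream: one mini-batch every second, keeping a VM fully busy.
    print(cheaper_option(3600, 1.0, 2.0, p))
```

Under these assumed prices the sporadic stream is cheaper on serverless and the sustained stream is cheaper on the VM. The actual break-even point depends on real provider pricing, memory configuration, and per-batch training time, which this sketch does not attempt to model.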

Paper Citation


in Harvard Style

Takbir N., Tarek T. and Adnan M. (2024). Creek: Leveraging Serverless for Online Machine Learning on Streaming Data. In Proceedings of the 14th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER; ISBN 978-989-758-701-6, SciTePress, pages 38-49. DOI: 10.5220/0012619100003711


in Bibtex Style

@conference{closer24,
author={Nazmul Takbir and Tahmeed Tarek and Muhammad Adnan},
title={Creek: Leveraging Serverless for Online Machine Learning on Streaming Data},
booktitle={Proceedings of the 14th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER},
year={2024},
pages={38-49},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012619100003711},
isbn={978-989-758-701-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 14th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER
TI - Creek: Leveraging Serverless for Online Machine Learning on Streaming Data
SN - 978-989-758-701-6
AU - Takbir N.
AU - Tarek T.
AU - Adnan M.
PY - 2024
SP - 38
EP - 49
DO - 10.5220/0012619100003711
PB - SciTePress