Reconfigurable and Adaptive Spark Applications

Mohamad Jaber, Mohamed Nassar, Wael Al Rahal Al Orabi, Bilal Abi Farraj, Mohamad Omar Kayali, Chadi Helwe

2017

Abstract

The contribution of this paper is twofold. First, we propose a domain-specific language (DSL) to easily reconfigure and compose Spark applications. For each Spark application, we define its input and output interfaces. Then, given a set of connections that map outputs of some Spark applications to free inputs of other Spark applications, we automatically augment the Spark applications with the synchronization and communication required to run them according to the user-defined mapping. Second, we present an adaptive quality management and selection method for Spark applications. The method takes as input a pipeline of parameterized Spark applications, where the execution time of each Spark application is an unknown increasing function of its quality-level parameters. The method builds a controller that automatically computes an adequate quality level for each Spark application so that the pipeline meets a user-defined deadline. Consequently, users can submit a pipeline of Spark applications together with a deadline, and our method automatically runs all the Spark applications at the maximum quality that still respects that deadline. We present experimental results showing the effectiveness of our method.
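
To make the deadline-driven quality selection more concrete, the following is a minimal sketch and not the controller described in the paper. It assumes things the abstract does not state: quality levels are a small sorted list of integers, each stage comes with a hypothetical estimate of its running time that increases with quality (the paper treats the real function as unknown), and a simple greedy rule reserves the lowest-quality cost of all later stages before choosing the current level. The names pick_level and run_with_deadline are illustrative only, and the stages are plain callables that return a simulated running time rather than real Spark jobs.

```python
from typing import Callable, List, Sequence, Tuple


def pick_level(levels: Sequence[int],
               estimate: Callable[[int], float],
               budget: float) -> int:
    """Return the highest level whose estimated time fits the budget.

    Assumes `levels` is sorted ascending and `estimate` is increasing.
    Falls back to the lowest level when nothing fits.
    """
    chosen = levels[0]
    for q in levels:
        if estimate(q) > budget:
            break
        chosen = q
    return chosen


def run_with_deadline(stages: Sequence[Callable[[int], float]],
                      estimates: Sequence[Callable[[int], float]],
                      levels: Sequence[int],
                      deadline: float) -> Tuple[List[int], float]:
    """Run stages in order, choosing each quality from the time still available.

    Each stage callable takes a quality level and returns the time it took
    (simulated in this sketch; a real stage would be a Spark job).
    """
    elapsed = 0.0
    chosen: List[int] = []
    for i, (stage, estimate) in enumerate(zip(stages, estimates)):
        # Hypothetical policy: keep enough time for every later stage
        # to run at least at the lowest quality level.
        reserve = sum(est(levels[0]) for est in estimates[i + 1:])
        budget = deadline - elapsed - reserve
        q = pick_level(levels, estimate, budget)
        elapsed += stage(q)  # observed time feeds the next decision
        chosen.append(q)
    return chosen, elapsed
```

Because each decision uses the time actually elapsed so far, a stage that runs longer than estimated automatically lowers the budget, and hence the quality, of the stages after it. This greedy rule favors early stages; a policy that balances quality across the pipeline would need a different allocation.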


Paper Citation


in Harvard Style

Jaber M., Nassar M., Al Rahal Al Orabi W., Abi Farraj B., Kayali M. and Helwe C. (2017). Reconfigurable and Adaptive Spark Applications. In Proceedings of the 7th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, ISBN 978-989-758-243-1, pages 112-119. DOI: 10.5220/0006289901120119


in BibTeX Style

@conference{closer17,
author={Mohamad Jaber and Mohamed Nassar and Wael Al Rahal Al Orabi and Bilal Abi Farraj and Mohamad Omar Kayali and Chadi Helwe},
title={Reconfigurable and Adaptive Spark Applications},
booktitle={Proceedings of the 7th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER},
year={2017},
pages={112-119},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006289901120119},
isbn={978-989-758-243-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER
TI - Reconfigurable and Adaptive Spark Applications
SN - 978-989-758-243-1
AU - Jaber M.
AU - Nassar M.
AU - Al Rahal Al Orabi W.
AU - Abi Farraj B.
AU - Kayali M.
AU - Helwe C.
PY - 2017
SP - 112
EP - 119
DO - 10.5220/0006289901120119