Enhancing VLAM Workflow Model with MapReduce Operations

Mikolaj Baranowski, Adam Belloum, Marian Bubak

Abstract

MapReduce frameworks have proved to be a good solution for storing and processing large amounts of data. Thanks to data parallelism, they allow computations to be moved very close to the storage, which reduces the influence of the “I/O bottleneck”. Workflow management systems, in turn, are widely used for modeling scientific applications. Users who want to use MapReduce frameworks in their workflows have to run a separate environment to develop Map/Reduce operations. In this paper we propose an approach that allows existing application models to be extended with MapReduce routines. Our solution is based on a DSL constructed on top of the Ruby programming language. It follows the examples of the Sawzall and Pig Latin languages and allows Map/Reduce operations to be defined in a minimalist way. Moreover, because the language is based on Ruby, the model allows the use of user-defined routines and existing Ruby libraries. The model of a particular workflow management system can be extended with our DSL, letting users work in a single environment when developing both the workflow and the MapReduce application.
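To give a feel for what defining Map/Reduce operations in a minimalist, Ruby-based way could look like, the following is a small illustrative sketch. It is not the DSL proposed in the paper: the class `MapReduceJob`, its `define`, `map`, `reduce` and `run` methods, and the word-count example are all hypothetical names chosen for this illustration, and the job runs sequentially in one process rather than on a MapReduce framework.

```ruby
# Hypothetical sketch only: these names are illustrative assumptions,
# not the DSL described in the paper.
class MapReduceJob
  # Build a job by evaluating the block in the context of a new instance,
  # so that bare `map { ... }` and `reduce { ... }` calls configure it.
  def self.define(&block)
    job = new
    job.instance_eval(&block)
    job
  end

  # Register the map step: the block receives one record and an emitter.
  def map(&block)
    @mapper = block
  end

  # Register the reduce step: the block receives a key and all its values.
  def reduce(&block)
    @reducer = block
  end

  # Run locally and sequentially; a real framework would distribute this.
  def run(records)
    groups = Hash.new { |hash, key| hash[key] = [] }
    emit = ->(key, value) { groups[key] << value }
    records.each { |record| @mapper.call(record, emit) }
    groups.map { |key, values| [key, @reducer.call(key, values)] }.to_h
  end
end

# Word count expressed with the sketch above.
word_count = MapReduceJob.define do
  map    { |line, emit| line.split.each { |word| emit.call(word.downcase, 1) } }
  reduce { |_word, counts| counts.sum }
end

result = word_count.run(["to be or not to be", "To map and to reduce"])
puts result.inspect
```

Because the operations are ordinary Ruby blocks, user-defined routines and existing Ruby libraries can be called from inside `map` and `reduce` without extra glue, which is the property the abstract attributes to basing the language on Ruby.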



Paper Citation


in Harvard Style

Baranowski M., Belloum A. and Bubak M. (2013). Enhancing VLAM Workflow Model with MapReduce Operations. In Proceedings of the 3rd International Conference on Simulation and Modeling Methodologies, Technologies and Applications - Volume 1: SIMULTECH, ISBN 978-989-8565-69-3, pages 179-185. DOI: 10.5220/0004488401790185


in Bibtex Style

@conference{simultech13,
author={Mikolaj Baranowski and Adam Belloum and Marian Bubak},
title={Enhancing VLAM Workflow Model with MapReduce Operations},
booktitle={Proceedings of the 3rd International Conference on Simulation and Modeling Methodologies, Technologies and Applications - Volume 1: SIMULTECH},
year={2013},
pages={179-185},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004488401790185},
isbn={978-989-8565-69-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Simulation and Modeling Methodologies, Technologies and Applications - Volume 1: SIMULTECH
TI - Enhancing VLAM Workflow Model with MapReduce Operations
SN - 978-989-8565-69-3
AU - Baranowski M.
AU - Belloum A.
AU - Bubak M.
PY - 2013
SP - 179
EP - 185
DO - 10.5220/0004488401790185