Listing 2: Map operation of word counter application.
map do |c, v|
  res = []
  v.split.each do |i|
    res << [c.string(i), c.number(1)]
  end
  res
end
reduce(:sum)
convenient Domain Specific Languages. However, to
show that the elaborated solutions are easy to use and
efficient, more complex applications have to be
investigated in the future, and execution times have to
be measured and compared with a native Hadoop
implementation.
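To make Listing 2 concrete, the following is a minimal sketch of how its map and reduce definitions could be evaluated locally in plain Ruby, without Hadoop. The `Context` and `LocalJob` classes are hypothetical and exist only for this illustration; in particular, it is an assumption that `c.string` and `c.number` simply wrap values (in the described system they presumably produce types suitable for the execution engine).

```ruby
# Hypothetical local harness for Listing 2; NOT the paper's implementation.
# Assumption: Context#string and Context#number just convert plain Ruby values.
class Context
  def string(value)
    value.to_s
  end

  def number(value)
    Integer(value)
  end
end

class LocalJob
  def initialize(&definition)
    @context = Context.new
    instance_eval(&definition) # evaluate the DSL block with self = this job
  end

  # DSL keyword: store the user's map block.
  def map(&block)
    @mapper = block
  end

  # DSL keyword: store the name of the reducing operation (e.g. :sum).
  def reduce(operation)
    @reducer = operation
  end

  # Run map over every record, group pairs by key, then reduce each group.
  def run(records)
    pairs = records.flat_map { |record| @mapper.call(@context, record) }
    grouped = pairs.group_by(&:first)
    grouped.transform_values { |kvs| kvs.map(&:last).public_send(@reducer) }
  end
end

# The body below is Listing 2, unchanged.
job = LocalJob.new do
  map do |c, v|
    res = []
    v.split.each do |i|
      res << [c.string(i), c.number(1)]
    end
    res
  end
  reduce(:sum)
end

counts = job.run(["to be or not to be", "to do"])
# counts == { "to" => 3, "be" => 2, "or" => 1, "not" => 1, "do" => 1 }
```

A harness of this kind could also serve as a small-scale baseline when measuring execution times, although it says nothing about performance on a cluster.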
The proposed solution can be extended into a more
generic form: a pluggable application that can be
used to extend the models of other applications. To
provide such functionality, the target application
should implement a set of routines that coordinate
MapReduce tasks from the execution engine specific
to that application. The elaborated DSL is no more
complex than existing MapReduce DSLs such as Sawzall
and Pig Latin. It lets the user define a Map operation
conveniently, without giving up features such as
user-defined functions or Ruby libraries. The developed
DSL requires no libraries beyond the standard Ruby
distribution. Since an implementation of Ruby for the
Java Virtual Machine (JRuby) also exists, the created
application can be adapted with reasonable ease to many
existing solutions, as a separate module run either in
a Ruby process or in an existing JVM instance. The
proposed solution could also be merged with an existing
DSL for another domain.
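The idea of a host application supplying coordination routines can be sketched as follows. This is only an illustration under stated assumptions: the routine names (`run_map_tasks`, `shuffle`, `run_reduce_tasks`) and the classes `ExecutionEngine`, `InProcessEngine` and `MapReducePlugin` are hypothetical, since no such interface is defined in the text.

```ruby
# Hypothetical contract between a pluggable MapReduce module and its host.
# The routine names below are illustrative assumptions, not a defined API.
module ExecutionEngine
  REQUIRED_ROUTINES = %i[run_map_tasks shuffle run_reduce_tasks].freeze

  # Verify that the host engine implements every required routine.
  def self.check!(engine)
    missing = REQUIRED_ROUTINES.reject { |name| engine.respond_to?(name) }
    raise NotImplementedError, "engine lacks: #{missing.join(', ')}" unless missing.empty?

    engine
  end
end

# A trivial in-process engine standing in for a host application's scheduler.
class InProcessEngine
  def run_map_tasks(records, mapper)
    records.flat_map { |record| mapper.call(record) }
  end

  def shuffle(pairs)
    pairs.group_by(&:first).transform_values { |kvs| kvs.map(&:last) }
  end

  def run_reduce_tasks(groups, reducer)
    groups.transform_values { |values| reducer.call(values) }
  end
end

# The pluggable module talks to the host only through the routines above.
class MapReducePlugin
  def initialize(engine)
    @engine = ExecutionEngine.check!(engine)
  end

  def execute(records, mapper:, reducer:)
    pairs  = @engine.run_map_tasks(records, mapper)
    groups = @engine.shuffle(pairs)
    @engine.run_reduce_tasks(groups, reducer)
  end
end

# Usage: count words by their length.
plugin = MapReducePlugin.new(InProcessEngine.new)
lengths = plugin.execute(
  %w[ruby scala java pig],
  mapper:  ->(word) { [[word.length, 1]] },
  reducer: ->(values) { values.sum }
)
```

Keeping the plugin dependent only on a small set of routines is one way a host with its own scheduler could reuse the DSL; a real integration would replace `InProcessEngine` with calls into that scheduler.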
In future work, other programming languages can
be considered as alternatives to Ruby. Any language
with features that support metaprogramming, such as
macros or templates, or that can be modified at
runtime, is a candidate. Special attention should be
paid to statically typed languages based on the Java
Virtual Machine platform, such as Scala (Odersky et
al., 2010). These modern languages provide good
constructs for a metaprogramming approach and, at the
same time, can directly use the Java type system,
allowing better integration with Hadoop.
A metaprogramming approach can also be considered
for describing other features of Workflow Management
Systems. It can be used to enrich workflow models
with a configuration of resources or security
policies.
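As a rough illustration of how a workflow model might be enriched in this way, the sketch below uses Ruby's `define_method` and `instance_eval` to add resource and security sections to a model description. Every keyword here (`resources`, `security`, `nodes`, `memory_mb`, `owner`, `encrypt_output`) and every class name is a hypothetical choice made for this example and is not part of the described DSL.

```ruby
# Hypothetical DSL sections for resources and security policies.
# All keywords and class names are illustrative assumptions.
class Section
  # Generate one setter-style DSL keyword per name at class-definition time.
  def self.keywords(*names)
    names.each do |name|
      define_method(name) { |value| @settings[name] = value }
    end
  end

  attr_reader :settings

  def initialize(&block)
    @settings = {}
    instance_eval(&block) if block # run the section body against this object
  end
end

class Resources < Section
  keywords :nodes, :memory_mb
end

class Security < Section
  keywords :owner, :encrypt_output
end

class WorkflowModel
  attr_reader :name, :resource_settings, :security_settings

  def initialize(name, &block)
    @name = name
    @resource_settings = {}
    @security_settings = {}
    instance_eval(&block) if block
  end

  # DSL keyword: collect resource configuration from the nested block.
  def resources(&block)
    @resource_settings = Resources.new(&block).settings
  end

  # DSL keyword: collect security policy settings from the nested block.
  def security(&block)
    @security_settings = Security.new(&block).settings
  end
end

model = WorkflowModel.new("word_count") do
  resources do
    nodes 4
    memory_mb 2048
  end
  security do
    owner "alice"
    encrypt_output true
  end
end
```

Because each section rejects unknown keywords with a `NoMethodError`, misspelled settings fail early instead of being silently ignored.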
ACKNOWLEDGEMENTS
This work was partially supported by the Dutch
national program COMMIT and the KI IET AGH grant. We
would like to thank Reginald Cushing and Spiros
Koulouzis from the University of Amsterdam for
discussions and suggestions.
REFERENCES
Baranowski, M., Belloum, A., and Bubak, M. (2013a).
Defining and running mapreduce operations with ws-
vlam workflow management system. In ICCS.
Baranowski, M., Belloum, A., Bubak, M., and Malawski,
M. (2013b). Constructing workflows from script ap-
plications. to be published in Scientific Programming.
Belloum, A., Inda, M., Vasunin, D., Korkhov, V., Zhao, Z.,
Rauwerda, H., Breit, T., Bubak, M., and Hertzberger,
L. (2011). Collaborative e-science experiments and
scientific workflows. Internet Computing, IEEE,
15(4):39–47.
Chen, Q., Wang, L., and Shang, Z. (2008). Mrgis: A
mapreduce-enabled high performance workflow sys-
tem for gis. In eScience, 2008. eScience ’08. IEEE
Fourth International Conference on, pages 646–651.
Cushing, R., Koulouzis, S., Belloum, A., and Bubak, M.
(2011). Prediction-based auto-scaling of scientific
workflows. In Proceedings of the 9th International
Workshop on Middleware for Grids, Clouds and e-
Science, page 1. ACM.
Dean, J. and Ghemawat, S. (2008). Mapreduce: simpli-
fied data processing on large clusters. Commun. ACM,
51(1):107–113.
Ford, N. (2013). Functional thinking: Why functional pro-
gramming is on the rise. Technical report, IBM.
Goble, C. and Roure, D. D. (2009). The impact of workflow
tools on data-centric research. In Data Intensive Com-
puting: The Fourth Paradigm of Scientific Discovery.
Hey, A., Tansley, S., and Tolle, K. (2009). The fourth
paradigm: data-intensive scientific discovery. Mi-
crosoft Research Redmond, WA.
Ludäscher, B., Altintas, I., Berkley, C., Higgins, D., Jaeger,
E., Jones, M. B., Lee, E. A., Tao, J., and Zhao, Y.
(2006). Scientific workflow management and the ke-
pler system. Concurrency and Computation: Practice
and Experience, 18(10):1039–1065.
Odersky, M., Spoon, L., and Venners, B. (2010). Program-
ming in Scala, second edition. Artima Series. Artima,
Incorporated.
Olston, C., Reed, B., Srivastava, U., Kumar, R., and
Tomkins, A. (2008). Pig latin: a not-so-foreign lan-
guage for data processing. In Proceedings of the 2008
ACM SIGMOD international conference on Manage-
ment of data, SIGMOD ’08, pages 1099–1110, New
York, NY, USA. ACM.
Pike, R., Dorward, S., Griesemer, R., and Quinlan, S.
(2005). Interpreting the data: Parallel analysis with
SIMULTECH 2013 - 3rd International Conference on Simulation and Modeling Methodologies, Technologies and Applications