such as MapReduce processing. It originally
supported map and reduce processes (Dean and
Ghemawat, 2004). The first divides a large-scale
data set into smaller sub-problems and assigns
them to worker nodes, each of which processes its
sub-problems. The second collects the answers to
all the sub-problems and aggregates them into the
answer to the original problem.
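The two phases described above can be illustrated with a minimal word-count sketch. It is not taken from any of the systems cited here: the function names (`split_into_chunks`, `map_phase`, `reduce_phase`, `word_count`) are hypothetical, and the worker nodes are simulated sequentially in a single process.

```python
from collections import defaultdict


def split_into_chunks(records, n_workers):
    """Divide the input into sub-problems, one per (simulated) worker."""
    return [records[i::n_workers] for i in range(n_workers)]


def map_phase(chunk):
    """Work done by one worker: emit a (word, 1) pair for every word."""
    return [(word, 1) for line in chunk for word in line.split()]


def reduce_phase(mapped_outputs):
    """Collect the pairs from all workers and aggregate them per key."""
    totals = defaultdict(int)
    for pairs in mapped_outputs:
        for key, value in pairs:
            totals[key] += value
    return dict(totals)


def word_count(lines, n_workers=3):
    chunks = split_into_chunks(lines, n_workers)
    # Sequential stand-in for workers that would run in parallel.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(mapped)
```

For instance, `word_count(["a b a", "b c", "a"], n_workers=2)` counts the words of three lines split over two simulated workers.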
There have been many attempts, in academic and
commercial projects, to improve Hadoop, Yahoo!’s
implementation of MapReduce. However, there have
been few attempts to implement MapReduce itself
other than Hadoop. For example, the Phoenix system
(Talbot et al., 2011) and the MATE system (Jiang
et al., 2010) supported multi-core processors with
shared memory. Also, several researchers have
focused on executing MapReduce iteratively and
efficiently, e.g., Twister (Ekanayake et al., 2010),
HaLoop (Bu et al., 2010), and MRAP
(Sehrish et al., 2010). These implementations assume
that intermediate data are stored in temporary files
rather than in key-value stores at data nodes. They
also assume that data are stored in high-performance
servers for MapReduce processing, instead of at the
edges. These works may improve the performance of
iteratively processing the same data. However, our
framework does not aim at such iterative processing,
because most data at sensor nodes or embedded
computers are processed only once or a few times.
Consider, for example, the analysis of logs at
network equipment: only newly added log data are
collected and analyzed every hour or day, not the
data that were already analyzed.
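The log example can be sketched as follows. This is only an illustration of processing each record once, not the framework's own mechanism: the class `IncrementalLogAnalyzer`, its line-offset bookkeeping, and the assumed "LEVEL message" log format are all hypothetical.

```python
from collections import Counter


class IncrementalLogAnalyzer:
    """Analyzes only the log lines appended since the previous run."""

    def __init__(self):
        self.offset = 0          # number of lines already analyzed
        self.totals = Counter()  # running count per log level

    def run(self, log_lines):
        """Process the new lines and return how many were analyzed."""
        new_lines = log_lines[self.offset:]
        for line in new_lines:
            parts = line.split(maxsplit=1)
            if parts:  # skip empty lines
                # Assumed format: the first token is the log level.
                self.totals[parts[0]] += 1
        self.offset = len(log_lines)
        return len(new_lines)
```

Calling `run` periodically (e.g., every hour) on a growing log touches each line exactly once, so no iteration over previously analyzed data is needed.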
A few researchers have proposed their own
MapReduce processing frameworks for embedded or
mobile computers. The Misco system (Dou et al.,
2010) was a framework for executing MapReduce
processing on mobile phones via HTTP. Elespuru et
al. developed a system for executing MapReduce on
heterogeneous devices, e.g., smartphones, from
a mobile device client for the iPhone (Elespuru et
al., 2009). Like ours, these were aimed at executing
data processing at nodes such as smartphones and
embedded computers. The main difference is that our
framework intends to execute data processing at the
computers that hold the target data, in arbitrary
stores of those computers, rather than at smartphones
or embedded computers assigned as dedicated
worker nodes. In the literature on sensor networks, the
IoT, and Machine-to-Machine (M2M), several academic
or commercial projects have attempted to support
data at nodes in the IoT, i.e., sensor nodes and
embedded computers. For example, Cisco’s Fog
Computing (Bonomi et al., 2012) and EMC’s IoT
intend to integrate cloud computing over the Internet
with peripheral computers. However, most of them do
not support the aggregation of data generated and
processed at nodes. The author presented a
MapReduce-based framework for processing data in a
previous paper (Satoh, 2013), but that framework was
assumed to be executed on clusters of high-performance
servers rather than on embedded computers.
3 REQUIREMENTS
Before explaining our system, let us discuss its
requirements.
• MapReduce processing and its clones, e.g.,
Hadoop, are among the most popular data
processing frameworks. They should be available in
the IoT, which generates a large amount of data
from sensor nodes.
• Networks in the IoT tend to be wireless or
low-bandwidth wired networks, like those used in
industry. They have non-negligible communication
latency and are not robust against congestion.
Transmitting large amounts of data from nodes at the
edge to server nodes seriously degrades the
performance of data analysis and causes congestion
in networks.
• Modern computers in the IoT, like Raspberry Pi
computers, have 32-bit processors with small
amounts of memory.
• In the IoT, a lot of data are generated from
sensors. Nodes in the IoT keep their data locally in
their own storage, e.g., flash memory.
• Every node may be able to support management
and/or data processing tasks, but may not initially
have any code for its tasks.
• Unlike other existing MapReduce implementations,
including Hadoop, our framework should
not assume any special underlying systems.¹
• There is no centralized management system in
the IoT. Our framework should be available without
such a system.
Our framework assumes that data can be processed
without exchanging data between nodes. In fact, in
the IoT, the data that each node has are generated
from the node’s own sensors, so the data in different
nodes can be processed independently of one another.
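A minimal sketch of this assumption follows. It does not show the framework's actual interface: `process_locally` and `combine` are hypothetical names, and the nodes are simulated as in-memory lists of sensor readings. Each node reduces its own readings to a small summary without consulting any other node, and only the summaries are combined.

```python
def process_locally(readings):
    """Runs on one node: summarize that node's own sensor readings."""
    if not readings:
        return {"count": 0, "total": 0.0, "max": None}
    return {
        "count": len(readings),
        "total": float(sum(readings)),
        "max": max(readings),
    }


def combine(summaries):
    """Aggregate the per-node summaries; raw readings never leave a node."""
    count = sum(s["count"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    maxima = [s["max"] for s in summaries if s["max"] is not None]
    return {
        "count": count,
        "mean": total / count if count else None,
        "max": max(maxima) if maxima else None,
    }
```

Because `process_locally` depends only on one node's data, the per-node steps can run in any order or in parallel, and only a few numbers per node cross the network.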
¹ Hadoop has not been available on Windows, because
it needs a permission mechanism peculiar to Unix and
its family.