In this paper we
• explain why there is a need for a generic mobile
data mining framework
• discuss why the existing Smart Archive serves as
a good basis for a mobile data mining framework
• show how the framework is modified so that it fits
into mobile systems and can reliably mine data
from real-time sensors
The paper is structured as follows: Section 2 reviews
previous work related to MSA and describes
the motivation for creating a mobile data mining
framework. Section 3 discusses why the Smart
Archive framework serves as a good basis for a mobile
data mining framework and details the internal design
of the framework, especially in relation to
the original framework. Section 4 then demonstrates
the framework with a small example program. The
conclusions are discussed in Section 5.
2 MOTIVATION AND RELATED WORK
The development of MSA originated from a specific
need of our research group to apply data mining meth-
ods to time series data recorded from human move-
ments. They are usually tracked with several differ-
ent wearable sensors, which may be, for example, ac-
celerometers, magnetometers, gyroscopes and other
similar time series data-producing devices. Usually
the sensors and the software used to collect the data
have been proprietary and tied closely together, meaning
that specific sensors can be used only with certain
software within a certain project.
Furthermore, the ways to store the data are nu-
merous: the data can be stored in the flash memory
of the sensor, in a data file or in a database. In ad-
dition, the storage site can reside in a local computer
or in a network. Only the first option allows data to
be recorded without environmental limitations. Otherwise
the sensors need a connection to the computer
that records the data, which means data recording is
restricted to locations within reach of that computer.
This has led to a situation where the data have to
be collected using different sensors, processed so that
they are uniform and transferred to a place where they
are available to the actual data mining software before
one can even think of starting the data mining process.
The process is, at best, slow. With several different
sensors and storage sites, it is also error-prone.
MSA speeds up the development of mobile data
mining applications where data stream-producing
sensors are used. The only two things the application
developer needs to do are code an interface between
the framework and the sensor(s) and define how the
data are processed in the filters of the application.
Everything else is handled by the framework. Even
though the framework has been implemented with the
data stream sensors in mind, the input to the appli-
cation can be any kind of device or storage site that
produces time series data.
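
The division of labour described above can be illustrated with a
minimal C++ sketch. It is not the actual MSA API: the names Sample,
SensorSource, Filter, ReplaySource, MovingAverage and Pipeline are
hypothetical and chosen only for illustration. The sketch assumes that
the developer supplies a sensor interface and one or more filters,
while a framework-side class pulls the readings and passes them
through the filters in order.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// One time-stamped reading. The field layout is an assumption.
struct Sample {
    double time;   // seconds since the start of recording
    double value;  // e.g. one accelerometer axis
};

// Hypothetical developer-written interface between framework and sensor.
class SensorSource {
public:
    virtual ~SensorSource() = default;
    // Writes the next reading into `out`; returns false when no data remain.
    virtual bool read(Sample& out) = 0;
};

// Hypothetical developer-written processing step.
class Filter {
public:
    virtual ~Filter() = default;
    virtual Sample process(const Sample& in) = 0;
};

// Stand-in for a real sensor: replays a fixed buffer of readings.
class ReplaySource : public SensorSource {
public:
    explicit ReplaySource(std::vector<Sample> data) : data_(std::move(data)) {}
    bool read(Sample& out) override {
        if (next_ >= data_.size()) return false;
        out = data_[next_++];
        return true;
    }
private:
    std::vector<Sample> data_;
    std::size_t next_ = 0;
};

// Example filter: mean of the most recent `window` values.
class MovingAverage : public Filter {
public:
    explicit MovingAverage(std::size_t window) : window_(window) {
        assert(window_ > 0);
    }
    Sample process(const Sample& in) override {
        history_.push_back(in.value);
        sum_ += in.value;
        if (history_.size() > window_) {  // drop the oldest value
            sum_ -= history_.front();
            history_.erase(history_.begin());
        }
        return Sample{in.time, sum_ / static_cast<double>(history_.size())};
    }
private:
    std::size_t window_;
    std::vector<double> history_;
    double sum_ = 0.0;
};

// The part a framework would own: read from the source and apply
// every filter, in insertion order, to each reading.
class Pipeline {
public:
    explicit Pipeline(std::unique_ptr<SensorSource> source)
        : source_(std::move(source)) {
        assert(source_ != nullptr);
    }
    void add_filter(std::unique_ptr<Filter> f) {
        filters_.push_back(std::move(f));
    }
    std::vector<Sample> run() {
        std::vector<Sample> results;
        Sample s{};
        while (source_->read(s)) {
            for (auto& f : filters_) s = f->process(s);
            results.push_back(s);
        }
        return results;
    }
private:
    std::unique_ptr<SensorSource> source_;
    std::vector<std::unique_ptr<Filter>> filters_;
};
```

In this sketch, supporting a new device would mean writing another
SensorSource subclass, and changing the processing would mean adding or
replacing Filter subclasses; the Pipeline class itself would stay
untouched.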
Although data mining in a mobile environment
is an emerging field of research, it appears that no
research has been done regarding mobile data min-
ing frameworks. However, there are some inter-
esting applications for mobile systems which intro-
duce different application areas for mobile data min-
ing. For example, MobiMine (Kargupta et al., 2002)
is client-server-based software for monitoring time-critical
financial data from a handheld PDA. Wang
et al. (2003) propose a distributed and mobile data
mining system in which algorithms are encapsulated
into SQL Server stored procedures. An experimental
mobile and distributed data stream mining system
that allows real-time vehicle health monitoring and
driver characterization is presented in (Kargupta et al.,
2004).
Of course, existing non-mobile component-based
data mining application frameworks such as the orig-
inal Smart Archive (Laurinen et al., 2005), D2K
(NCSA Automated Learning Group, 2003), Knime
(Berthold et al., 2006) and YALE (Mierswa et al.,
2006) can be modified to receive and mine real-time
sensor data, but they do not work very well in mobile
systems. First of all, they are all written in Java and
therefore need a Java Virtual Machine (JVM) to run.
Many embedded and mobile systems cannot run a JVM
well, or no JVM exists for them at all.
Secondly, some of the aforementioned
frameworks use a graphical user interface (GUI) to
program and visualize the relationships between the
data mining application components, pipes and filters.
The compatibility of the GUIs with different mobile
systems, which can have output screens in various
sizes, is questionable. Finally, if needed, MSA al-
lows modifications anywhere in the code, not just in
the component API of the framework.
3 DESIGN OF THE FRAMEWORK
The purpose of the MSA framework is to serve as a
core for different data stream mining applications in
mobile systems. To maximize the portability of the
framework, it has been written using standard C++
as much as possible. Since it is very portable, the