platform. It integrates algorithms from both
pre-processing and dynamic SDM approaches. On the
one hand, algorithms from the Weka DM tool (Hall et
al., 2009) have been used after a pre-processing step
using the GDPM API (Bogorny et al., 2006). On the
other hand, a naïve regionalization algorithm and a
simple spatial association rule extraction algorithm
that can be applied directly to spatial data have been
implemented. While existing SDM tools, especially
open source ones, lack visualization capabilities,
EasySDM offers the possibility to visualize spatial
data directly on an integrated geographical map
before and after applying DM algorithms.
Furthermore, visualization is also possible via any
external Geographic Information System (GIS). Due
to its simplicity and visualization capabilities, we
believe that EasySDM may be helpful, inter alia, in
explaining SDM to students in academia. It
has been released under the GPL licence in order to
allow researchers and programmers to access and
improve the source code. The platform setup, source
code and documentation are publicly available on
the internet¹.
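To make the pre-processing approach mentioned above more concrete, the following minimal Python sketch assumes that pre-processing amounts to materialising spatial relations as ordinary boolean attributes, so that a classical DM algorithm can then be run on the resulting flat table. It is an illustration only: it does not use the GDPM or Weka APIs, and the names `close_to` and `flatten`, the distance predicate and the sample coordinates are all hypothetical.

```python
import math


def close_to(p, q, threshold):
    """Hypothetical distance predicate: True if p and q lie within threshold."""
    return math.hypot(p[0] - q[0], p[1] - q[1]) <= threshold


def flatten(targets, features, threshold):
    """Turn spatial data into one flat, non-spatial row per target object.

    targets:  {target_id: (x, y)}
    features: {feature_type: [(x, y), ...]}
    Each row holds one boolean column 'close_to_<feature_type>'.
    """
    rows = []
    for tid, pos in targets.items():
        row = {"id": tid}
        for ftype, locations in features.items():
            # The spatial relation is evaluated once and stored as a plain attribute.
            row[f"close_to_{ftype}"] = any(
                close_to(pos, loc, threshold) for loc in locations
            )
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    districts = {"A": (0.0, 0.0), "B": (10.0, 0.0)}
    features = {"school": [(1.0, 1.0)], "river": [(9.0, 0.5), (0.0, 30.0)]}
    for row in flatten(districts, features, threshold=2.0):
        print(row)
```

The rows produced this way contain no geometry, which is why an unmodified classical DM algorithm can process them. A dynamic approach would instead evaluate the spatial relations inside the mining algorithm itself.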
The rest of the paper is organized as follows. First,
a comparative study on existing SDM tools is
presented in section 2. Then, EasySDM and its
components are detailed in section 3. After that, we
conduct some experiments using EasySDM in order
to illustrate its functionalities and present them in
section 4. Finally, section 5 concludes and gives our
main perspectives.
2 COMPARATIVE STUDY OF
SDM TOOLS
Many SDM tools have been proposed in the literature.
Han et al. (1997) proposed GeoMiner, the first
software for knowledge extraction from spatial
databases, developed in 1997. It is an extension of the
classical DM tool DBMiner (Jiawei Han, 1996),
developed by the same team in 1996. Similarly,
Ouattara (2010) developed GeoKnime, an extension
of the Knime software (www.knime.org) to spatial
data. Appice et al. (2007) proposed Ingens, an
integrated platform for SDM within a GIS
environment. Lazarevic et al. (2000) developed
SDAM, a software system for spatial data analysis
and modelling that includes two SDM tasks
(clustering and classification). May and Savinov
(2001) developed the SPIN system, a spatial
information system that implements many clustering,
classification and association rule mining algorithms.
Bogorny et al. (2006) developed GDPM, a spatial
pre-processing API that can be added to the Weka
software in order to handle spatial data. Finally, an
interesting application of clustering, named
CrimeStat, was proposed by Levine et al. (2004)
in order to detect hot spots of crime incidents.

¹ http://www.lirmm.fr/~abdaoui/EasySDM
In this section, we compare these tools according
to their general characteristics. Table 1 presents, for
each tool, the year of its latest release, whether the
software and the source code are publicly accessible,
whether documentation is available and, finally, the
type of visualization offered (if any).
Table 1: General characteristics of existing SDM tools.

Tool name   Year of last  Tool public    Sources public  Documentation  Integrated   External
            release       accessibility  accessibility                  map display  map display
GeoMiner    1999          No             No              No             Yes          No
GeoKnime    2010          No             No              No             No           No
Ingens      2007          No             No              No             Yes          No
SDAM        2000          No             No              No             No           No
SPIN        2003          Yes            No              Yes            Yes          No
GDPM        2007          Yes            Yes             Yes            No           No
CrimeStat   2010          Yes            No              Yes            Yes          Yes
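As a reading aid, the rows of Table 1 can be encoded as records and queried. The minimal Python sketch below transcribes the table values; the names `Tool`, `TABLE_1`, `names_where` and `open_source_with_map` are introduced here for illustration only and do not belong to any of the compared tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    last_release: int
    tool_public: bool
    sources_public: bool
    documented: bool
    integrated_map: bool
    external_map: bool


# Rows transcribed from Table 1 (Yes -> True, No -> False).
TABLE_1 = [
    Tool("GeoMiner", 1999, False, False, False, True, False),
    Tool("GeoKnime", 2010, False, False, False, False, False),
    Tool("Ingens", 2007, False, False, False, True, False),
    Tool("SDAM", 2000, False, False, False, False, False),
    Tool("SPIN", 2003, True, False, True, True, False),
    Tool("GDPM", 2007, True, True, True, False, False),
    Tool("CrimeStat", 2010, True, False, True, True, True),
]


def names_where(tools, predicate):
    """Names of the tools for which predicate(tool) holds, in table order."""
    return [t.name for t in tools if predicate(t)]


def open_source_with_map(tools):
    """Tools with public sources that offer an integrated or external map display."""
    return names_where(
        tools,
        lambda t: t.sources_public and (t.integrated_map or t.external_map),
    )


if __name__ == "__main__":
    print(names_where(TABLE_1, lambda t: t.sources_public))
    print(open_source_with_map(TABLE_1))
```

Under this encoding, GDPM is the only tool with publicly accessible sources, and no tool combines public sources with a map display, which is consistent with the lack of visualization noted for open source tools.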
Table 2 presents a comparison of these tools
according to their technical characteristics. For each
tool, it gives the architecture, the programming
language, whether it runs on all operating
systems, and the supported types of input data. Finally,
Table 3 presents a functional comparison, which
takes into consideration the SDM approach used, the
types of spatial relations considered, and the
implemented SDM tasks.
It is important to note that GeoMiner and Ingens
have been built on specific spatial query languages.
When they were released, these two tools were not
successful. Moreover, GeoKnime and SDAM are not
publicly accessible and do not seem to be widely
used. Since we could not test these four tools, their
characteristics have been extracted from the scientific
papers describing them.