SELF-ORGANIZING MAPS
An Approach Applied to the Electronic Government
Everton Luiz de Almeida Gago Júnior, Gean Davis Breda, Eduardo Zanoni Marques and
Leonardo de Souza Mendes
School of Electrical Engineering, University of Campinas, Campinas, SP, Brazil
Keywords: Self-organizing Maps, Data Mining, e-Gov.
Abstract: Attracted by the conveniences and results offered by automated management systems, more and more municipalities
seek to eliminate physical documents by storing their information digitally. A direct consequence
is the generation of a large volume of data. This paper proposes a decision-support model
based on self-organizing maps. Applied to electronic government tools, this model can help identify
unknown data patterns that guide the decision-making process. The case study uses
data made available by the city of Campinas, São Paulo, Brazil.
1 INTRODUCTION
Information and Communication Technology (ICT)
is recognized for its potential for interactivity between
users and service providers. ICT has been
broadly used in the public sector to help
administrators manage resources and to make it
feasible to monitor the results of public policies
implemented in society
(Mourady and Elragal, 2011). The application of ICT
to public administration is usually known as electronic
government (e-Gov) (Klischewski, 2003).
The large amount of data produced by ICT
can be a problem for public
administrators who intend to make decisions using
information from management systems, since these
data may be incomplete or presented in an
incomprehensible way. A further aggravating factor is the
use of a different database by each agency in
the public sector. Most of the time, these databases are
built on distinct platforms, which hinders
information exchange. Hence, public organizations
demand software solutions that help
identify possible flaws and business
opportunities through smart analysis of their
operational data (Yan and Guo, 2010).
Many efforts have been made to
provide better management of public resources and
to enable efficient monitoring of the results of
public policies implemented in society.
Oliveira (2009), Braga (2010), and Mourady and Elragal
(2011) demonstrate that platforms to support public
planning and decision support tools may contribute
to tax, fiscal and economic development
through the exploratory analysis of their data.
Exploratory data analysis may also be used
by the public sector to self-evaluate its performance,
leading to better application of technological, human
and financial resources, optimizing processes and
speeding up the handling of administrative documents
and protocols, as seen in the work of Kum (2009),
which proposes a system for knowledge
discovery in the self-evaluation of results by the public
sector.
The studies mentioned above rely on controlled
vocabularies and thesauri, and operate in limited environments
that handle only a pre-established number of
variables. Using these tools raises the
operational cost, since knowledge engineers are needed
to create ontologies and analyze the patterns that will
be set up as prior conditions for the exploratory analysis
tools.
This kind of solution hampers the discovery
of new knowledge from the operational data of
public institutions.
1.1 Objectives
This paper proposes a Generic Model for
Representation of Samples and Extraction of
Knowledge (GMRSEK), which enables the
identification of unknown patterns by mining the
operational data of public institutions. To
identify unknown patterns in large volumes of data,
we use an unsupervised classification
technique called self-organizing maps.
Luiz de Almeida Gago Júnior E., Davis Breda G., Zanoni Marques E. and de Souza Mendes L.
SELF-ORGANIZING MAPS - An Approach Applied to the Electronic Government.
DOI: 10.5220/0003904104610470
In Proceedings of the 8th International Conference on Web Information Systems and Technologies (WEBIST-2012), pages 461-470
ISBN: 978-989-8565-08-2
Copyright © 2012 SCITEPRESS (Science and Technology Publications, Lda.)
Self-organizing maps are competitive-learning neural
networks. In this kind of network the
processing units, called neurons, compete with one
another for the right to represent an input datum.
The neuron at the shortest distance from
the input datum wins the competition. The
winner neuron and its neighbors are adapted towards
the input datum; the neighboring neurons, however,
are adapted with less intensity (Braga, 2010).
With this data mining technique, we
expect to obtain clusters of data that carry similar
information, making it possible to
find classes of records and patterns present in
the data. The patterns and clusters found by the
exploratory analysis are treated as
knowledge. This knowledge
is stored in the GMRSEK,
which provides an organized structure that enables the
generation of reports and the use of this information
by electronic government systems.
2 ELECTRONIC GOVERNMENT
Many countries throughout the world are promoting
reforms in public institutions in response to citizens' growing
expectations of their
governments. The success of public management is
measured by the benefits it brings to
society. Private organizations, communities and
citizens demand efficiency and accountability in
the management of public resources, as well as the
delivery of better services and results.
In this new scenario, countries seek to
revitalize their public administrations by innovating
their structures and procedures and by qualifying their
human resources. In this context, the use of
Information and Communication Technologies
(ICT) plays a fundamental role in managing and
creating an environment conducive to social and
economic growth, contributing to the achievement of
these goals (Mourady and Elragal, 2011).
2.1 Structure of e-Gov
To establish and regulate the standards for integration
and exchange of services between government,
companies and citizens, it is important to define the
e-Gov structure. This structure makes it easier to
understand the implementation process of
electronic government and its implications
(Ebrahim and Irani, 2005).
The generic structure of e-Gov proposed by
Ebrahim and Irani (2005) is divided into four layers,
as shown in figure 1:
Figure 1: Structure of e-Gov (Ebrahim and Irani, 2005).
The access layer provides the means for
distributing the services, products and information
provided by e-Gov. These means consist of on-line
access channels, such as portals that can be accessed
via computers or mobile devices (Ebrahim and Irani,
2005).
The e-Gov layer can be seen as a repository
where all the services offered by the government are
located. The purpose of this layer is to establish a
single entry point for users, enabling the search for and
use of services.
The e-Business layer is where the IT data and
services of different agencies and public
departments can be integrated. In this layer, the
data and services common to different
agencies and public institutions should be shared
through a distributed interface, allowing the various
public departments to access information from a
single point (Yan and Guo, 2010).
Finally, the infrastructure layer
concentrates the hardware solutions that make it possible to
provide information and services via on-line access
channels.
Elements of the infrastructure layer include
application servers, routers and other pieces of
equipment that enable the distribution of
services via the Internet, intranets and extranets
(Ebrahim and Irani, 2005).
2.2 Classification of e-Gov
e-Gov systems have broad applications and serve
users with distinct needs and profiles. In order to
group the services offered to each class of users,
it is necessary to classify electronic
WEBIST2012-8thInternationalConferenceonWebInformationSystemsandTechnologies
462
government systems. The e-Gov systems are
classified as follows: Citizen to Government, Business to
Government, Government to Government, Internal Efficiency
and Effectiveness, and Global
Infrastructure (Yan and Guo, 2010).
The Citizen to Government system class
concentrates the services offered to citizens. In
general, these services are communication channels
through which citizens ask public institutions
to carry out a given task, for instance,
cleaning and mowing a park, or simply issuing a
copy of a document, such as the IPTU (tax on
urban property) or a debt clearance
certificate (Yan and Guo, 2010).
The Business to Government system class holds
part of the services offered to companies, such
as printing forms and copying taxation documents,
thus facilitating communication between
government and business. Among the services
offered to entrepreneurs in this class of
electronic government, electronic
bids are common: competitive proposals are submitted for
the right to take over a public enterprise or
a service to be outsourced (Marques, 2010).
The Government to Government class deals
with the services that must be shared between the
various agencies and departments of the government
itself. In general, governmental agencies and
departments do not adopt a single solution for
software and data storage; on the contrary, information
is kept in separate and distinct
environments. In this scenario, sharing
information and services is a challenge for public
institutions, which demand software and hardware
solutions capable of addressing this problem
(Yan and Guo, 2010).
The class of systems called Internal Efficiency
and Effectiveness deals with applications that aim to
improve the quality and efficiency of internal processes
in governmental agencies and departments. An
example of these applications is the
work of Kum (2009), which proposes a
knowledge discovery system for the self-evaluation of the
results achieved by public departments and
agencies, making it possible to monitor the results of
public policies implemented in society.
The global infrastructure class comprises
matters related to the interoperability of e-Gov
applications, providing quality and assurance of
services. The solutions employed in this class of
systems combine hardware and software
resources. An example of global infrastructure is
the work of Mendes (2009), which
establishes communication networks that enable the
integration of governmental agencies and
departments through the distribution of services via
on-line service channels.
3 BUSINESS INTELLIGENCE
Business Intelligence (BI) comprises a set of
techniques for identifying behavior trends
from a set of events. These trends can support the
process of decision making in business. With the
fast evolution of computing and the improvement
of data storage mechanisms, organizations began to
store all the information arising from their
daily activities, such as the sending of protocols and
documents and the recording of activities performed by
clients, like ordering and purchasing
(Mourady and Elragal, 2011).
Organizations began to see these data as a
source of information that could guide their
evolution and development by exploiting the
information concealed in the large volume of data
stored over long periods of gathering. The
growing competition between organizations and the
demand for better services by clients prompted the
development of more efficient techniques for
analyzing large volumes of data in an
intelligent way. BI emerged as a popular
expression covering these needs, and BI tools are classified as
decision support systems (Kum, 2009).
The large amount of data and the complexity of
its relations make it difficult to understand and
extract useful information for decision
making. Thus, these data need to be stored
in simplified environments where the degree of
relationship among the data is lower, leading to
better performance in queries and in cross-checking
information. The concept of the Data Warehouse (DW)
meets these requirements: a DW is a
multidimensional database with a lower level of
normalization than transactional databases.
In a data warehouse, queries run more quickly
and the data are not subject to constant
modification (Kum, 2009).
Figure 2 displays the BI environment and
technologies involved in it:
SELF-ORGANIZINGMAPS-AnApproachAppliedtotheElectronicGovernment
463
Figure 2: BI environment.
As shown in figure 2, the DW is fed by data
coming from transactional databases. Data are
inserted into the DW by Extraction,
Transformation and Load (ETL) tools. The data in the
DW are in a format suitable for exploration with
decision support tools: free of duplicates and
integrated with respect to terminologies and formats
(Mourady and Elragal, 2011).
3.1 Data Mining
Data mining is the analysis of large volumes of data
in order to recognize new patterns and trends
in the information of an organization.
Generally, these data are stored in transactional
databases or in a DW, and data mining uses
pattern recognition techniques that search for
similarities among the data under analysis.
These patterns are characterized by recurrent
events, for instance, several people contracting the same
disease at a given time of the year. If this event
occurs again in the following years, it can be
considered a pattern. Data mining can identify this
kind of behavior by filtering out facts that do
not recur (Kum, 2009).
Data mining enables knowledge discovery, i.e., it
uncovers unknown information in the data. When
there is no previous knowledge about the data to be
mined, unsupervised exploratory analysis techniques
are generally used. Self-organizing
maps are an example of these techniques: they do not require
any previous knowledge about the data, i.e., they
operate on large amounts of non-classified data of
unknown types, classes and groups (Kum, 2009).
Self-organizing maps are competitive neural
networks organized into two layers: the
input layer and the output layer. Each neuron of the
input layer is connected to all the neurons of the
output layer through weight vectors (Haykin,
1999). Training these networks presupposes
a data set whose samples are presented
randomly and repeatedly; every
neuron has a weight vector with one component for each
input. All neurons compete
for the right to represent
the datum presented to the network. The neuron whose
weight vector is closest to the input datum wins the
competition and is called the Best Matching
Unit (BMU) (Haykin, 1999). The BMU
adjusts its weight vector to get even closer
to the presented datum, increasing its likelihood of
winning again when the
same datum reappears. So that groups can be identified, the
neurons neighboring the winner also
have their weights moved towards the same input, although with less
intensity (Haykin, 1999).
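The competitive step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the map is stored as a dictionary keyed by grid position, the distance is Euclidean, and the neighborhood influence is Gaussian; the names distance-free helpers best_matching_unit and adapt, as well as the sample values, are hypothetical and do not come from the implementation used in this work.

```python
import math

def best_matching_unit(weights, sample):
    """Return the grid position of the neuron whose weight vector is closest to the sample."""
    return min(weights, key=lambda pos: math.dist(weights[pos], sample))

def adapt(weights, sample, learning_rate, radius):
    """Move the BMU and its grid neighbors towards the sample.

    The Gaussian influence equals 1 for the BMU and decreases with grid
    distance, so neighbors are adapted with less intensity.
    """
    bmu = best_matching_unit(weights, sample)
    for pos, w in weights.items():
        grid_dist_sq = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
        influence = math.exp(-grid_dist_sq / (2.0 * radius ** 2))
        weights[pos] = [wi + learning_rate * influence * (xi - wi)
                        for wi, xi in zip(w, sample)]
    return bmu

# Hypothetical 1 x 3 map with two-dimensional weight vectors.
weights = {(0, 0): [0.0, 0.0], (0, 1): [0.5, 0.5], (0, 2): [1.0, 1.0]}
bmu = adapt(weights, sample=[0.9, 0.9], learning_rate=0.5, radius=1.0)
```

In this example the neuron at position (0, 2) wins; the other two neurons are also pulled towards the sample, but their updates are scaled by a smaller influence factor.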
4 PROPOSED MODEL
This section presents the proposed model for
applying self-organizing maps to
identify patterns in the databases of public institutions.
The Generic Model for Representation of Samples
and Extraction of Knowledge (GMRSEK) provides
mechanisms for storing data that enable
automated exploratory analysis of these data through
self-organizing maps. A generic
environment for exploratory data analysis is needed
because of the different software solutions adopted by
Brazilian municipalities. The GMRSEK is capable
of storing an undetermined number of samples made
up of dimensions and values that, after being
submitted to the self-organizing map, result in a set
of new pieces of information which, hereafter, we
call knowledge.
The expected results from data mining are
concentrations of data, whose meaning can be
represented by the GMRSEK through a hierarchical
structure. Figure 3 presents the Extraction Process of
Knowledge:
WEBIST2012-8thInternationalConferenceonWebInformationSystemsandTechnologies
464
Figure 3: Extraction Process of knowledge.
As shown in figure 3, data originating from
transactional databases go through a
transformation process and are stored in a Multidimensional
Conceptual Model (MCM). In the MCM, the data
must be in a format appropriate for
exploratory analysis: free of duplicates and
integrated with respect to the different terminologies existing
in the transactional databases. Although data stored
in the MCM are in a format suitable for exploratory
analysis, public organizations generally have
distinct environments, which makes the data mining
process difficult because of the need for integration and
for specific data mining routines. In
this scenario, the GMRSEK is capable of storing an
undetermined number of samples coming from the
MCM and then submitting them to a self-organizing
map.
4.1 Multidimensional Model
The MCM used in the knowledge extraction process
shown in figure 3 is the model
proposed by Marques (2010), a
Multidimensional Conceptual Model for
business intelligence applications in electronic
government. In his work, Marques describes the
application of the MCM to the analysis of operational
data in the Social Assistance area. Marques' MCM
integrates different tools and
open-source technologies, ranging from data
gathering and transformation to the provision of tools
with which end users analyze and handle the
information stored in the MCM through an
intuitive model of use. Marques uses a structure
divided into three layers: the ETL layer, the Storage and
Availability of Data Views layer, and the End Users
Applications layer.
The ETL layer is responsible for the
extraction, transformation and loading of data from the
operational data repositories into the database of the
MCM. In this structure the ETL process is divided
into two sub-layers: ETL Engine and Middleware.
To implement the ETL Engine sub-layer,
Talend Open Studio was adopted, a tool
specialized in the integration and migration of data. In
choosing this tool, Marques took into account the
available documentation and the ease of
exporting routines as .jar files
(Marques, 2010). In the Middleware sub-layer,
a JDBC driver was adopted. The changes
applied to the data include the removal of duplicated
records and the integration of terminologies and values,
such as monetary values, dates and profile data like
gender, types of disabilities, race, color, and so on.
Besides these changes, data originating from
transactional databases undergo a structural
adjustment that accommodates the information
in a subject-oriented structure whose main focus is
the social care provided to citizens.
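The kind of transformation described above can be illustrated with a short Python sketch. It is not the Talend routine used by Marques (2010); the field names (citizen_id, gender, date), the terminology table and the accepted date formats are assumptions made only for this example.

```python
from datetime import datetime

# Hypothetical terminology table: different source spellings mapped to one term.
GENDER_TERMS = {"f": "Female", "female": "Female", "feminino": "Female",
                "m": "Male", "male": "Male", "masculino": "Male"}
# Hypothetical date formats found in the source databases.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")

def normalize_date(text):
    """Convert a date written in any accepted format to ISO format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")

def transform(rows):
    """Integrate terminologies and formats, then drop duplicated records."""
    seen, cleaned = set(), []
    for row in rows:
        record = (row["citizen_id"],
                  GENDER_TERMS[row["gender"].strip().lower()],
                  normalize_date(row["date"]))
        if record not in seen:  # keep only the first occurrence
            seen.add(record)
            cleaned.append(record)
    return cleaned

rows = [
    {"citizen_id": 7, "gender": "F", "date": "2010-03-05"},
    {"citizen_id": 7, "gender": "feminino", "date": "05/03/2010"},
    {"citizen_id": 8, "gender": "M", "date": "06/03/2010"},
]
cleaned = transform(rows)
```

The first two rows describe the same event with different terminology and date formats, so only one of them survives the transformation.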
The Storage and Availability of Data layer is
responsible for controlling the data stored in the
MCM as a result of the ETL process performed by
the previously described layer. This layer is divided
into two sub-layers: Physical data in the BI databases,
whose function is to provide mechanisms for storing
data, and Logical Layer of BI data, responsible for
generating representations of the data for the upper layers.
For storing data in the sub-layer Physical data in the BI
databases, Marques uses the MySQL database management
system, for its support of various types
of indexes and its speed in data loading. For
the sub-layer Logical Layer of BI data, Marques
SELF-ORGANIZINGMAPS-AnApproachAppliedtotheElectronicGovernment
465
adopted the Mondrian OLAP server, which allows
the execution of multidimensional queries on a
relational database. Along with the server, the
Mondrian Schema Workbench tool is distributed to
help with the multidimensional mapping of relational
data, facilitating the creation of the mapping files
in XML format (Marques, 2010).
The End Users Applications layer aims
to provide solutions that allow users to
intuitively analyze the data available in the BI environment
through pre-defined views. The OpenI tool was
selected for this purpose. OpenI
allows users to examine BI data through a web
application in which the results are presented in the form
of multidimensional tables and graphs (Marques,
2010).
4.2 Generic Model for Representation
of Samples and Extraction of
Knowledge
The Generic Model for Representation of Samples
and Extraction of Knowledge (GMRSEK) offers a
centralized environment capable of storing a large
volume of data, made up of an undetermined number
of dimensions and values. The GMRSEK consists of
a set of entities that store the data
used by the data mining process and that summarize
the knowledge obtained through this process in a
hierarchical structure, thus providing a single and
agile access point for searches. Figure 4 presents the
GMRSEK:
Figure 4: Generic Model for Representation of Samples
and Extraction of Knowledge.
Figure 4 shows six
entities: Dimension, Record, Datum, Group, Tuple
and Knowledge. The entity Dimension stores the
columns of a sample, each uniquely identified,
while the entity Datum stores the different values that
each dimension can take. Along with the
descriptive value, Datum stores a numeric constant that is
used by the self-organizing map during data
mining. This numeric constant is used to
calculate the similarity between topological regions
of the self-organizing map and the input data.
The entity Record relates a dimension to a value of the entity
Datum, where the dimension and the datum belong
to the same input. Each input has an entry in the
entity Tuple, which is also related to the entity Record.
The entities Dimension, Datum, Tuple and
Record are the Entities of Sample Representation (ESR),
which store all the data to be submitted to
the data mining process through the self-organizing
map.
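One possible reading of the Entities of Sample Representation is sketched below as Python data classes. The class and attribute names (SampleTuple, code, as_vector) and the numeric constants are hypothetical; the sketch only illustrates how a tuple of records could be turned into the numeric vector presented to the self-organizing map, and it leaves out the entities Group and Knowledge.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Dimension:
    name: str                # a column of the sample, uniquely identified

@dataclass(frozen=True)
class Datum:
    dimension: Dimension
    description: str         # descriptive value, e.g. "Female"
    code: float              # numeric constant used by the self-organizing map

@dataclass(frozen=True)
class Record:
    dimension: Dimension     # dimension and datum belong to the same input
    datum: Datum

@dataclass
class SampleTuple:
    records: list = field(default_factory=list)

    def as_vector(self, dimensions):
        """Return the numeric constants of this input, one per dimension, in the given order."""
        codes = {record.dimension: record.datum.code for record in self.records}
        return [codes[dimension] for dimension in dimensions]

gender = Dimension("gender")
disability = Dimension("disability")
female = Datum(gender, "Female", 1.0)          # hypothetical constants
not_disabled = Datum(disability, "No", 0.0)
sample = SampleTuple([Record(gender, female), Record(disability, not_disabled)])
vector = sample.as_vector([gender, disability])
```

Keeping the descriptive value next to the numeric constant lets the map work on numbers while reports can still show the original labels.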
The ESRs are populated with the data stored in the
MCM proposed by Marques (2010). This
Extraction, Transformation and Load (ETL) process
is carried out by a specific conversion
routine that reads the data of the multidimensional model
and transforms them so that they can be recorded in the entities of
sample representation of the GMRSEK.
The MCM proposed by
Marques (2010) was chosen because its data are already
in a format suitable for the mining
process, possibly with a reduced number of
variables that summarize and integrate the data to
be submitted to the extraction of knowledge. The
dimensions of the MCM proposed by Marques (2010)
used by the loading routine are: gender, race, social
program, location, education and disability.
The data are mined with
a self-organizing map because of its capability for
unsupervised classification and for identifying groups
based on the similarities of the data. This
technique was chosen because the data to be mined are not known
in advance, so it is not possible to specify
a training set that comprises all the possible
classes of objects existing in the data. The data
stored in the ESRs are submitted to the
self-organizing map, triggering the unsupervised
exploratory analysis process.
The parameterization of the neural network and
the choice of the self-organizing map topology involve
parameters such as the initial radius of the neighborhood
function, the number of events of the learning process,
the initial value of the adaptation pace (learning rate)
WEBIST2012-8thInternationalConferenceonWebInformationSystemsandTechnologies
466
and the number of neurons in the self-organizing
map. The choice of these parameters is
an empirical process whose goal is to reach a point of
convergence with the least possible number of
neurons, thus minimizing the processing time. The
convergence point is reached when the configuration
of the self-organizing map does not undergo
significant changes from one event to the next. This
occurs because the vectors of synaptic weights have
reached the minima of the function to be
represented (Haykin, 1999).
The choice of the number of neurons is also an
empirical process: too few neurons may fail to
represent all the groups existing in the data. On the
other hand, an excessive number of neurons can be
computationally costly. So, the appropriate number
of neurons is the one that represents all existing
groups in the data with the lowest number of units in
the self-organizing map. The results
of the self-organizing map can be interpreted
graphically, through the Unified
Distance Matrix (U-Matrix), and analytically,
by assessing the relations between the
records stored in the entity Knowledge of the
GMRSEK.
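As an illustration of the graphical interpretation, the sketch below computes a simplified U-Matrix in Python: each cell holds the mean Euclidean distance between a neuron's weight vector and those of its four grid neighbors, so that high values suggest borders between groups. The dictionary layout of the weights and the function name u_matrix are assumptions made for this example only.

```python
import math

def u_matrix(weights, rows, cols):
    """Mean distance from each neuron's weights to those of its 4-connected grid neighbors."""
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbors = [(r + dr, c + dc)
                         for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                         if 0 <= r + dr < rows and 0 <= c + dc < cols]
            dists = [math.dist(weights[(r, c)], weights[n]) for n in neighbors]
            result[r][c] = sum(dists) / len(dists) if dists else 0.0
    return result

# Hypothetical trained 1 x 3 map: the first two neurons are close, the third is far.
weights = {(0, 0): [0.0], (0, 1): [0.1], (0, 2): [1.0]}
u = u_matrix(weights, rows=1, cols=3)
```

In this small example the value for the middle neuron is pulled up by its distant right-hand neighbor, hinting at a border between two groups.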
The training routine for the self-organizing
map follows.
,
= weight.start();
r = network.getSize();
α = null;
δ = 0.9;
k = 0;
som (
{
,
…
}
,
{
,
…
}
,
{
,
…
}){
while((α=null||α<>0)&(k < 10000){
,
(
)
=
|
−
|;
α
(
)
=exp|
−
,
|
2.
()
;
w
(
+1
)
=
(
)
+
(
)
.
(
)
.
(
)
−
();
neighbors.reduce();
knowledge.reduce();
α = som(t) – som(t + 1);
k = k + 1;
}
}
As can be seen in the routine above,
the synaptic weights between the
input layer and the network neurons are initialized at
random. The neighborhood radius of the neurons,
represented by the variable r, is initially as large as
the network but is reduced at every learning
iteration. The variable α stands for the difference between
the states of the map at two consecutive iterations; when this
difference equals zero, we say that
convergence has been reached. The neural
network stops either upon convergence or
when the counter k reaches the limit imposed on the number of iterations,
in case there is no convergence.
The learning rate must start at a
fixed value; in the example, the learning rate δ
starts at 0.9, but it must be gradually reduced as
network learning progresses. The distance computed
in the first step of each iteration
identifies the winner neuron, that is, the one closest
to the input presented to the network. Next,
the neighborhood function is calculated,
which determines the degree of adaptation of the winner neuron
and its neighbors. Once the self-organizing map has finished the data mining
and the groups have been identified in the entity Group, the hidden information
contained in the data is available in the GMRSEK.
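The training routine can be sketched as runnable Python under a few assumptions that the routine itself does not fix: both the radius and the learning rate decay exponentially, the radius is not allowed to fall below 0.5, and convergence is declared when the total change of the map in one pass drops below a small tolerance rather than reaching exactly zero. The function name train_som and its parameters are hypothetical.

```python
import math
import random

def train_som(samples, rows, cols, max_iterations=10000, tolerance=1e-6, seed=0):
    """Train a rows x cols map; return (weights, number of iterations run)."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Synaptic weights are initialized at random, one vector per neuron.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    initial_radius = float(max(rows, cols))  # as large as the network
    initial_rate = 0.9                       # initial learning rate
    change, k = None, 0
    while (change is None or change > tolerance) and k < max_iterations:
        # Assumed schedule: exponential decay of radius and learning rate.
        decay = math.exp(-5.0 * k / max_iterations)
        radius = max(initial_radius * decay, 0.5)
        rate = initial_rate * decay
        change = 0.0
        for x in rng.sample(samples, len(samples)):  # random presentation order
            bmu = min(weights, key=lambda p: math.dist(weights[p], x))
            for pos, w in weights.items():
                grid_sq = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-grid_sq / (2.0 * radius ** 2))
                updated = [wi + rate * h * (xi - wi) for wi, xi in zip(w, x)]
                change += sum(abs(a - b) for a, b in zip(updated, w))
                weights[pos] = updated
        k += 1
    return weights, k

# Hypothetical samples forming two well-separated groups.
samples = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
weights, iterations = train_som(samples, rows=2, cols=2, max_iterations=200)
```

The iteration limit plays the role of the counter k in the routine: it guarantees termination even when the map keeps changing slightly from one pass to the next.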
Although the knowledge is stored, reaching this
information may be a computationally costly
task, since the set of data stored
in the GMRSEK may be large. There is, then, the
need for a structure that condenses the information,
making knowledge available through an agile and single
access channel. This channel allows other
electronic government applications to use
the knowledge obtained through the data mining
process for decision making, thereby enhancing the
quality of the reports and information provided to users.
As can be seen in figure 4, the entity
Knowledge has a self-relationship, forming a
hierarchical structure that enables
interdependent relationships among the entities Datum,
Dimension and Tuple. Hierarchical structures are
known for their representative capability and access
agility; however, the performance of access
to these structures is closely related to how well the data
they represent are balanced. The data must be
distributed so that the information tree does not
grow indiscriminately on just one side. If this occurs,
the access performance will be that of a list and not that of
a tree.
Next, we present the Routine for
Balancing and Load, which summarizes the
knowledge obtained by data mining in the
hierarchical structure formed by the entity
Knowledge of the GMRSEK:
G = groups.getAll();
dimensions.order();
for each g of G do {
    T = g.getTuples();
    for each t of T do {
        R = t.getRecords();
        integer j = 0;
        for each r of R do {
            if (j < 1) then {
                knowledge.save(r);
            }
            else {
                knowledge.save(r, R[j - 1]);
            }
            j = j + 1;
        }
    }
}
SELF-ORGANIZINGMAPS-AnApproachAppliedtotheElectronicGovernment
467
As can be seen in the routine above, before
the entity Knowledge can be loaded, the entities
Dimension, Record, Datum, Group and Tuple
must already be populated. The loading process
starts by getting all the groups found by
the data mining and by organizing the set of
samples. The dimensions of the set of samples must
be ordered according to the number of
distinct values they take, in such a way that the
dimension with the lowest number of variations
comes first. We also notice that, in the
first iteration over the records of a tuple, the entity Knowledge
refers only to the record r. In the other iterations, the
entity Knowledge refers to both the record r and the
record saved in the previous
iteration.
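The balancing and loading routine can be sketched in Python as follows. The sketch assumes that each tuple is a dictionary with a value for every dimension and represents the entity Knowledge as nested dictionaries; the function names order_dimensions and load_knowledge, the group labels and the sample values are all hypothetical.

```python
def order_dimensions(tuples):
    """Sort dimension names by their number of distinct values, fewest first."""
    distinct = {}
    for t in tuples:
        for dimension, value in t.items():
            distinct.setdefault(dimension, set()).add(value)
    # Ties are broken alphabetically so that the order is deterministic.
    return sorted(distinct, key=lambda d: (len(distinct[d]), d))

def load_knowledge(groups):
    """Build the hierarchy group -> first record -> next record -> ..."""
    all_tuples = [t for tuples in groups.values() for t in tuples]
    order = order_dimensions(all_tuples)
    knowledge = {}
    for group, tuples in groups.items():
        for t in tuples:
            node = knowledge.setdefault(group, {})
            for dimension in order:
                record = (dimension, t[dimension])
                # The first record hangs from the group; every later record
                # hangs from the record saved in the previous iteration.
                node = node.setdefault(record, {})
    return order, knowledge

groups = {
    "g1": [{"gender": "F", "education": "Low", "disability": "no"},
           {"gender": "F", "education": "High", "disability": "no"}],
    "g2": [{"gender": "M", "education": "None", "disability": "no"}],
}
order, knowledge = load_knowledge(groups)
```

Placing the dimensions with the fewest distinct values near the root keeps the upper levels of the tree narrow and lets tuples that share values share branches.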
5 CASE STUDY
This section presents the results of a
real case study applied to data gathered from
services rendered to beneficiaries of social programs
of the Prefeitura Municipal de Campinas, SP, Brazil.
5.1 Operational Data Source
The Brazilian Federal Government monitors
the services provided to the beneficiaries of
social programs using a data gathering tool,
called the Family Development Index (FDI),
that characterizes the status of families (MDS, 2011).
Although this data gathering tool is established by
the Federal Government, public institutions look for
complementary solutions that bring greater
efficiency to the management of operational data,
allowing the visualization of managerial reports
and graphs regarding social services. The SIGM is
a good example of these solutions. It is software
focused on the needs of municipal management,
providing mechanisms for dealing with all services,
records of citizens, process management and other
data relevant to municipal administration. This
system is built on a multi-layer architecture,
using EJB technology to distribute
business objects and a relational database
management system for data storage (Marques, 2010).
For managing operational data, Marques (2010)
adopted the SIGM module for Social Management,
loading into his MCM all the information from
the SIGM transactional database. Throughout the
ETL process, the data underwent a format change in
order to fit the MCM. This change involved the
redistribution of information among the dimensions of the
multidimensional model and the elimination of
duplicated records, with no integration of values,
since the data come from a single source, namely the
SIGM transactional database.
The MCM of Marques (2010) holds around
21,000 social care records, of which 1,621 were
loaded into the ESRs of the GMRSEK. Only the most
consistent records were selected, considering
the following fields:
Gender, Race, Disability, Education,
Attends School, Type of social benefit and
Metropolitan region. These dimensions were
selected because they can portray individuals and
are capable of characterizing them without
interfering with their privacy.
5.2 Loading Data in the GMRSEK
While loading data from the MCM into the GMRSEK,
some dimensions had their values grouped into
broader classes in order to increase the similarity
among records. For this reason, the different types
of disability were not distinguished, making this
dimension a Boolean one that only states whether the
person is disabled or not. Likewise, the item
Education had its various levels grouped into four
categories: None, Low, Fair, and High. The kind of
social care also had its values reduced, packing the
different benefits into five types: Income transfer,
Housing benefit, Social-educative benefit, Child and
Teen Care, and Youth-addressed Programs. The
reduction of these variables was necessary to
increase similarity in the data set, since similar
social programs benefit citizens with the same
features.
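The grouping described above can be illustrated with a minimal sketch. The raw education labels in EDUCATION_MAP and the one-hot encoding in encode are assumptions made only for illustration; the paper names the four target categories but does not list the original SIGM values or the numeric encoding fed to the map.

```python
EDUCATION_LEVELS = ["None", "Low", "Fair", "High"]

# Assumed raw labels; the real SIGM values are not listed in the paper.
EDUCATION_MAP = {
    "illiterate": "None",
    "incomplete primary": "Low",
    "complete primary": "Low",
    "secondary": "Fair",
    "higher": "High",
}

def group_record(raw):
    """Collapse detailed values into the broader classes used for mining."""
    return {
        # Any reported disability type becomes True; an empty value is False.
        "disabled": bool(raw.get("disability")),
        "education": EDUCATION_MAP[raw["education"]],
    }

def encode(record):
    """Turn a grouped record into a numeric vector (one-hot for Education)."""
    vector = [1.0 if record["disabled"] else 0.0]
    vector += [1.0 if record["education"] == level else 0.0
               for level in EDUCATION_LEVELS]
    return vector

# Made-up record, used only to show the transformation.
raw = {"disability": "visual", "education": "secondary"}
grouped = group_record(raw)
vector = encode(grouped)
print(grouped, vector)
```

Collapsing values this way makes records that differ only in detail (for instance, two different disability types) identical in that dimension, which is the closer similarity the grouping aims for.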
5.3 Data Mining
During the exploratory analysis carried out with
the self-organizing map, it was possible to notice
that the free parameters of the neural network and
the choice of the number of neurons directly
influenced the convergence of results and, sometimes,
even the results themselves. Initially, the
self-organizing map was defined with many neurons:
841 processing units, that is, a grid of 29 x 29
neurons. This is a generous estimate, taking into
consideration the amount of available data. The
exploratory analysis took 15 hours to reach the
convergence point and, in the end, it was possible to
identify five groups of data. By decreasing the
number of neurons to 64 units, the convergence time
dropped to approximately 90 minutes; however, only
four groups became evident, with the fifth one
absorbed into the others. In an attempt to establish
an intermediate value, a grid of 100 processing units
was then defined. With this last configuration, it
was possible to obtain the same five groups found in
the first execution, using fewer processing units,
and to shorten the convergence time to around 12
hours.
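The role of the grid size and of the initial learning rate can be seen in a minimal sketch of self-organizing map training. This is not the implementation used in the case study: the rectangular grid, the Gaussian neighbourhood, the linear decay schedules and the names train_som, best_matching_unit and lr0 are all assumptions chosen for brevity.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return (row, col) of the unit whose weight vector is closest to x."""
    best, best_dist = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = math.dist(w, x)
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best

def train_som(data, rows, cols, epochs, lr0, seed=0):
    """Train a rows x cols map; weights[r][c] is the vector of one neuron."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        # Assumed schedule: linear decay of learning rate and radius.
        remaining = 1.0 - epoch / epochs
        lr = lr0 * remaining
        radius = max(radius0 * remaining, 0.5)
        for x in data:
            br, bc = best_matching_unit(weights, x)
            # Every neuron is visited for every sample, so the cost of one
            # pass over the data grows with rows * cols.
            for r in range(rows):
                for c in range(cols):
                    grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_dist2 / (2.0 * radius ** 2))
                    w = weights[r][c]
                    for i in range(dim):
                        w[i] += lr * h * (x[i] - w[i])
    return weights

# Toy input: two kinds of made-up binary records.
samples = [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]] * 5
som = train_som(samples, rows=4, cols=4, epochs=20, lr0=0.6)
for x in samples[:2]:
    print(x, "->", best_matching_unit(som, x))
```

Because each training step touches all rows * cols neurons, a 29 x 29 grid does roughly thirteen times the work per step of an 8 x 8 grid, which is in line with the much longer convergence time reported for the larger map.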
To achieve convergence of results, the neural
network had to go through several learning events.
When the learning rate of the neural network was
modified, the number of events necessary for
convergence changed. Taking the 100-neuron grid,
with the learning rate starting at 0.9, about 2300
events were necessary for the results to converge.
By starting at 0.3, around 1000 events were
necessary. A third trial was carried out with the
same 100-neuron grid, but with the learning rate at
0.6; in this case the convergence point was reached
after about 700 events. Figure 5 shows the U-Matrix,
which illustrates the 100 neurons of the
self-organizing map and the groups found:
Figure 5: U-Matrix.
Figure 5 is a graphical representation, through the
U-Matrix, of the groups found by the data mining
process. In it, we can notice the division of the
input data into five different groups. Groups are
separated from one another by frontiers of greater
or lesser intensity. The darkest frontiers, F1 and
F2, represent lower similarity between the groups
they separate, whereas the lightest frontier, F3,
represents higher similarity. It is possible to
notice a small group G1 made up of just one hexagon,
surrounded by lighter frontiers, at the bottom left
of the image. This group differs little from groups
G5 and G2. It is important to highlight that there
is less similarity between groups G5 and G2, which
are separated by the darkest frontier, F2. Two other
groups, G3 and G4, can be seen at the top of
Figure 5. The frontier F1 between them shows that
there is little similarity between the two groups.
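The frontier intensities read from a U-Matrix can be computed as the average distance between each neuron and its grid neighbours. The sketch below assumes a rectangular grid with four neighbours per neuron, whereas the map in Figure 5 is drawn with hexagons; the function name u_matrix and the toy weights are illustrative only.

```python
import math

def u_matrix(weights):
    """Average distance from each unit to its up/down/left/right neighbours."""
    rows, cols = len(weights), len(weights[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(weights[r][c], weights[nr][nc]))
            # A unit with no neighbours (1 x 1 map) gets height 0.
            result[r][c] = sum(dists) / len(dists) if dists else 0.0
    return result

# Toy 1 x 3 map: two identical units followed by a distant one.
toy_map = [[[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]]]
heights = u_matrix(toy_map)
print(heights)  # [[0.0, 2.5, 5.0]]
```

High values correspond to dark frontiers (dissimilar neighbouring neurons) and low values to light regions inside a group.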
5.4 Balancing and Load
After the end of the automatic exploratory analysis
carried out by the self-organizing map, the five
groups found had already been identified in the
entity Group of the GMRSEK. Although the groups
were associated with the tuples, that is, with the
inputs which generated them, the large volume of
data made it difficult to understand the results
analytically. The results were better understood
after the knowledge had been summarized and
represented through the hierarchical structure
provided by the entity Knowledge of the GMRSEK.
This was possible thanks to the Balancing and Load
Routine. The data summarized by this routine show
that people who apply for programs addressed to
young people are generally female.
Young women with disabilities have a high level of
education, are no longer attending school and live
in the east zone of town. In contrast, young women
without disabilities have a medium level of
education and are still attending school, usually
downtown, as shown in Figure 6:
Figure 6: Descriptive representation of groups.
SELF-ORGANIZINGMAPS-AnApproachAppliedtotheElectronicGovernment
469
This information may be useful for municipal
institutions, as it allows actions to be taken to
improve services for these beneficiaries. The case
study has shown that, in the east zone of town,
social service points must provide accessibility for
disabled people, in addition to referring the most
qualified beneficiaries to the job market, so that
other people in this region can be helped.
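The kind of summarization performed by the balancing and load routine can be approximated by counting records along a fixed order of dimensions. The sketch below is only an assumption about the general idea: the internal layout of the entity Knowledge is not reproduced here, and the function summarize, the dimension names and the records are made up for illustration.

```python
from collections import Counter

def summarize(records, dimensions):
    """Nested dict: value of dimension 1 -> value of dimension 2 -> ... -> count."""
    tree = {}
    for rec in records:
        node = tree
        for dim in dimensions[:-1]:
            node = node.setdefault(rec[dim], {})
        leaf = rec[dimensions[-1]]
        node[leaf] = node.get(leaf, 0) + 1
    return tree

# Made-up records, each already labelled with the group found by the map.
records = [
    {"group": "G1", "gender": "Female", "disabled": True},
    {"group": "G1", "gender": "Female", "disabled": False},
    {"group": "G1", "gender": "Male", "disabled": False},
    {"group": "G2", "gender": "Female", "disabled": False},
]

tree = summarize(records, ["group", "gender", "disabled"])
dominant_gender = Counter(
    rec["gender"] for rec in records if rec["group"] == "G1"
).most_common(1)[0][0]
print(tree)
print(dominant_gender)  # Female
```

Reading the tree from the root downwards gives statements of the form "in group G1, most records are female", which is the analytical view that the U-Matrix alone does not provide.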
6 CONCLUSIONS
The U-Matrix was efficient in graphically
representing the groups found, providing a large
reduction of dimensionality without information loss.
Although the data can be represented by the U-Matrix,
this representation only shows the topological
arrangement of the groups and does not allow the end
user to visualize the results analytically.
The Generic Model for Representation of
Samples and Extraction of Knowledge (GMRSEK)
has proved flexible and capable of operating
on a variable number of analysis dimensions with
multiple values associated with them. Through the
entity Knowledge, the result of the data mining
process can be represented in hierarchical form,
giving the decision-maker a good perception of the
patterns existing in the data.
The representation of groups by a hierarchical
structure lets users visualize the data groups
analytically, enabling the understanding of the
patterns. The hierarchical structure for knowledge
representation performed well during the queries
carried out; however, balancing and loading data
into this structure is computationally costly. The
data mining, as well as the balancing and load
routine for knowledge representation, is performed
only periodically; the model is used more often for
querying and issuing reports.
The MCM proposed by Marques (2010) brought more
quality to the knowledge discovery process, since
the data had already gone through treatments such as
duplicate removal, standardization and grouping of
values, in addition to carrying totals in the fact
table, thus enhancing the integrity of samples in
the GMRSEK.
Future work may use the knowledge stored in the
GMRSEK to classify new records without a new data
mining run. A supervised learning mechanism can be
trained with the knowledge stored in the entity
Knowledge and then classify new data registered by
ICT systems.
REFERENCES
Braga, C. V., 2010. Rede Neural e Regressão Linear:
Comparativo entre Técnicas Aplicadas a um Caso
Prático na Receita Federal. Dissertação de Mestrado.
Faculdade de Economia e Finanças IBMEC.
Ebrahim, Z., Irani, Z., 2005. E-government adoption:
architecture and barriers, Business Process Management
Journal, v. 11 n. 5.
Haykin, S., 1999. Neural Networks: A Comprehensive
Foundation. McMaster University, Hamilton, Ontario,
Canada. Pearson Education.
Klischewski, R., 2003. Semantic Web for e-Government,
Springer Berlin.
Kum, H., Duncan, D. F., Stewart, C. J., 2009. Supporting
self-evaluation in local government via Knowledge
Discovery and Data Mining. Government Information
Quarterly.
Marques, E. Z., Miani, R. S., Gago Junior, E. L. A.,
Mendes, L. S., 2010. Development of a Business
Intelligence Environment for e-Gov Using Open Source
Technologies. In: Data Warehousing and Knowledge
Discovery, Bilbao. Lecture Notes in Computer
Science, v. 6263, p. 203-214. Springer, Berlin.
Mendes, L. S., Bottoli, M. L., Breda, G. D., 2009. Digital
cities and open MANs: A new communications
paradigm, LATINCOM’09. IEEE Latin-American
Conference on Communications.
Ministério do Desenvolvimento Social, 2011.
http://www.mds.gov.br/programabolsafamilia/noticias/aplicativo-do-indice-de-desenvolvimento-da-familia-ja-esta-disponivel/
Mourady, A., Elragal, A., 2011. Business Intelligence in
Support of eGov Healthcare Decisions. European,
Mediterranean & Middle Eastern Conference on
Information Systems, Athens, Greece.
Oliveira, T. P. S., 2009. Sistemas Baseados em
Conhecimento e Ferramentas Colaborativas para
Gestão Pública: Uma proposta ao Planejamento
Público Local.
Yan, P., Guo, J., 2010. Researching and Designing the
Structure of E-government Based on SOA,
Proceedings of the 2010 International Conference on
E-Business and E-Government.
WEBIST2012-8thInternationalConferenceonWebInformationSystemsandTechnologies