As can be seen in the routine above, to load all the data from the entity Knowledge into the entities Dimension, Record, Datum, Group, and Tuple, these entities must be filled in. The loading process starts by retrieving all the groups found during data mining, along with the organization of the sample set. The dimensions of the sample set must be ordered by the number of distinct values they can take, so that the dimension with the fewest variations comes first. We also note that in the first iteration of each record r, the entity Knowledge refers to record r; in the other iterations, it refers both to record r and to record r[t - 1], where t - 1 denotes the record of the previous iteration.
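The loading logic just described can be sketched as follows; the entity and function names are hypothetical simplifications, since the paper does not show the actual implementation:

```python
from dataclasses import dataclass, field

# Hypothetical, simplified entity; the paper does not detail its fields.
@dataclass
class Knowledge:
    links: list = field(default_factory=list)

def order_dimensions(dimensions):
    # Dimensions are ordered by their number of variations, fewest first.
    return sorted(dimensions, key=lambda d: len(d["variations"]))

def load_knowledge(groups):
    # groups: list of record sequences obtained from data mining.
    k = Knowledge()
    for records in groups:
        for t, r in enumerate(records):
            if t == 0:
                k.links.append((r,))                 # first iteration: refers to r only
            else:
                k.links.append((r, records[t - 1]))  # later iterations: r and r[t-1]
    return k
```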
5 CASE STUDY
This section presents the results of a real case study applied to data gathered from services rendered to beneficiaries of social programs of the Prefeitura Municipal de Campinas, SP, Brazil.
5.1 Operational Data Source
The Brazilian Federal Government controls the services addressed to beneficiaries of social programs through a data gathering tool used to characterize the status of families, called the Family Development Index (FDI) (MDS, 2011). Although this tool is established by the Federal Government, public institutions look for complementary solutions that can bring greater efficiency to the management of operational data, thus allowing the visualization of managerial reports and graphs regarding social services. The SIGM is a good example of such a solution. It is software focused on the needs of municipal management, providing mechanisms for dealing with all services, citizen records, process management, and other data relevant to municipal administration. The system is built on a multi-layer architecture, using EJB technology to distribute the business objects and a relational database management system for data storage (Marques, 2010).
For managing operational data, Marques (2010) adopted the SIGM module for Social Management, loading into its MCM all the information from the SIGM transactional database. Throughout the ETL process, the data underwent a format change in order to fit the MCM. This change led to the redistribution of information among the dimensions of the multidimensional model and to the elimination of duplicated records, with no integration of values, since the data come from a single source, namely the SIGM transactional base.
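A deduplication step of this kind can be sketched as below; the key fields are illustrative, as the paper does not list the actual record keys:

```python
def remove_duplicates(records, key_fields):
    # Keep only the first occurrence of each record, identified by its key
    # fields; no value integration is needed, since the data come from a
    # single source (the transactional base).
    seen = set()
    unique = []
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```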
In the MCM by Marques (2010) there are around 21,000 social care records, of which 1,621 were loaded into the ESRs of the GMRSEK. Only the most consistent data records were selected, taking into consideration the entries of the following pieces of information: Gender, Race, Disability, Education, Attends School, Type of social benefit, and Metropolitan region. These dimensions were selected because they can portray individuals and are capable of characterizing them without interfering with their privacy.
5.2 Loading Data into the GMRSEK
While loading data from the MCM into the GMRSEK, some dimensions had their values grouped into broader classes in order to provide closer data similarity. For this reason, the distinctions among different disabilities were not kept, making this dimension a Boolean one, that is, indicating only whether the person is disabled or not. Likewise, the Education item had its various levels grouped into four categories: None, Low, Fair, and High. The type of social care similarly had its variables reduced, packing the different benefits into five types: Income transference, Housing benefit, Social-educative benefit, Child and Teen Care, and Youth-addressed Programs. The reduction of these variables was necessary to bring greater similarity to the data set, since similar social programs benefit citizens with the same features.
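Groupings of this kind can be expressed as simple lookup tables; the source values below are illustrative assumptions, since the original SIGM codes are not listed in the paper:

```python
# Illustrative source values; the actual SIGM education levels are not given.
EDUCATION_GROUPS = {
    "illiterate": "None",
    "elementary": "Low",
    "high school": "Fair",
    "undergraduate": "High",
    "graduate": "High",
}

def group_education(level):
    # Collapse the many education levels into the four broader categories.
    return EDUCATION_GROUPS.get(level.lower(), "None")

def has_disability(disability_field):
    # Collapse the disability dimension into a Boolean value.
    return disability_field not in (None, "", "none")
```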
5.3 Data Mining
During the exploratory analysis carried out through
the self-organizing map, it was possible to notice
that the free parameters of the neural network and
the choice of the number of neurons have directly
influenced the convergence of results, sometimes,
even the results themselves. Initially, the self-
organizing map had been defined with many
neurons, having 841 processing units, that is, a grid
of 29 x 29 neurons. This is a generous estimative,
taking in consideration the number of available data.
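A self-organizing map of the kind used here can be sketched minimally in pure Python; the grid size, learning rate, and decay schedule below are illustrative assumptions, not the configuration of the study:

```python
import math
import random

def train_som(data, grid_w, grid_h, epochs=40, lr0=0.5, sigma0=None, seed=0):
    # Minimal SOM sketch: one weight vector per neuron on a grid_w x grid_h grid.
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 if sigma0 is not None else max(grid_w, grid_h) / 2.0
    weights = {(x, y): [rng.random() for _ in range(dim)]
               for x in range(grid_w) for y in range(grid_h)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)               # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 1e-9  # neighborhood radius shrinks
        for sample in data:
            b = best_matching_unit(weights, sample)
            for (x, y), w in weights.items():
                d2 = (x - b[0]) ** 2 + (y - b[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood
                for i in range(dim):
                    w[i] += lr * h * (sample[i] - w[i])
    return weights

def best_matching_unit(weights, sample):
    # The neuron whose weight vector is closest to the sample.
    return min(weights, key=lambda p: sum((wi - si) ** 2
               for wi, si in zip(weights[p], sample)))
```

In this sketch the cost of each epoch grows with the number of neurons, which is consistent with the observation that shrinking the grid from 841 to 64 units sharply reduced convergence time.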
The exploratory analysis took 15 hours to reach the convergence point, and, in the end, it was possible to identify the existence of five groups of data. By decreasing the number of neurons to 64 units, the convergence time dropped to approximately 90 minutes; however, only four groups have come to
WEBIST 2012 - 8th International Conference on Web Information Systems and Technologies