process assistance based on ModelResult class and,
- KDD phase assistance. Since our ontology has
a formal structure related to the KDD process, it is
able to infer some results at each phase.
To this end, the user needs to invoke the system rule
engine (reasoner), indicating some relevant
information, e.g., for the data preprocessing task:
swrl:query hasDataPreprocessingTask(?dpp, “ds”),
where hasDataPreprocessingTask is an OWL
property which infers from the ontology all the
data preprocessing tasks (dpp) assigned to each
attribute type within the data set “ds”. Moreover,
the user is also assisted in terms of the ontology
capability index, through the ontology index metrics:
precision, recall and PRI.
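The effect of such a query can be illustrated outside the reasoner. The following is a minimal Python sketch, not the SWRL implementation: it assumes a hypothetical mapping from attribute types to preprocessing tasks (the task names and the attribute types of the example data set are invented for illustration) and returns the tasks assigned to each attribute of a data set.

```python
# Hypothetical mapping from attribute type to preprocessing tasks.
# The task names are illustrative assumptions, not taken from the ontology.
PREPROCESSING_TASKS = {
    "numeric": ["outlier detection", "normalization"],
    "nominal": ["missing value imputation", "encoding"],
    "date": ["format standardization"],
}


def has_data_preprocessing_task(dataset):
    """Return {attribute: [tasks]} for every attribute in the data set.

    `dataset` maps attribute names to attribute types. Attributes whose
    type is unknown to the mapping get an empty task list.
    """
    return {
        attribute: list(PREPROCESSING_TASKS.get(attr_type, []))
        for attribute, attr_type in dataset.items()
    }


# Illustrative data set "ds"; the attribute types are assumptions.
ds = {"age": "numeric", "sex": "nominal", "vehicleType": "nominal"}
tasks = has_data_preprocessing_task(ds)
```

In the actual system this lookup is performed by the reasoner over OWL properties; the dictionary only stands in for the knowledge the ontology would hold.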
Once we have a set of executed KDD processes
registered in the knowledge base, whenever a new
KDD process starts, the ontology may support
the user at the different KDD phases. For example, in
a new classification process execution, the user
interaction with the ontology will follow the framework
as described in next section. The ontology will guide
user efforts towards knowledge extraction by
making context-based suggestions. That is, the ontology
will act according to the user's question, e.g., at domain
objective definition (presented by the user) the ontology
will infer which types of objectives it holds.
All inference work depends on previously loaded
knowledge. Hence, the ontology has a limitation:
it can only assist in KDD processes which share
some characteristics with others already
registered.
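This limitation can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the ontology's inference mechanism: the knowledge base is modeled as a plain list of records, and the field names and example entries are hypothetical. A suggestion is only possible when a process with the same objective type has already been registered.

```python
# Registered KDD processes; the entries are illustrative assumptions.
KNOWLEDGE_BASE = [
    {"objective_type": "classification", "algorithm": "J48"},
    {"objective_type": "description", "algorithm": "Apriori"},
]


def suggest_algorithms(objective_type, knowledge_base):
    """Return the algorithms of registered processes with the same objective type."""
    return sorted(
        {
            record["algorithm"]
            for record in knowledge_base
            if record["objective_type"] == objective_type
        }
    )


classification_suggestions = suggest_algorithms("classification", KNOWLEDGE_BASE)
# No clustering process is registered, so nothing can be suggested.
clustering_suggestions = suggest_algorithms("clustering", KNOWLEDGE_BASE)
```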
5 EXPERIMENT
To build the mining experiments we used the Weka
Toolkit (Witten and Frank, 2000), which allowed not
only the actual mining but also featured analysis and
algorithm evaluation. These experiments did not aim
at the full construction of a classification model but
instead at testing and analyzing different approaches
and subsequently ranking them.
Our system prototype operation follows the general
KDD framework (Figure 1) and uses the ontology to
assist at each user interaction. Our experimentation
was carried out over a real oil company fidelity card
marketing database. This database has three main
tables: card owner, card transactions and fuel
station.
To carry this out we have developed an initial set
of SWRL rules. Since KDD is an interactive
process, these rules operate at both the user and
the ontological levels. The logic captured by these
rules is presented in this section using an abstract
SWRL representation, in which variables are prefixed
with question marks.
Domain objective: customer profile
Modeling objective: description
Initial database: fuel fidelity card;
Database structure: 4 tables;
The most relevant rule extracted by applying the
algorithms to the above data was:
if (age<27 and vehicleType=“Lig” and
sex=“Female”) then 1stUsed=“p”
From this model we may say that female card owners
under 27 years of age who have a “lig” (ligeiro)
category car use a fuel station located within 10
kilometers of their address.
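The rule above can be expressed as a simple predicate. The Python sketch below is only an illustration of how the rule is applied to a single card owner; the function and parameter names are assumptions, since identifiers such as 1stUsed are not valid Python names.

```python
def predict_first_used(age, vehicle_type, sex):
    """Apply the extracted rule to one card owner.

    Returns "p" when the rule fires. The text reads "p" as a fuel
    station within 10 kilometers of the owner's address. Returns None
    when the rule does not apply, because the rule says nothing about
    other cases.
    """
    if age < 27 and vehicle_type == "Lig" and sex == "Female":
        return "p"
    return None


rule_fires = predict_first_used(25, "Lig", "Female")
rule_silent = predict_first_used(40, "Lig", "Female")
```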
In addition, practical KDD process tasks were
carried out with the support of SWRL ontology queries.
These query tasks were manually performed by the user.
The guidance was therefore achieved by updating
the knowledge base with the general model:
INSERT record KNOWLEDGE BASE
hasAlgorithm(J48) AND
hasModelingObjectiveType(classification) AND
hasAlgorithmWorkingData
({idCard; age; carClientGap;
civilStatus; sex; vehicleType;
vehicleAge; nTransactions;
tLiters; tAmountFuel; tQtdShop;
1stUsed; 2stUsed; 3stUsed}) AND
Evaluation(67,41%; 95,5%) AND
hasResultModel(J48; classification;
“wds”, PCC; 0,674; 0,955)
Once the evaluation has been performed, the system
automatically updates the knowledge base with a
new record. The registered information will serve
for future use, namely knowledge sharing and reuse.
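The registration step can be sketched in Python as follows. This is a minimal illustration, not the system's implementation: the ModelRecord class, its field names and the register function are hypothetical, and the knowledge base is modeled as a plain list. The values are those of the record shown above, with the decimal-comma percentages converted to fractions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRecord:
    """Hypothetical container for one knowledge base record."""
    algorithm: str
    objective_type: str
    working_data: tuple
    evaluation: tuple  # the two evaluation values, stored as fractions


def parse_percentage(text):
    """Convert a decimal-comma percentage such as "67,41%" to a fraction."""
    return float(text.strip().rstrip("%").replace(",", ".")) / 100


def register(knowledge_base, record):
    """Append the record unless an identical one is already registered."""
    if record not in knowledge_base:
        knowledge_base.append(record)
    return knowledge_base


knowledge_base = []
record = ModelRecord(
    algorithm="J48",
    objective_type="classification",
    working_data=(
        "idCard", "age", "carClientGap", "civilStatus", "sex",
        "vehicleType", "vehicleAge", "nTransactions", "tLiters",
        "tAmountFuel", "tQtdShop", "1stUsed", "2stUsed", "3stUsed",
    ),
    evaluation=(parse_percentage("67,41%"), parse_percentage("95,5%")),
)
register(knowledge_base, record)
register(knowledge_base, record)  # a second identical insert is ignored
```

Skipping identical records is a design choice of this sketch; the source does not state how the system treats repeated registrations.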
We have also used the output models from the
aforementioned previous research work
(Pinto and Santos, 2009) and integrated them into the
knowledge base.
6 CONCLUSIONS AND FURTHER RESEARCH
KDD success is still very much user dependent.
Though our system may suggest a valid set of tasks
which better fits the KDD process design, it still lacks
the capability to automatically run the data, develop
modeling approaches and apply algorithms.
This work strove to improve the KDD process
with the support of ontologies. To this end, we have used
ICEIS 2011 - 13th International Conference on Enterprise Information Systems