of applications using XML for syntax and URIs
(Uniform Resource Identifiers) for naming. At the
heart of all semantic web applications is the use of
ontologies, which describe entities and relationships
among entities. The concept of metadata has evolved
over the years starting from data dictionaries to
database schemas and now to ontologies.
Data mining aims at finding patterns and subtle
relationships in data and discovering rules that allow
the prediction of future results by the use of
automatic or semi-automatic processes. It is an
information extraction activity, whose goal is to
discover hidden facts contained in databases, using a
combination of machine learning, statistical analysis,
modelling techniques and database technology.
Mining the data on the web, however, is one of the
major challenges faced by the data management and
mining community, as well as those working on web
information management and machine learning. The
characteristic feature of Web Mining is the use of
Data Mining techniques to elaborate on content,
structure, and usage of Web resources.
In the Semantic Web, content and structure are
strongly intertwined. Therefore, the distinction
between structure and content mining vanishes. The
mining algorithms can be transformed in order to
deal with RDF or ontology-based data. Mining the
usage can be enhanced further, if the semantics are
contained explicitly in the pages by referring to
concepts of ontologies.
3 RELATED WORK
We first review the formal model of association
rules introduced by Agrawal (Agrawal et al., 93).
Formally, association rule mining can be stated as
follows:
• Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of items.
• Let $D$ be a set of transactions, where each transaction $T$ is a set of items satisfying $T \subseteq I$.
• Each transaction is assigned an identifier, called TID.
• Let $X$ be a set of items; a transaction $T$ is said to contain $X$ if and only if $X \subseteq T$.
• An association rule is an implication of the form $X \Rightarrow Y$.
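The definitions above can be made concrete with a short Python sketch. The toy database, the item names and the helper functions `contains`, `support` and `confidence` are all hypothetical and chosen only for illustration. Support and confidence are not defined in this section, so the sketch assumes their standard meaning: the fraction of transactions containing both X and Y, and the fraction of transactions containing X that also contain Y.

```python
from typing import Dict, FrozenSet

Itemset = FrozenSet[str]

# Hypothetical toy database D, keyed by TID. The items are invented for illustration.
D: Dict[int, Itemset] = {
    1: frozenset({"bread", "milk"}),
    2: frozenset({"bread", "butter", "milk"}),
    3: frozenset({"butter", "milk"}),
    4: frozenset({"bread", "butter"}),
}


def contains(T: Itemset, X: Itemset) -> bool:
    """A transaction T contains X if and only if X is a subset of T."""
    return X <= T


def support(X: Itemset, D: Dict[int, Itemset]) -> float:
    """Fraction of transactions in D that contain X (assumed standard definition)."""
    return sum(contains(T, X) for T in D.values()) / len(D)


def confidence(X: Itemset, Y: Itemset, D: Dict[int, Itemset]) -> float:
    """Fraction of the transactions containing X that also contain Y."""
    covered = [T for T in D.values() if contains(T, X)]
    if not covered:
        return 0.0  # X never occurs, so the rule has no evidence either way
    return sum(contains(T, Y) for T in covered) / len(covered)


# Evaluate the rule {bread} => {milk} on the toy database.
X = frozenset({"bread"})
Y = frozenset({"milk"})
rule_support = support(X | Y, D)
rule_confidence = confidence(X, Y, D)
```

On this toy database, two of the four transactions contain both items, and two of the three transactions containing bread also contain milk.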
While association rules provide a means of
discovering many interesting associations, they fail to
discover other, no less interesting associations that
are also hidden in the data. This may not be a
serious problem in classical mining procedures, but it
is one in semantic web mining, since the discovered
associations form the basis of the ontology on which
the semantic web is built. Maximal association rules,
however, are not designed to replace regular
association rules, but rather to allow the discovery of
the concepts that will be included in the ontology
and the relations that bind them together. For this
reason, we propose in this paper enhancements to
the algorithm proposed above. These enhancements
will allow the discovery of new association rules to
complement the regular ones (Amihood et al., 05).
Maximal associations were proposed to allow the
discovery of associations pertaining to items that
most often do not appear alone, but rather together
with closely related items; associations relevant only
to these items therefore tend to obtain low
confidence in classical algorithms such as Apriori. In
a maximal association rule we are interested in
capturing the notion that whenever X appears alone,
Y also appears, with some confidence (Amihood et
al., 05). This is why maximal association rules are
crucial in text/web mining for learning the ontology.
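The difference between regular and maximal confidence can be illustrated with a simplified Python sketch. It assumes that items are grouped into categories and that "X appears alone" means X is exactly the part of a transaction that falls in X's category. The category names, the documents and the functions below are hypothetical, and the exact definitions in (Amihood et al., 05) may differ in detail.

```python
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]

# Hypothetical categories that group the items (an assumption of this sketch).
CATEGORIES: Dict[str, Itemset] = {
    "countries": frozenset({"canada", "france", "usa"}),
    "topics": frozenset({"corn", "fish", "wheat"}),
}

# Hypothetical documents, each reduced to its set of index terms.
DOCS: List[Itemset] = [
    frozenset({"usa", "canada", "fish"}),
    frozenset({"usa", "canada", "fish"}),
    frozenset({"usa", "canada", "fish"}),
    frozenset({"usa", "corn"}),
    frozenset({"usa", "corn"}),
    frozenset({"usa", "france", "wheat"}),
]


def appears_alone(T: Itemset, X: Itemset, category: Itemset) -> bool:
    """True if X is exactly the set of items of T that belong to the category."""
    return T & category == X


def regular_confidence(X: Itemset, Y: Itemset, docs: List[Itemset]) -> float:
    """Classical confidence: share of documents containing X that contain Y."""
    with_x = [T for T in docs if X <= T]
    if not with_x:
        return 0.0
    return sum(Y <= T for T in with_x) / len(with_x)


def maximal_confidence(X: Itemset, category: Itemset, Y: Itemset,
                       docs: List[Itemset]) -> float:
    """Simplified maximal confidence: share of documents in which X appears
    alone within its category that also contain Y."""
    alone = [T for T in docs if appears_alone(T, X, category)]
    if not alone:
        return 0.0
    return sum(Y <= T for T in alone) / len(alone)


X = frozenset({"usa"})
Y = frozenset({"corn"})
regular = regular_confidence(X, Y, DOCS)
maximal = maximal_confidence(X, CATEGORIES["countries"], Y, DOCS)
```

In this toy corpus "usa" mostly co-occurs with other countries, so the regular rule {usa} ⇒ {corn} obtains a low confidence, while the maximal rule, which only looks at documents where "usa" is the sole country, obtains a high one.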
In addition, some redundant, unwanted or even
false strong association rules are likely to be
generated because the correlation between attributes
is ignored (Yong Xu et al., 05). The Chi-Squared
test should therefore be introduced into association
rule mining, since it can remove irrelevant itemsets
and rules that have high support but no dependency.
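As a minimal sketch of this idea, the following Python code computes the Pearson Chi-Squared statistic on the 2x2 contingency table of X and Y and compares it with 3.841, the critical value for one degree of freedom at the 0.05 level. The function name, the toy data and the choice of significance level are assumptions of this sketch, not details taken from (Yong Xu et al., 05).

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]

# Critical value of the Chi-Squared distribution, 1 degree of freedom, alpha = 0.05.
CRITICAL_0_05 = 3.841


def chi_squared(X: Itemset, Y: Itemset, transactions: List[Itemset]) -> float:
    """Pearson Chi-Squared statistic for the presence/absence of X and Y."""
    n = len(transactions)
    both = sum(1 for T in transactions if X <= T and Y <= T)
    x_only = sum(1 for T in transactions if X <= T and not Y <= T)
    y_only = sum(1 for T in transactions if Y <= T and not X <= T)
    neither = n - both - x_only - y_only
    observed = [[both, x_only], [y_only, neither]]
    row_totals = [both + x_only, y_only + neither]
    col_totals = [both + y_only, x_only + neither]
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                # Degenerate table (X or Y always or never present):
                # no evidence of dependency can be derived.
                return 0.0
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


A = frozenset({"a"})
B = frozenset({"b"})

# Hypothetical data: high support (0.64) and confidence (0.8), yet A and B are independent.
independent = ([frozenset({"a", "b"})] * 64 + [frozenset({"a"})] * 16
               + [frozenset({"b"})] * 16 + [frozenset()] * 4)

# Hypothetical data: A and B always occur together or not at all.
dependent = [frozenset({"a", "b"})] * 50 + [frozenset()] * 50

keep_independent_rule = chi_squared(A, B, independent) > CRITICAL_0_05
keep_dependent_rule = chi_squared(A, B, dependent) > CRITICAL_0_05
```

The first rule would pass typical support and confidence thresholds but is pruned by the test, because the observed counts match what independence predicts. The second rule is kept.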
4 ENHANCED LEARNING ALGORITHM: EN-APRIORI
The main learning algorithm adopted in our paper
is the one proposed in (Maedche et al., 99)
(Berendett et al., 06). This learning algorithm is
effective up to a certain level; since it is a
text/web mining approach, techniques from text
mining and web mining have been combined to
achieve the learning of the ontology.
We stress again that learning the ontology is the
most important step, since its results allow the
discovery of the concepts that will be included in the
ontology and the relations that bind them together.
For this reason, we propose in this paper
enhancements to the algorithm proposed above.
These enhancements allow the discovery of new
association rules that were missed by the original
learning algorithm and, in addition, allow the
pruning of some faulty rules that appeared to be
valid strong association rules. Our approach is based
on the idea of
introducing two new algorithms and integrating
ICSOFT 2007 - International Conference on Software and Data Technologies