4 KEOPS METHODOLOGY
KEOPS is a methodology that drives data mining processes by integrating expert knowledge. It addresses the following goals:
• To manage interactions between knowledge and data throughout the data mining process: data preparation, dataset generation, modeling, evaluation and results visualization.
• To evaluate extracted models according to domain expert knowledge.
• To provide easy navigation through the space of results.
KEOPS (cf. fig. 1) is based upon an ontology-driven information system (ODIS) made up of four components:
• An application ontology whose concepts, and the relationships between them, are dedicated to the domain and to the data mining task.
• A Mining Oriented DataBase (MODB): a relational database whose attributes and values are chosen among ontology concepts.
• A knowledge base to express consensual knowledge, obvious knowledge and user assumptions.
• A set of information system components (user interfaces, extraction algorithms, evaluation methods) used to select the most relevant extracted models according to expert knowledge.
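To make the MODB constraint more concrete, here is a minimal sketch, assuming the application ontology can be reduced to a mapping from attribute concepts to the value concepts each may take (a simplification: a real application ontology also carries relationships). The names `ONTOLOGY` and `check_modb_row`, and the example concepts, are hypothetical illustrations and not part of KEOPS.

```python
from typing import Dict, List, Set

# Hypothetical, simplified application ontology: each attribute concept maps
# to the set of value concepts it may take. Relationships are omitted here.
ONTOLOGY: Dict[str, Set[str]] = {
    "Age": {"Young", "Adult", "Senior"},
    "Smoker": {"Yes", "No"},
}


def check_modb_row(row: Dict[str, str], ontology: Dict[str, Set[str]]) -> List[str]:
    """Return violations: attributes or values not chosen among ontology concepts."""
    violations: List[str] = []
    for attribute, value in row.items():
        if attribute not in ontology:
            violations.append(f"unknown attribute concept: {attribute}")
        elif value not in ontology[attribute]:
            violations.append(f"value {value!r} is not a concept allowed for {attribute}")
    return violations


if __name__ == "__main__":
    print(check_modb_row({"Age": "Adult", "Smoker": "No"}, ONTOLOGY))  # []
    # A raw numeric age and an attribute missing from the ontology are both rejected.
    print(check_modb_row({"Age": "42", "Weight": "70"}, ONTOLOGY))
```

In this reading, a row is admissible in the MODB only if the check returns an empty list; raw values such as `"42"` must first be mapped to an ontology concept.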
The KEOPS methodology extends the CRISP-DM process model by integrating knowledge into most steps of the mining process. The first step focuses on business understanding. The second step focuses on data understanding and on activities that check data reliability. Data reliability problems are solved during the third step, data preparation. The fourth step is the evaluation of extracted models. In this paper we do not focus on the modeling step of the CRISP-DM model, since we ran the CLOSE algorithm (Pasquier et al., 1999), which extracts association rules without using domain knowledge.
4.1 Business Understanding
During the business understanding step, documents, data, domain knowledge and discussions between experts are used to assess the situation, to determine business objectives and success criteria, and to evaluate risks and contingencies. However, this step is often rather informal.
The KEOPS methodology requires building an ontology-driven information system during the next step, data understanding. Consequently, an informal specification of business objectives and expert knowledge is no longer sufficient, and expert knowledge must be formalized during business understanding. We chose to state knowledge as production rules, also called “if ... then ...” rules. These rules are modular, each defining a small and independent piece of knowledge. Furthermore, they can easily be compared to extracted association rules. Each knowledge rule has some essential properties used to select the most interesting association rules:
• Knowledge confidence level: five values are available to describe knowledge confidence according to a domain expert. These values are ranges of confidence: 0-20%, 20-40%, 40-60%, 60-80% and 80-100%. We call confidence the probability that the rule consequence occurs when the rule condition holds.
• Knowledge certainty:
– Obvious: knowledge that cannot be contradicted.
– Consensual: domain knowledge shared among experts.
– Assumption: knowledge the user wants to check.
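The rule properties above can be sketched as a small data structure. This is an illustration under stated assumptions, not the KEOPS implementation: conditions and consequences are modeled as sets of (attribute, value) items; confidence is estimated on records as the fraction of records satisfying the condition that also satisfy the consequence; and the five ranges are treated as half-open intervals (lower bound included), with 100% placed in the last range. All names (`KnowledgeRule`, `Certainty`, `confidence_level`, `rule_confidence`, `agrees_with`) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet, Iterable, Tuple

Item = Tuple[str, str]  # an (attribute, value) pair, e.g. ("Smoker", "Yes")

# The five expert confidence ranges, as (lower, upper) fractions.
LEVELS = [(0.0, 0.2), (0.2, 0.4), (0.4, 0.6), (0.6, 0.8), (0.8, 1.0)]


class Certainty(Enum):
    OBVIOUS = "obvious"        # cannot be contradicted
    CONSENSUAL = "consensual"  # shared among experts
    ASSUMPTION = "assumption"  # knowledge the user wants to check


@dataclass(frozen=True)
class KnowledgeRule:
    condition: FrozenSet[Item]
    consequence: FrozenSet[Item]
    level: int  # index into LEVELS: 0 = 0-20%, ..., 4 = 80-100%
    certainty: Certainty


def confidence_level(confidence: float) -> int:
    """Map a confidence in [0, 1] to a range index (assumed half-open ranges)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return min(int(confidence * 5), 4)  # 1.0 falls into the last range


def rule_confidence(records: Iterable[Dict[str, str]],
                    condition: FrozenSet[Item],
                    consequence: FrozenSet[Item]) -> float:
    """Estimate P(consequence | condition) as a ratio of record counts."""
    def holds(record: Dict[str, str], items: FrozenSet[Item]) -> bool:
        return all(record.get(a) == v for a, v in items)

    matching = [r for r in records if holds(r, condition)]
    if not matching:
        raise ValueError("no record satisfies the rule condition")
    return sum(holds(r, consequence) for r in matching) / len(matching)


def agrees_with(rule: KnowledgeRule, extracted_confidence: float) -> bool:
    """Assuming an extracted rule with the same condition and consequence,
    check whether its confidence falls in the range the expert stated."""
    return confidence_level(extracted_confidence) == rule.level


if __name__ == "__main__":
    records = [
        {"Smoker": "Yes", "Cough": "Yes"},
        {"Smoker": "Yes", "Cough": "Yes"},
        {"Smoker": "Yes", "Cough": "Yes"},
        {"Smoker": "Yes", "Cough": "No"},
        {"Smoker": "No", "Cough": "No"},
    ]
    cond = frozenset({("Smoker", "Yes")})
    cons = frozenset({("Cough", "Yes")})
    rule = KnowledgeRule(cond, cons, level=3, certainty=Certainty.ASSUMPTION)
    c = rule_confidence(records, cond, cons)  # 3 of the 4 smokers cough
    print(c, confidence_level(c), agrees_with(rule, c))  # 0.75 3 True
```

Comparing range indices rather than raw values mirrors the coarse five-level scale experts are asked to use; how KEOPS actually scores disagreement between expert and extracted rules is not specified here.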
Since the description of an expert interview methodology for capturing knowledge is beyond the scope of this paper, we refer the reader to (Becker, 1976).
4.2 Data Understanding
Data understanding means selecting and describing source data in order to capture their semantics and reliability. During this step, the ontology is built in order to identify domain concepts and the relationships between them (the objective is to select, among the data, the most interesting attributes according to the business objectives), to resolve ambiguities within the data and to choose data discretization levels.
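Choosing discretization levels amounts to fixing, for each numeric attribute, the cut points that map raw values to ontology concepts. The sketch below is illustrative only: the `discretize` function, the age cut points and the concept labels are hypothetical, and it assumes each interval is closed on the left and open on the right.

```python
from bisect import bisect_right
from typing import Sequence


def discretize(value: float, cut_points: Sequence[float], labels: Sequence[str]) -> str:
    """Return the concept label of the interval containing value.

    With sorted cut points c0 < c1 < ..., the intervals are
    (-inf, c0), [c0, c1), ..., [c_last, +inf), so there must be
    exactly one more label than cut points.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    if any(a >= b for a, b in zip(cut_points, cut_points[1:])):
        raise ValueError("cut points must be strictly increasing")
    # bisect_right places a value equal to a cut point in the interval to its right.
    return labels[bisect_right(cut_points, value)]


if __name__ == "__main__":
    age_cuts = [18, 65]                       # hypothetical discretization levels
    age_labels = ["Young", "Adult", "Senior"]  # hypothetical ontology concepts
    print([discretize(a, age_cuts, age_labels) for a in (12, 18, 40, 65, 80)])
    # ['Young', 'Adult', 'Adult', 'Senior', 'Senior']
```

Keeping cut points and labels together makes the discretization an explicit, reviewable piece of the ontology rather than a side effect of preprocessing code.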
Consequently, the ontology formalizes domain concepts and information about the data. This ontology is an application ontology: it contains the knowledge essential to drive data mining tasks. Ontology concepts are related to domain concepts, whereas the relationships between them model database relationships. During the next step, data preparation (cf. section 4.3), a relational database called the Mining Oriented DataBase (MODB) will be built.
In order to understand the links between the MODB and the ontology, it is necessary to define the notions of domain, concept and relationship:
• Domain: in the KEOPS methodology, this notion refers to the notion of domain in relational theory.
ICEIS 2008 - International Conference on Enterprise Information Systems