2 BACKGROUND
In general, knowledge can be experience, concepts,
values, or beliefs that increase an individual’s ability
to take effective action (F. Zheng, 2008).
Knowledge can be either implicit or explicit. The
former is represented by tacit experience which can
come through individual ideas, intuition, experience,
values and judgements. This type of knowledge is
dynamic in nature. It can be accessed only through
direct participation and communication with field
experts that possess this knowledge. The know-how
of each knowledge worker is accordingly based on
this tacit (or implicit) knowledge. In contrast, explicit
knowledge usually includes anything that can be saved
in an electronic format, in other words, whatever we
are able to transcribe and share.
Knowledge discovery can be defined as “the non-
trivial extraction of implicit, unknown, and
potentially useful information from data. When
working with texts, knowledge discovery refers
generally to the process of extracting interesting
information from a large amount of unstructured
textual documents. The goal of this process is to find
and extract useful patterns. To do this, specific
methods and algorithms from the fields of machine
learning and statistics are applied. Text mining is,
thus, the application of these algorithms and
methods to texts” (U.M. Fayyad, 1996).
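As an illustration of what extracting patterns from unstructured text can mean in practice, the following minimal sketch counts pairs of terms that co-occur in several documents. It is not the method of the cited work: the stop-word list, the function names `tokenize` and `frequent_pairs`, and the sample documents are all assumptions made for this example.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is", "on", "for"}


def tokenize(text):
    """Lowercase the text and return its set of non-stop-word terms."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}


def frequent_pairs(documents, min_support=2):
    """Return the term pairs that co-occur in at least `min_support` documents."""
    counts = Counter()
    for doc in documents:
        # Sort the terms so that each pair is always counted in the same order.
        terms = sorted(tokenize(doc))
        counts.update(combinations(terms, 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical sample corpus.
docs = [
    "Grid scheduling of data mining tasks",
    "Data mining on the grid",
    "Workflow management for business processes",
]
pairs = frequent_pairs(docs)
print(pairs)
```

Here the "pattern" is simply a pair of terms shared by at least two documents; real text mining systems would replace this counting step with more robust statistical or machine learning methods.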
Grid is an infrastructure which allows shared
resources to be coordinated inside dynamic
organisations, be they individuals, institutions or
resources. It offers a flexible environment where
resources can be dynamically reorganised without
altering any active processing on the GRID, and it
provides connectivity for data distributed in different
locations. This can resolve transparency problems
related to location while providing a mechanism
which allows easier access to and management of
distributed data as well as the virtualisation and
sharing of GRID-connected resources. To
handle computationally intensive procedures, the
platform can provide automatic allocation of
resources, scheduling and algorithm implementation
according to the availability, capacity and position
of these resources. A GRID can increase efficiency
while reducing the cost of computational networks
by decreasing data processing times, optimizing
resources, and distributing the workload. Thus, users
are provided the results of large operations with
greater speed and lower costs (I. Foster, 2001).
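The allocation of resources according to availability, capacity and position can be sketched as follows. This is a simplified illustration under stated assumptions, not the scheduling policy of any particular grid middleware: the `Resource` and `Job` classes, their fields, the site names and the ranking rule (prefer a node located with the data, then the largest spare capacity) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    name: str
    available: bool  # availability: is the node currently free?
    capacity: int    # capacity: free processing slots (hypothetical unit)
    site: str        # position: where the node is located


@dataclass
class Job:
    name: str
    slots_needed: int
    data_site: str   # site that holds the job's input data


def allocate(job: Job, resources: List[Resource]) -> Optional[Resource]:
    """Pick an available resource with enough capacity for the job.

    Resources at the same site as the data are preferred; ties are
    broken by the largest capacity. Returns None if nothing fits.
    """
    candidates = [
        r for r in resources
        if r.available and r.capacity >= job.slots_needed
    ]
    if not candidates:
        return None
    # False sorts before True, so co-located resources rank first.
    return min(candidates, key=lambda r: (r.site != job.data_site, -r.capacity))


# Hypothetical pool of grid nodes.
pool = [
    Resource("n1", available=True, capacity=4, site="rome"),
    Resource("n2", available=True, capacity=16, site="milan"),
    Resource("n3", available=False, capacity=32, site="rome"),
]

small = allocate(Job("small", slots_needed=2, data_site="rome"), pool)
large = allocate(Job("large", slots_needed=8, data_site="rome"), pool)
huge = allocate(Job("huge", slots_needed=64, data_site="rome"), pool)
print(small, large, huge)
```

The small job goes to the node that sits next to its data, while the large job falls back to a remote node because the only local node with sufficient capacity is busy.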
Attempts to automate knowledge processes date
to the early 1980s. Several processes have been
employed on parallel computing platforms to
achieve high performance on the analysis of large
data sets stored on a single site. Recently, the
demand for knowledge processes has expanded to
include the management and analyses of multi-site
and multi-owner data repositories. This task involves
large data sets, the geographic distribution of data,
users and resources, and computationally intensive
analysis; it therefore demands new parallel and distributed
platforms for knowledge processes, such as computational
grid technology. The resulting application of grid
technology to the knowledge field has been termed
Knowledge-Grid (M. Cannataro, 2001).
Workflow automation technology has been
developed to facilitate organizational coordination
and collaboration by automating entire work
processes and controlling the flow of information
among participants. A workflow can be used to
define the work process, control activity requests,
route relevant documents to the appropriate agents,
enforce deadlines, and monitor the progress of work
(S. X. Sun, 2008). The Workflow Management
Coalition (WFMC) defines a workflow as “… the
total or partial automation of business procedures
where documents, information or tasks are passed
between participants according to a defined set of
rules …” (www.wfms.com). A business process is a
group of necessary tasks and a set of conditions
which determines the order of their completion. A
task is a logical unit of work that must be performed
by a resource in its entirety. A resource can be a
person or machine or it can be a group of persons or
machines that perform specific tasks. The
performance of a task by a resource is called an
activity (Wil van der Aalst, 2002).
Hence, a workflow can be seen as a structure which
not only contains tasks/activities but also
coordinates and supervises their execution.
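The terms defined above (task, ordering condition, resource, activity) can be made concrete with a small sketch. It assumes that the ordering conditions are plain "must finish before" dependencies, which is a simplification of the rule sets real workflow systems support; the `Task` and `Activity` classes, the `run_workflow` function and the sample process are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Task:
    name: str
    # Ordering condition: names of the tasks that must be completed first.
    depends_on: List[str] = field(default_factory=list)


@dataclass
class Activity:
    # An activity is the performance of a task by a resource.
    task: str
    resource: str


def run_workflow(tasks: List[Task], assign: Dict[str, str]) -> List[Activity]:
    """Perform every task once its conditions hold and log the activities."""
    done = set()
    activities = []
    pending = list(tasks)
    while pending:
        ready = [t for t in pending if all(d in done for d in t.depends_on)]
        if not ready:
            raise ValueError("ordering conditions cannot be satisfied")
        for t in ready:
            activities.append(Activity(t.name, assign[t.name]))
        # Mark the whole batch as done only after it has been performed.
        done.update(t.name for t in ready)
        pending = [t for t in pending if t.name not in done]
    return activities


# Hypothetical business process with three tasks, listed out of order.
process = [
    Task("approve", depends_on=["review"]),
    Task("review", depends_on=["submit"]),
    Task("submit"),
]
resources = {"submit": "author", "review": "editor", "approve": "manager"}

log = run_workflow(process, resources)
print([(a.task, a.resource) for a in log])
```

The function plays the coordinating and supervising role: it decides when each task may start, routes it to its resource, and records the resulting activity.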
Different types of workflow have been identified:
Collaborative workflows manage less rigid
processes and connect the users most closely involved
in the collaboration, as well as work groups;
Structured workflows manage structurally well-
defined and repeatable activities which can be
specified through a series of rules. Examples of
structured workflows are: (a) administrative
workflows which manage the flow of electronic
forms, integrating them with message systems or
email; and (b) production workflows which manage
the flow of well-structured work, defined by well-
formalised rules and dependencies;
Ad Hoc workflows are created by using lighter
systems which give the user the task of identifying
the correct procedural steps to take each time a