validation agent is to remove any dead links, i.e.,
Jans that no longer exist, and to ensure that the
database is in a consistent and complete state.
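As a rough illustration of the validation agent's job, the following Python sketch models the knowledge network as a map from Jan ids to URLs plus a set of typed links. It removes every Jan whose URL fails a liveness check and then drops any link left dangling. The data model, the `validate` and `http_is_alive` helpers, and the relation names are hypothetical assumptions made for illustration. They are not Vijjana's actual implementation or database schema.

```python
import http.client
import urllib.request
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class KnowledgeNetwork:
    """Hypothetical in-memory stand-in for the Vijjana database."""
    jans: dict[str, str] = field(default_factory=dict)  # Jan id -> URL
    # Typed links as (source id, relation, target id) triples.
    links: set[tuple[str, str, str]] = field(default_factory=set)


def http_is_alive(url: str, timeout: float = 5.0) -> bool:
    """Assumed liveness test: a HEAD request that returns a non-error status."""
    try:
        req = urllib.request.Request(url, method="HEAD")  # ValueError if malformed
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError and socket timeouts are all OSError subclasses.
        return False


def validate(net: KnowledgeNetwork, is_alive: Callable[[str], bool]) -> set[str]:
    """Remove dead Jans and every link touching them; return the removed ids."""
    dead = {jan_id for jan_id, url in net.jans.items() if not is_alive(url)}
    for jan_id in dead:
        del net.jans[jan_id]
    # Keep only links whose endpoints both still exist, so no dangling edges remain.
    net.links = {(s, r, t) for (s, r, t) in net.links
                 if s in net.jans and t in net.jans}
    return dead


if __name__ == "__main__":
    net = KnowledgeNetwork(
        jans={"syllabus": "http://example.edu/cs101", "hw1": "http://example.edu/hw1"},
        links={("syllabus", "has_homework", "hw1")},  # hypothetical relation name
    )
    # Offline stand-in for http_is_alive: pretend hw1 has disappeared.
    removed = validate(net, lambda url: not url.endswith("/hw1"))
    print(removed, net.jans, net.links)
```

Passing the liveness check in as a function keeps the consistency logic testable without network access. Some servers reject `HEAD` requests, so a real agent might retry with `GET` before declaring a Jan dead.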
This paper is organized into six parts. A
background introduction is given in Section 2 to
illustrate our motivation for developing the Vijjana
system. Section 3 briefly introduces the framework
of our system and highlights the role of the Markup
and Validation agent, which spans two agents and
plays an important role in maintaining information
consistency. Section 4 describes the agent's working
principle and process in detail: its first part explains
a popular key-phrase extraction algorithm and its
application in Vijjana, and the remainder presents the
complete work process. Section 5 gives a concise
conclusion and outlines future work.
The main difficulty of using the Web as a
knowledge source lies in the fact that the Web is
nothing more than a list of hyper-linked pages where
the links have no associated semantics. Research on
semantic webs is aimed at mitigating this difficulty.
Tiwana et al. (2001) and Knoblock et al. (1997)
discuss a uniform way to represent web resources
and suggest models for automatic integration. Work
at IBM on the SHER project (Dolby et al., 2007;
Fokoue et al., 2007) focuses on simplifying
ontologies and scalable semantic retrieval through
summarization and refinement. There has also been
considerable research in the Artificial Intelligence
community on formalizing knowledge
representation (Sowa, 2000; Minsky, 1968; Sowa
and Majumdar, 2003), which is being adopted by
researchers in the semantic web community. All of
these efforts rely in one form or another on the
ability to discover
semantic links automatically by analyzing the
contents of web pages, which poses considerable
difficulties due to the ad hoc nature of web pages.
While automatically converting the current web into
a fully linked semantic web may be a solution, such
an outcome is unlikely in the near future.
Meanwhile, a number of organizations are busy
creating what are called social networking sites,
where a person searching the web who comes
across an interesting link can “mark” it with a set
of tags (keywords) that are stored on the site
owner’s server. A recent start-up company,
RadialNetworks, has developed a system called
Twain, which claims to create a semantic network
automatically. While this may be an advantage for
casual social networking, it is unsuitable for
enterprise-wide knowledge networks, where
well-established relationships exist between
document types specific to the organization or
domain, and these relationships cannot be derived
automatically.
The information created via these sites may be
kept private, or it may be combined with similar lists
created by other people, hence the name social
network. In due course these lists may grow so large
that a search engine is needed to navigate them,
which brings us back to the original problem: how
to cope with a large number of links that cannot be
visualized in their semantic context. Current social
bookmarking sites do not link web pages
semantically at all. For a knowledge network to be
useful to a large community of users working in a
well-defined domain (e.g., Computer Science
teaching), the semantic web should be buildable
cooperatively using a predefined taxonomy and link
semantics.
With this motivation, we propose a model we
call Vijjana (a Sanskrit word that represents
collective knowledge created through classification
and analysis), which helps organize individually
discovered web pages drawn from a narrowly
bounded domain into a knowledge network. This
network can be visualized as a hyper tree or a radial
graph, thus making the semantic relations visible.
The visibility of semantic relationships is the key to
comprehending what is actually inside the
knowledge network. It can be perused and searched
by anybody who wants to “discover” knowledge in
that domain. Consider a simple example in which
two professors, Smith and Bradley, among others,
contribute useful links to web pages such as syllabi,
homework problems, etc. to an evolving Computer
Science knowledge network, say Vijjana-CS. We
call these linked pages Jans, a term with roughly the
same meaning as the word knol, which Google
popularized to denote a unit of knowledge. These
Jans are then classified and interlinked using a
predefined taxonomy and relational semantics, so
Vijjana-CS grows organically as contributions
continue. We can also define a number of agents
associated with this model that keep the knowledge
network complete and consistent by removing
missing Jans and their associated links. In addition,
we can create a
WEBIST 2009 - 5th International Conference on Web Information Systems and Technologies