categories unknown in document language:
adjectives and verbs, for which three relations are
recognized: Entailment, in which one verb holds this
relation to another if its occurrence depends on
the other verb; Troponymy, or manner forms of a verb;
and Cause (causal verbs).
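By way of illustration, the three verb relations can be represented as labelled links between verbs. What follows is a minimal Python sketch under stated assumptions: the verb pairs are a small hand-built sample chosen for illustration rather than data read from WordNet, and the names RELATIONS, build_index and related are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: hand-picked triples (source verb, relation, target verb).
# They illustrate the three relations and are NOT extracted from WordNet.
RELATIONS = [
    ("snore", "entailment", "sleep"),   # snoring cannot occur without sleeping
    ("buy", "entailment", "pay"),
    ("whisper", "troponymy", "speak"),  # to whisper is to speak in a certain manner
    ("stroll", "troponymy", "walk"),
    ("kill", "cause", "die"),           # killing causes dying
    ("teach", "cause", "learn"),
]


def build_index(triples):
    """Index the triples as verb -> relation -> set of target verbs."""
    index = defaultdict(lambda: defaultdict(set))
    for source, relation, target in triples:
        index[source][relation].add(target)
    return index


def related(index, verb, relation):
    """Return the sorted targets of `relation` for `verb` (empty list if none)."""
    return sorted(index.get(verb, {}).get(relation, set()))


index = build_index(RELATIONS)
print(related(index, "snore", "entailment"))  # ['sleep']
```

A nested dictionary keeps each relation type separate, so that a query for entailment never returns troponyms or causal verbs by accident.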
Studies that report having used
WordNet demonstrate its effectiveness in
multilingual settings, in the extraction of textual and
iconic document information, and in the
identification of concepts in natural language
through its use for disambiguation, for measuring
semantic distance and for query expansion. Even more
significant for us was its capacity for the extraction
and categorization of documents through the
extraction of semantic features by way of the
grammatical categorization of nouns, verbs and
adjectives in WordNet, and for the prediction of the
user’s interest based on a hybrid model that
considered both keywords and the conceptual
knowledge representation of WordNet². Relying on
this, S. M. Harabagiu (1998, 265-269) has presented
a computational model to recognize the cohesive and
coherent structures of texts with the help of
the lexical-semantic information in WordNet; the
model's objective is to construct patterns of
association between phrases and coherence relations,
as well as to discover lexical characteristics within
coherence categories. WordNet thus emerged as an
auxiliary instrument for the design of semantic
ontologies directly oriented toward high-quality
information extraction on the web, which led Keng Woei
Tan, Hyoil-Han and R. Elmasri to present the
prototype WebOntEx (Web Ontology Extraction),
aimed at creating ontologies that describe
data from the web semantically³.
Judith Klavans & Min-Yen Kan (1998) have
investigated the automatic determination of the
genre of a document from the WordNet categories
of the verbs used in it.
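The general idea of inferring genre from verb categories can be sketched as follows. This is not the procedure of Klavans & Kan, only a minimal illustration under stated assumptions: the input is a list of already lemmatized verbs, the mini-lexicon VERB_CATEGORY and the table GENRE_BY_CATEGORY are hypothetical, and the genre labels are invented for the example. The category names follow the style of WordNet's verb groupings (communication, possession, motion).

```python
from collections import Counter

# Hypothetical mini-lexicon: lemmatized verb -> verb category.
VERB_CATEGORY = {
    "say": "communication", "announce": "communication", "report": "communication",
    "buy": "possession", "sell": "possession", "acquire": "possession",
    "go": "motion", "arrive": "motion", "travel": "motion",
}

# Hypothetical mapping from the dominant verb category to a genre label.
GENRE_BY_CATEGORY = {
    "communication": "announcement",
    "possession": "financial news",
    "motion": "travel report",
}


def category_profile(verbs):
    """Return the relative frequency of each verb category among known verbs."""
    counts = Counter(VERB_CATEGORY[v] for v in verbs if v in VERB_CATEGORY)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


def guess_genre(verbs):
    """Pick the genre tied to the most frequent category (None if no known verbs).

    Ties are broken alphabetically by category name so the result is deterministic.
    """
    profile = category_profile(verbs)
    if not profile:
        return None
    dominant = max(sorted(profile), key=profile.get)
    return GENRE_BY_CATEGORY[dominant]


print(guess_genre(["buy", "sell", "say"]))  # financial news
```

A realistic system would compare the whole category profile against profiles learned from a corpus rather than rely on the single dominant category.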
² G. Scheler, 1996, p. 499, suggested the grammatical
categorization. INFOS was analyzed by K. J. Mock & V.
R. Vemuri, 1997, pp. 633-644.
³ Meersman, R. A., 1999, pp. 30-45, identified the
ontological effectiveness of WordNet, demonstrated in
WebOntEx, which Keng Woei Tan, Hyoil-Han and R.
Elmasri presented in 2000, pp. 11-18.
2.1.2 Computerized linguistic resources
The analyses and tools developed by researchers in
Computational Linguistics have been particularly
useful, especially since our objective was becoming
the realization of a semantics of relationships, an
aspect that benefited from linguistic
applications for the management of information
content. Computational Linguistics applications
make it possible to lemmatize a word by identifying
its canonical form, its grammatical category and its
inflexion, as well as to obtain the various forms
derived from a single canonical form or
inflexion. This capability allows one to recognize,
generate and manipulate the morpholexical relations
of a word. The products of two computational
linguistics research groups have been useful to us:
• CLIC, a research group led by Prof. María
Antonia Martí⁴. Among the tools it offers,
the following played a large role in our effort
to build a system that generates thesauri
automatically: the morphological analyser,
generator and disambiguator, which identify the
morphological interpretations of a word through
the inflexion generator (which reduces a word to
its lemma, the canonical form, and links to it
all its associated forms), the lemmatizer (which
attaches morphological information to the
lemma) and the tagger (which labels the
components of a sentence); the parser, which
identifies phrases (syntagms) in a sentence; and
EuroWordNet, particularly for its capacity
to define meanings by means of
synsets, synonyms and the relations
between different meanings of words.
• GEDLC, of the Department of Computer
Science and Systems at the University of
Las Palmas de Gran Canaria⁵, combines the
functionality of the lemmatizer, inflexion
generator, disambiguator, morphological
generator and morpholexical relations with
a Text Analysis system, a
Computational System for the Morphological
Management of Spanish and, in particular,
a Verb Conjugator and Lemmatizer.
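The interplay between an inflexion generator and a lemmatizer, as described for both resources, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the implementation of the CLIC or GEDLC tools: it covers only the present indicative of regular Spanish verbs, and the names PRESENT_ENDINGS, inflect, build_lemmatizer and lemmatize are hypothetical.

```python
# Present-indicative endings of regular Spanish verbs, by infinitive ending.
PRESENT_ENDINGS = {
    "ar": ["o", "as", "a", "amos", "áis", "an"],
    "er": ["o", "es", "e", "emos", "éis", "en"],
    "ir": ["o", "es", "e", "imos", "ís", "en"],
}


def inflect(lemma):
    """Inflexion generator: derive the six present forms from an infinitive."""
    stem, ending = lemma[:-2], lemma[-2:]
    if ending not in PRESENT_ENDINGS:
        raise ValueError(f"not a Spanish infinitive: {lemma!r}")
    return [stem + suffix for suffix in PRESENT_ENDINGS[ending]]


def build_lemmatizer(lemmas):
    """Build a reverse table: inflected form -> set of candidate lemmas."""
    table = {}
    for lemma in lemmas:
        table.setdefault(lemma, set()).add(lemma)  # the infinitive maps to itself
        for form in inflect(lemma):
            table.setdefault(form, set()).add(lemma)
    return table


def lemmatize(table, form):
    """Lemmatizer: return every lemma the form may belong to (sorted)."""
    return sorted(table.get(form.lower(), set()))


table = build_lemmatizer(["firmar", "vender", "recibir"])
print(inflect("firmar"))            # ['firmo', 'firmas', 'firma', 'firmamos', 'firmáis', 'firman']
print(lemmatize(table, "Reciben"))  # ['recibir']
```

Storing a set of lemmas per form leaves room for ambiguous forms, which is where a disambiguator would intervene.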
Appropriate application of the principles behind both
resources has allowed us to begin resolving the
problems of handling the base vocabulary of
the corpus, terminological normalization and
disambiguation, and the segmentation of units of
information. We realized that it was both possible
and necessary to extend verbal morphology to
periphrases and verbal locutions, much richer in the
⁴ Available from: <http://clic.fil.ub.es> [Cited
10/28/03].
⁵ Available from: <http://gedlc.ulpgc.es> [Cited
10/28/03].
VERBS & TOPIC MAPS: A PROPOSAL FOR LEGAL DOCUMENTATION FROM THE DOCUMENT CONTENT
ANALYSIS PERSPECTIVE