numbers of researchers to describe natural languages
in the same way as formal languages. Maurice Gross
(Gross, 1997) and his team at the LADL (the French
Laboratory for Linguistics and Information Retrieval)
undertook an exhaustive examination of the simple
sentences of French, in order to obtain reliable,
quantified data on which rigorous scientific
experiments could be based. To exploit this linguistic
knowledge, the Unitex application was created at the
LADL (Paumier, 2003). Unitex is a development
environment used to build formalized, broad-coverage
descriptions of natural languages and to apply them to
large texts in real time. Unitex processes texts of
several megabytes in real time for the indexing of
morpho-syntactic patterns, the search for fixed or
semi-fixed phrases, the production of concordances,
and the statistical study of the results.
Another way to automatically extract an opinion from
a text is to use a classifier. Statistical methods
assume that the descriptions of objects of the same
class are distributed according to a structure specific
to that class. Example-based learning methods are often
used in information retrieval over large collections of
text. The difficulties lie in building a corpus that is
representative of the target domain, and in finding
rules or building an operational model of this corpus.
This model enables the system to predict how to handle
a new candidate submitted for classification. There has
been a lot of research on classifying reviews as
positive or negative, such as the work of Turney,
Littman, Dave, Lawrence, Pang and Lee. Classifiers
identify which of the known classes an object belongs
to. A classifier's performance depends on the model
learned for each class from the training set (Turney
and Littman, 2003), (Wiebe et al., 2004).
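As an illustration of example-based learning, the
following minimal sketch trains a word-count model per
class and assigns a new text to the most probable class.
The choice of a naive Bayes model with add-one smoothing,
the function names and the toy training corpus are all
assumptions made for this example; it is not the method
of the works cited above.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Build a per-class word-count model from (text, label) pairs."""
    word_counts = defaultdict(Counter)   # label -> word frequencies
    class_counts = Counter()             # label -> number of training texts
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocabulary

def classify(text, model):
    """Return the class with the highest log-probability for the text."""
    word_counts, class_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training corpus (hypothetical data).
TRAINING = [
    ("a wonderful and moving film", "positive"),
    ("great acting and a great story", "positive"),
    ("a boring and predictable plot", "negative"),
    ("terrible acting and a weak story", "negative"),
]

model = train(TRAINING)
print(classify("a great and moving story", model))
```

In practice the quality of such a model depends on how
representative the training corpus is of the target
domain, which is the difficulty noted above.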
2 LINGUISTIC RESOURCES
The linguistic resources used for information retrieval
and extraction are as follows: dictionaries, recursive
transition networks (local grammars), and
lexicon-grammar tables.
The electronic dictionaries employed by Unitex use the
DELA formalism. Electronic dictionaries describe both
the simple words and the compound words of a language.
They associate each word with a lemma and a series of
grammatical, semantic and inflectional codes.
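To make the structure of such an entry concrete, the
sketch below parses one line of an inflected-form DELA
dictionary, assuming the commonly documented layout
"inflected form,lemma.grammatical code+semantic
codes:inflectional codes". The class and function names
are hypothetical, escaped characters are not handled,
and treating an empty lemma as identical to the
inflected form is an assumption of this example.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DelafEntry:
    inflected: str
    lemma: str
    grammatical_code: str
    semantic_codes: List[str]
    inflectional_codes: List[str]

def parse_delaf_entry(line: str) -> DelafEntry:
    """Parse one 'inflected,lemma.CODE+sem:infl' line (escapes not handled)."""
    inflected, rest = line.strip().split(",", 1)
    lemma, codes = rest.split(".", 1)
    # The first ':'-separated field holds the grammatical and semantic
    # codes; the remaining fields are inflectional codes.
    code_part, *inflectional = codes.split(":")
    grammatical, *semantic = code_part.split("+")
    return DelafEntry(
        inflected=inflected,
        lemma=lemma or inflected,  # assumed: empty lemma = same as inflected form
        grammatical_code=grammatical,
        semantic_codes=semantic,
        inflectional_codes=inflectional,
    )

entry = parse_delaf_entry("mercantiles,mercantile.A+z1:mp:fp")
print(entry)
```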
A grammar is a representation of linguistic phenomena
by recursive transition networks (RTN), a formalism
close to that of finite-state automata. Many studies
have highlighted the adequacy of automata for
linguistic problems. A finite-state transducer is a
graph that represents a set of input sequences and
associates output sequences with them. Generally, a
grammar represents sequences of words and produces
linguistic information, such as information on the
syntactic structure.
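The idea of a transducer that recognizes input
sequences and emits associated outputs can be sketched
as follows. This is a deliberately small deterministic
example: the states, words and output labels are
invented, the recursive sub-graph calls of a full RTN
are omitted, and the representation is not the graph
format used by Unitex.

```python
from typing import Dict, List, Optional, Tuple

# (state, input word) -> (next state, output label); toy data for illustration.
Transitions = Dict[Tuple[int, str], Tuple[int, str]]

TRANSITIONS: Transitions = {
    (0, "the"): (1, "DET"),
    (1, "good"): (1, "ADJ"),
    (1, "bad"): (1, "ADJ"),
    (1, "film"): (2, "N"),
    (1, "actor"): (2, "N"),
}
FINAL_STATES = {2}

def transduce(words: List[str]) -> Optional[List[str]]:
    """Return the output sequence if the input is accepted, else None."""
    state, outputs = 0, []
    for word in words:
        step = TRANSITIONS.get((state, word))
        if step is None:
            return None  # no transition: the sequence is not recognized
        state, label = step
        outputs.append(label)
    # The input is accepted only if it ends in a final state.
    return outputs if state in FINAL_STATES else None

print(transduce("the good film".split()))
print(transduce("film the good".split()))
```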
A local grammar (Kamp, 1981) is an automaton
representation of linguistic structures which are
difficult to formalize in lexicon-grammar tables or
electronic dictionaries. Local grammars, represented in
the form of graphs, describe elements that belong to
the same syntactic or semantic field. The linguistic
descriptions grouped together in the form of local
grammars are used for a large variety of automatic
processes applied to texts. Thus various methods of
lexical disambiguation have been developed that apply
the grammatical constraints described above by means of
this type of graph.
Text corpora are represented by automata, in which
each state corresponds to a lexical analysis.
Linguistic phenomena are represented by local grammars,
which are then translated into finite-state automata so
that they can easily be matched against the text
corpora.
Lexicon-grammar tables are matrices that describe the
syntactic properties of all the simple verbs. Since
each word has an almost unique behaviour, the tables
give the grammar of each element of the lexicon, which
is why they are called lexicon-grammar tables. With
Unitex we can build grammars from such tables. The
lexicon-grammar is a systematic description of the
syntactic and semantic properties of syntactic
functors, that is, predicative verbs, nouns and
adjectives. It is organized in groups of tables, each
associated with a syntactic category such as full
verbs, support verbs, nouns, etc. A table corresponds
to a particular syntactic construction and gathers all
the words that enter this construction. Currently the
lexicon-grammar is developed mainly for verbs and
predicative phrases (Tarveen and Hill, 2001) (Turney
and Littman, 2003).
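A lexicon-grammar table can be pictured as a matrix
whose rows are lexical entries and whose columns are
syntactic properties, each cell recording whether the
entry accepts the property. The sketch below shows this
data structure only; the verbs, the property names and
the "+"/"-" values are invented for illustration and
are not taken from an actual table.

```python
# Rows are lexical entries, columns are syntactic properties ("+" or "-").
# The verbs, property names and values are invented for illustration.
PROPERTIES = ["N0 V N1", "N0 V that S", "passive"]

TABLE = {
    "admire":  ["+", "-", "+"],
    "believe": ["+", "+", "+"],
    "sleep":   ["-", "-", "-"],
}

def accepts(word: str, prop: str) -> bool:
    """True if the table marks the property with '+' for this entry."""
    return TABLE[word][PROPERTIES.index(prop)] == "+"

def entries_with(prop: str) -> list:
    """All entries of the table that accept the given property."""
    return [word for word in TABLE if accepts(word, prop)]

print(entries_with("N0 V that S"))
```

Reading the table by column, as `entries_with` does,
gives the set of words that enter a given construction,
which is the starting point for deriving a grammar from
a table.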
3 OVERVIEW OF GENERAL APPROACH
Our system has a modular architecture. The principal
tasks are: collecting reviews from the Internet,
checking whether the text found is a review, assigning
a mark to the reviews, and presenting the results. This
paper focuses on the review-marking module, and more
precisely on the linguistic method of classifying the
reviews. We developed three different methods for as-