is a generalization of a unigram tagger whose context
is the current word together with the part-of-speech
tags of the n − 1 preceding tokens. In this case, the training step stores, for each possible tag, the number of times it appears in each different context found in the training corpus.
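As a minimal sketch of this training step, assuming trigram-style contexts made of the two preceding tags plus the current word (the function and variable names are illustrative):

```python
from collections import Counter, defaultdict

def train_ngram_tagger(tagged_sentences, n=3):
    """Count, for each context, how often each tag occurs.

    A context is the current word together with the tags
    of the n - 1 preceding tokens (padded at sentence start).
    """
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:      # [(word, tag), ...]
        history = ("<s>",) * (n - 1)       # sentence-start padding
        for word, tag in sentence:
            counts[(history, word)][tag] += 1
            history = history[1:] + (tag,)
    return counts

# Tagging then picks, for each word, the most frequent tag seen in
# that context, e.g. counts[(history, word)].most_common(1).
```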
Since the surrounding words may themselves be ambiguous, it is necessary to use a statistical model that selects the best tags for the entire sequence. These stochastic taggers, usually based on hidden Markov models, neither require knowledge of the rules of the language nor try to deduce them. Therefore, they can be applied to texts in any language, provided they are first trained on a corpus of that language.
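To make the sequence-selection step concrete, here is a sketch of standard Viterbi decoding for a first-order hidden Markov model; the probability tables are assumed to have been estimated beforehand from a tagged corpus, and the names are ours, not those of any particular tagger:

```python
import math

def viterbi(words, tags, log_init, log_trans, log_emit):
    """Most probable tag sequence under a first-order HMM.

    log_init[t]:       log P(t) at sentence start
    log_trans[(p,t)]:  log P(t | p)
    log_emit[(t,w)]:   log P(w | t)
    All tables are assumed to come from a training corpus.
    """
    # trellis[i][t] = (best log-score ending in t at position i, predecessor)
    trellis = [{t: (log_init.get(t, -math.inf)
                    + log_emit.get((t, words[0]), -math.inf), None)
                for t in tags}]
    for w in words[1:]:
        column = {}
        for t in tags:
            score, prev = max(
                (trellis[-1][p][0] + log_trans.get((p, t), -math.inf), p)
                for p in tags)
            column[t] = (score + log_emit.get((t, w), -math.inf), prev)
        trellis.append(column)
    # Backtrack from the best final tag.
    tag = max(trellis[-1], key=lambda t: trellis[-1][t][0])
    path = [tag]
    for column in reversed(trellis[1:]):
        tag = column[tag][1]
        path.insert(0, tag)
    return path
```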
Another type of tagger is the rule-based system, which applies language rules to improve tagging accuracy. The first approaches in this category were based on rules designed by human linguistic experts. There have also been attempts to deduce those rules automatically, perhaps the most successful being the Brill tagger (Brill, 1995). Brill's system automatically extracts rules from a training corpus and applies them iteratively to improve the tagging of the text. The results presented by Brill on the Wall Street Journal data set, under a closed-vocabulary assumption (97.2%), are among the best obtained so far for this task. Brill's rules are called transformation rules, and they consider not only the tags that precede a particular word, as traditional probabilistic taggers do, but also the tags of the words that follow it.
Brill conducted experiments with two types of transformation rules: nonlexicalized transformation rules, which consider only the tags that surround a particular word, and lexicalized transformation rules, which also take the words themselves into account.
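The following sketch illustrates the flavour of both rule types; the encoding is deliberately simplified and is not Brill's actual template inventory. Each learned rule is applied, in order, over a whole text of (word, tag) pairs:

```python
def apply_rules(tagged, rules):
    """Apply each learned rule, in order, once over the whole text."""
    for rule in rules:
        for i, (word, _) in enumerate(tagged):
            new_tag = rule(tagged, i)
            if new_tag:
                tagged[i] = (word, new_tag)
    return tagged

# Nonlexicalized rule (illustrative): VB -> NN when preceded by DT.
def vb_to_nn_after_dt(tagged, i):
    if tagged[i][1] == "VB" and i > 0 and tagged[i - 1][1] == "DT":
        return "NN"

# Lexicalized rule (illustrative): IN -> RB when the word is "about".
def in_to_rb_for_about(tagged, i):
    if tagged[i][1] == "IN" and tagged[i][0].lower() == "about":
        return "RB"
```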
Considering Brill's work, it seems that a rule-based model can be more flexible, since it takes into account not only the tags that precede a particular word but also those that follow it. Information about the words themselves can also be used. Moreover, the information collected, in the form of rules, is easier to analyze than an extremely large set of probabilistic values.
More recently, several evolutionary approaches have been proposed to solve the tagging problem. These approaches can also be divided according to the type of information used: statistical information (Araujo, 2002; Araujo, 2004; Araujo, 2006; Araujo, 2007; Araujo et al., 2004; Alba et al., 2006) and rule-based information (Wilson and Heywood, 2005). Briefly, in the former, an evolutionary algorithm is used to assign the most likely tag to each word of a sentence, based on a context table that holds essentially the same information used in traditional probabilistic approaches. There is, however, an important difference in the shape of the context: these approaches also take into account the tags that follow a particular word.
The latter, on the other hand, is inspired by Brill's tagger. In this case, a genetic algorithm (GA) is used to evolve a set of transformation rules, which are then used to tag a text in much the same way as Brill's tagger. While in Araujo's work the evolutionary algorithm discovers the best sequence of tags for the words of a sentence, using an information model based on statistical data, in Wilson's work the evolutionary algorithm evolves the information model itself, in the form of a set of transformation rules, which is then used to tag the words of a sentence.
There are also other aspects, besides its context in a sentence, that can be used to determine a word's category (Steven Bird and Loper, 2009). In fact, the internal structure of a word may give useful clues as to the word's class. For example, -ness is a suffix that combines with an adjective to produce a noun, e.g., happy → happiness, ill → illness. Therefore, if we encounter a word that ends in -ness, it is very likely to be a noun. Similarly, -ing is a suffix most commonly associated with gerunds, like walking, talking, thinking, listening. We might also guess that any word ending in -ed is the past participle of a verb, and any word ending with 's is a possessive noun.
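These suffix clues translate directly into simple heuristics; the sketch below uses an illustrative tag set and is meant only to capture the examples above:

```python
def guess_by_morphology(word):
    """Guess a coarse word class from suffix clues alone."""
    if word.endswith("ness"):
        return "NOUN"         # happiness, illness
    if word.endswith("ing"):
        return "GERUND"       # walking, thinking
    if word.endswith("ed"):
        return "PAST-PART"    # walked, listened
    if word.endswith("'s"):
        return "POSS-NOUN"    # dog's
    return None               # no morphological clue

assert guess_by_morphology("happiness") == "NOUN"
```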
In this work, we investigate the possibility of using an evolutionary algorithm to evolve a set of disambiguation rules that take into account not only context information but also some information about the word's morphology. These rules are not transformation rules like Brill's or Wilson's, but a form of classification rules, which try to generalize the context information used in probabilistic taggers. We look at the problem as a classification problem, where the classes are the different parts of speech, and the predictive attributes are the context information and some aspects of the words' internal structure. Our goal is to achieve a model that combines the advantages of both statistical and rule-based systems.
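Purely as an illustration of the kind of rule we have in mind (the attribute set and the matching scheme shown here are hypothetical, not the exact encoding used by our system), such a classification rule can be pictured as a conjunction of conditions on context and morphology that predicts a class:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    # Hypothetical attributes; the actual encoding may differ.
    prev_tag: Optional[str]   # tag of the preceding word, None = any
    next_tag: Optional[str]   # tag of the following word, None = any
    suffix: Optional[str]     # suffix condition on the word, None = any
    tag: str                  # predicted part of speech

    def matches(self, prev, nxt, word):
        return ((self.prev_tag is None or self.prev_tag == prev)
                and (self.next_tag is None or self.next_tag == nxt)
                and (self.suffix is None or word.endswith(self.suffix)))

# e.g. "if the previous tag is DT and the word ends in -ness, classify as NN"
rule = Rule(prev_tag="DT", next_tag=None, suffix="ness", tag="NN")
assert rule.matches("DT", "VB", "happiness")
```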
The tagging itself is performed by a second evolutionary algorithm, which uses the disambiguation rules to find the most likely sequence of tags for the words of a sentence. Thus, our system is composed of two steps: first, a set of disambiguation rules is discovered by an evolutionary algorithm; then, an evolutionary tagger tags the words of a sentence using the rules found in the first step.
The rest of the paper is organized as follows: