appearance. Classification algorithms that rely on string matching or pattern matching are therefore of limited use in this scenario.
Contextual Approach. Not only the term itself but also its context has proven to be a highly valuable source of information. According to Miller and Charles (1991), the exchangeability of two terms within a given context correlates with their semantic similarity: the more easily two terms can be exchanged within the contexts in which they occur, the more likely they are to share a similar meaning. A statistical analysis of the composition of two terms’ contexts can therefore indicate their degree of semantic similarity.
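To make such a statistical analysis concrete, the following Python sketch approximates exchangeability as the overlap between the sets of contexts in which two terms occur. The context definition (one neighboring token on each side), the function names `contexts` and `exchangeability`, and the use of the Jaccard coefficient are assumptions made for illustration; this is not the method of Miller and Charles (1991).

```python
from collections import defaultdict


def contexts(tokens, window=1):
    """Map each term to the set of contexts it occurs in.

    Hypothetical context definition: the `window` tokens to the left and the
    `window` tokens to the right of the term, with the term itself left out.
    """
    ctx = defaultdict(set)
    for i, term in enumerate(tokens):
        left = tuple(tokens[max(0, i - window):i])
        right = tuple(tokens[i + 1:i + 1 + window])
        ctx[term].add((left, right))
    return ctx


def exchangeability(ctx, s, t):
    """Jaccard overlap of the context sets of s and t (0.0 = none, 1.0 = identical)."""
    a, b = ctx.get(s, set()), ctx.get(t, set())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


tokens = "i fly to berlin today . i fly to london today . i eat an apple today .".split()
ctx = contexts(tokens)
print(exchangeability(ctx, "berlin", "london"))  # 1.0
print(exchangeability(ctx, "berlin", "apple"))   # 0.0
```

"berlin" and "london" occur in identical contexts and are fully exchangeable in this toy text, whereas "apple" shares none of their contexts.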
Many approaches utilize the information contained within a term’s context: Gauch et al. (1999) propose an automatic query expansion approach based on term co-occurrence data. Billhardt et al. (2002) analyze term co-occurrence data to estimate relationships and dependencies between terms. Schütze (1992) uses this information to create Context Vectors in a high-dimensional vector space in order to resolve polysemy. These works show that information about a term can be gained by analyzing its context. The following example illustrates the idea of extracting information from a term’s context:
Example. Imagine passing a group of people and overhearing a fragment of their conversation: “Tomorrow I am going to fly to ...”
Even though this sentence is incomplete, it contains enough information for us to expect the missing word to be a place. In a conversation we would intuitively request the missing information by asking “Sorry, where are you going to?” and thereby express our expectation of a place. We classified the missing piece of information as a place solely by its context. Our expectation is, however, not restricted to any specific place: the sentence makes perfect sense with many terms, as long as they are instances of the class place:
“Tomorrow I am going to fly to Berlin.” “Tomorrow I am going to fly to London.”
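The observation that the context “fly to ...” admits any place can be mimicked mechanically. The sketch below lists every term that fills the slot directly after a fixed context in a tokenized text. The whitespace tokenization, the toy sentences and the function name `fillers` are assumptions made for illustration.

```python
from collections import Counter


def fillers(tokens, context):
    """Count the terms that directly follow a fixed context (a tuple of tokens)."""
    n = len(context)
    found = Counter()
    for i in range(len(tokens) - n):
        if tuple(tokens[i:i + n]) == context:
            found[tokens[i + n]] += 1
    return found


text = ("tomorrow i am going to fly to berlin . "
        "tomorrow i am going to fly to london . "
        "next week i want to fly to berlin again .")
print(fillers(text.split(), ("fly", "to")))  # Counter({'berlin': 2, 'london': 1})
```

The context itself says nothing about which place is meant; it only collects terms that all happen to be places.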
Conclusion. Consider two terms s and t that are instances of class x. If s and t are exchangeable within a context c, then this context requires its related term to be of class x, regardless of its particular instantiation. Miller and Charles (1991) stated that semantic similarity correlates with contextual similarity.
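The conclusion suggests a simple rule for finding out which class a context demands: inspect the known terms that fill it. The sketch below assigns class x to a context only when at least two distinct known terms occur in it and all of them share exactly that one class. The `(term, context)` encoding with `_` marking the slot, the `min_fillers` threshold, the toy lexicon and the function name are assumptions of this illustration.

```python
from collections import defaultdict


def context_requirements(occurrences, lexicon, min_fillers=2):
    """Infer which class each context demands from the known terms filling it.

    occurrences -- iterable of (term, context) pairs observed in a source
    lexicon     -- known terms mapped to the set of classes they belong to
    A context is assigned class x only if at least `min_fillers` distinct
    known terms fill it and they all share exactly one class, x.
    """
    fillers = defaultdict(set)
    for term, context in occurrences:
        if term in lexicon:              # unknown fillers carry no evidence yet
            fillers[context].add(term)
    required = {}
    for context, terms in fillers.items():
        if len(terms) < min_fillers:
            continue
        shared = set.intersection(*(lexicon[t] for t in terms))
        if len(shared) == 1:
            required[context] = next(iter(shared))
    return required


lexicon = {"berlin": {"place"}, "london": {"place"}, "crane": {"bird"},
           "sparrow": {"bird"}, "apple": {"fruit"}}
occurrences = [("berlin", "fly to _"), ("london", "fly to _"), ("zanzibar", "fly to _"),
               ("crane", "a _ flew by"), ("sparrow", "a _ flew by"),
               ("apple", "eat an _"), ("berlin", "i like _"), ("apple", "i like _")]
print(context_requirements(occurrences, lexicon))
# {'fly to _': 'place', 'a _ flew by': 'bird'}
```

"eat an _" is skipped because only one known term fills it, and "i like _" is skipped because its fillers share no class, so it does not constrain its related term.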
Using the information contained in a given term’s
context allows two actions:
Deduction of Knowledge. Given the above example, we expect the missing piece of information to be a place. If the speaker now replies with a word we have never heard before, we would assume it to be a place unknown to us. That is, we have classified a previously unknown term using only the information within its context and thereby acquired new knowledge.
Verification of Knowledge. If, on the other hand, the speaker replies with a term that, as far as we know, is not a place, we encounter a clash of knowledge: either our data is correct and the speaker provided false information, or the other way round. In either case, an erroneous piece of information has been detected solely through its context.
Resolving Polysemy. This is a special case of the aforementioned clash of knowledge. We might, for example, know for a fact that a crane is a bird, but discover that, depending on its context, the term can also refer to a type of construction equipment.
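The three uses of context above can be sketched together, assuming we already know which class each context demands (`context_class`) and which classes known terms belong to (`lexicon`). All names and the toy data are hypothetical. In particular, treating a clashing term as a polysemy candidate only when it also appears in a context that matches its known class is a simplifying heuristic introduced here, not a rule given in the text.

```python
from collections import defaultdict


def apply_context_knowledge(occurrences, context_class, lexicon):
    """Deduce, verify and flag polysemy using contexts that demand a class.

    occurrences   -- iterable of (term, context) pairs found in a source
    context_class -- hypothetical mapping: context -> class it demands
    lexicon       -- known terms mapped to the set of classes they belong to
    """
    implied = defaultdict(set)  # term -> classes demanded by its contexts
    for term, context in occurrences:
        cls = context_class.get(context)
        if cls is not None:     # contexts without an expectation are ignored
            implied[term].add(cls)

    deduced, clashes, polysemous = {}, {}, set()
    for term, classes in implied.items():
        known = lexicon.get(term)
        if known is None:
            deduced[term] = classes            # deduction of knowledge
        elif not classes <= known:
            clashes[term] = classes - known    # verification: clash of knowledge
            if classes & known:
                polysemous.add(term)           # also used in its known sense
    return deduced, clashes, polysemous


context_class = {"fly to _": "place", "a _ flew by": "bird",
                 "the _ lifted the beam": "construction equipment"}
lexicon = {"berlin": {"place"}, "crane": {"bird"}, "apple": {"fruit"}}
occurrences = [("berlin", "fly to _"), ("zanzibar", "fly to _"),
               ("crane", "a _ flew by"), ("crane", "the _ lifted the beam"),
               ("apple", "fly to _")]
deduced, clashes, polysemous = apply_context_knowledge(occurrences, context_class, lexicon)
print(deduced)     # {'zanzibar': {'place'}}
print(clashes)     # {'crane': {'construction equipment'}, 'apple': {'place'}}
print(polysemous)  # {'crane'}
```

The unknown "zanzibar" is deduced to be a place, "apple" in a place context is reported as a clash, and "crane" is reported both as a clash and as a polysemy candidate because it also occurs in a bird context.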
We can suspect a term to be an instance of a certain class after evaluating its context because, as speakers of that particular language, we understand the underlying rules for forming a sentence. With those rules in mind, we can conclude that only a few classes of terms would make sense in a given context.
Obviously, it is challenging to teach a computer to draw the same conclusions. Even with a sophisticated understanding of how sentences are formed in a given language, terms still have to be recognized in the first place, which brings us back to the recognition problems that string and pattern matching algorithms can encounter (see page 1).
Classification by Context. The contextual information allows knowledge to be transferred to previously unknown words: if you can identify a context c that requires its related term to be of class x, you can propose that whenever you find another occurrence of c within a source, its related term is an instance of class x, too. This leads to the following working assumption:
Working assumption. A classification algorithm can decide whether a given term is an instance of a class x (e.g. x = place) by evaluating the similarity between the term’s context and the contexts of known instances of x.
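The working assumption can be tried out with a small sketch that borrows the kind of co-occurrence vectors discussed under Statistical Context Analysis: each term is represented by a sparse vector of co-occurrence counts within a fixed window, and a term is accepted as an instance of x if the average cosine similarity between its vector and those of known instances (seeds) reaches a threshold. This is a simplified stand-in, not Schütze’s (1992) exact construction; the window of two tokens, the averaging rule, the threshold of 0.5 and all function names are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def context_vectors(tokens, window=2):
    """Sparse context vectors: vecs[t][d] counts how often term d occurs
    within `window` tokens of an occurrence of term t."""
    vecs = defaultdict(Counter)
    for i, t in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vecs[t][tokens[j]] += 1
    return vecs


def cosine(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * v[d] for d, count in u.items() if d in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def is_instance(term, seeds, vecs, threshold=0.5):
    """Accept `term` as an instance of class x if its context vector is, on
    average, at least `threshold`-similar to those of the known instances."""
    empty = Counter()
    sims = [cosine(vecs.get(term, empty), vecs.get(s, empty)) for s in seeds if s != term]
    return bool(sims) and sum(sims) / len(sims) >= threshold


tokens = ("i fly to berlin today . i fly to london today . "
          "i fly to oslo today . i eat an apple .").split()
vecs = context_vectors(tokens)
seeds = {"berlin", "london"}  # known instances of x = place
print(is_instance("oslo", seeds, vecs))   # True
print(is_instance("apple", seeds, vecs))  # False
```

"oslo" has exactly the same co-occurrence profile as the seeds (cosine 1.0), while "apple" shares only the sentence-final period with them (cosine of about 0.29), so it falls below the threshold.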
Statistical Context Analysis. Given an arbitrary source s, let n be the number of distinct terms within s. Schütze (1992) introduces a high-dimensional vector space with n dimensions, one for each term in s. For any term t, its context can then be represented as a vector within this vector space, each dimension d (which is a term, too) holding the number of times t and d co-occurred throughout the source. The cosine of the angle (Baeza-Yates and Ribeiro-Neto, 1999) between two Context Vectors within this vector space measures the similarity of their terms’ co-occurrence-
KDIR 2010 - International Conference on Knowledge Discovery and Information Retrieval
388