language processing. Magazines are first
linguistically pre-processed: The tokenised texts are
automatically annotated with part-of-speech tags that
indicate the grammatical category of each word.
Using dictionaries, each known term is then linked to
a matching concept from the ontology (word sense
tagging). These linguistic services are realised with
the CLaRK system (Simov, 2004).
Given a list of concepts to be examined in the
magazines, target text fragments are first identified
with the help of the word sense tags, e.g. all
sentences containing the concept “brick”. We then
apply partial grammars that describe the possible
positions of interesting terms in the context of such
a concept, e.g. all adjectives that are related to the
concept. Terms that match these grammar rules are
extracted from the texts.
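The two steps above (selecting target fragments, then matching a
partial grammar) can be illustrated with a minimal Python sketch. It
does not reproduce the CLaRK system or its grammar formalism: the
triple layout (token, part-of-speech tag, concept), the tag names and
the single pattern "adjectives directly preceding the concept" are
assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical annotated sentences: (token, pos_tag, concept) triples.
# Tag names and concept labels are illustrative assumptions.
sentences = [
    [("the", "DET", None), ("red", "ADJ", None),
     ("brick", "NOUN", "brick"), ("wall", "NOUN", "wall")],
    [("a", "DET", None), ("rustic", "ADJ", None), ("old", "ADJ", None),
     ("brick", "NOUN", "brick")],
    [("soft", "ADJ", None), ("cotton", "NOUN", "cotton")],
]

def adjectives_for(concept, tagged_sentences):
    """Count adjectives that directly precede a token tagged with `concept`."""
    found = Counter()
    for sent in tagged_sentences:
        # Step 1: keep only target fragments that mention the concept.
        if not any(c == concept for _, _, c in sent):
            continue
        # Step 2: apply the simple pattern "ADJ* <concept>".
        for i, (_, _, c) in enumerate(sent):
            if c != concept:
                continue
            j = i - 1
            while j >= 0 and sent[j][1] == "ADJ":
                found[sent[j][0].lower()] += 1
                j -= 1
    return found

brick_adjectives = adjectives_for("brick", sentences)
```

A real partial grammar would cover more positions (predicative
adjectives, coordinated phrases and so on); the sketch only shows how
word sense tags narrow the search before a pattern is applied.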
The final task is to visualise the extraction
results. Modelling and visualising term distributions
and term contexts have attracted interest in research
fields such as information retrieval (Becks, 2001),
linguistics, and web-based communities. Heringer
(1998) introduced a technique in which lexical
fields are automatically computed by a context
analysis of certain keywords. A degree of affinity is
determined by measuring the contextual ‘closeness’
of terms to the keyword. The resulting lexical fields
can be graphically presented as stars where the
context words are circularly arranged around the
concept. The distance of each satellite to the concept
reflects the degree of affinity.
While Heringer’s idea focuses on the notion of
term affinity, another recent approach tackles the
issue of term frequency: In the Web 2.0 community
the concept of tag clouds has become popular. Tag
clouds (also known as word clouds) visualise the
frequency of tags that appear on a website (Hassan-
Montero, 2006). More frequently used tags are
emphasised by larger fonts or other ways of
graphical highlighting.
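As a rough illustration of frequency-based emphasis, the following
Python sketch maps tag frequencies linearly onto font sizes. The
linear scale, the point-size range and the sample counts are
assumptions; actual tag cloud implementations may use other scalings.

```python
def font_sizes(frequencies, min_pt=10, max_pt=36):
    """Map each tag's frequency linearly onto a font size in points."""
    lo, hi = min(frequencies.values()), max(frequencies.values())
    span = hi - lo
    sizes = {}
    for term, freq in frequencies.items():
        # If all tags share one frequency, fall back to the smallest size.
        ratio = (freq - lo) / span if span else 0.0
        sizes[term] = round(min_pt + ratio * (max_pt - min_pt))
    return sizes

tag_counts = {"brick": 40, "wood": 25, "linen": 10}  # made-up counts
sizes = font_sizes(tag_counts)
```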
The notion of term context stars combines
Heringer’s star visualisation idea for lexical fields
with the keyword scaling of tag clouds.
Its visualisation metaphor helps users to recognise
dominant concepts as well as term attributes in a text
corpus. Moreover, different term context stars of the
same concept can easily be compared with respect to
the frequencies of concepts and the drift of term
attributes (cf. Figure 1). The visualisation
functionality can be implemented using standard
graphical programming libraries. Complex layout
algorithms (spring embedding or other graph drawing
techniques) are not necessary.
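To indicate why no graph drawing algorithm is needed, the following
Python sketch computes a star layout directly with polar coordinates.
It is an assumption-laden illustration, not the layout used for
Figure 1: the function name, the radius and font ranges, the affinity
values in [0, 1] and the sample data are all hypothetical.

```python
import math

def star_layout(attributes, min_radius=50.0, max_radius=200.0,
                min_pt=10, max_pt=30):
    """Place terms on a star around a concept located at (0, 0).

    `attributes` maps term -> (affinity in [0, 1], frequency).
    Higher affinity moves a term closer to the centre; higher
    frequency gives it a larger font.
    """
    max_freq = max(freq for _, freq in attributes.values())
    n = len(attributes)
    layout = {}
    for k, term in enumerate(sorted(attributes)):
        affinity, freq = attributes[term]
        angle = 2 * math.pi * k / n  # spread terms evenly around the circle
        radius = min_radius + (max_radius - min_radius) * (1.0 - affinity)
        layout[term] = {
            "x": radius * math.cos(angle),
            "y": radius * math.sin(angle),
            "font_pt": round(min_pt + (max_pt - min_pt) * freq / max_freq),
        }
    return layout

# Made-up attributes of the concept "brick": (affinity, frequency).
layout = star_layout({"red": (0.9, 12), "rustic": (0.5, 4), "old": (0.2, 6)})
```

The resulting coordinates and font sizes can be passed to any 2D
drawing library; comparing two stars of the same concept then amounts
to comparing these values term by term.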
3.2 Visual Association Mining
Association mining is a method for discovering which
items frequently co-occur within a data set. A
typical example is market basket analysis, in which
customer buying habits are analysed by
finding associations between the different items that
customers place in their “shopping baskets” (Han,
2001).
Association rules are implications of the form
$X \Rightarrow Y$, i.e.
$A_1 \wedge \dots \wedge A_m \rightarrow B_1 \wedge \dots \wedge B_n$,
where $A_i$ ($i \in \{1, \dots, m\}$) and $B_j$
($j \in \{1, \dots, n\}$) are attribute-value
pairs. The rule is interpreted as “database tuples
which satisfy the condition $X$ are also likely to
satisfy the condition $Y$”.
If a producer, for instance, wants to
determine which products are likely to be purchased
together, a suitable rule might look as
follows: buys(customer, “sofa”) ⇒
buys(customer, “easy chair”).
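How “likely” such a rule is can be quantified with the support and
confidence measures that are standard in association mining; the text
above does not name them, so their use here is an assumption. The
following Python sketch evaluates the sofa rule on a handful of
made-up shopping baskets.

```python
# Made-up shopping baskets; each set holds the items of one purchase.
baskets = [
    {"sofa", "easy chair", "rug"},
    {"sofa", "easy chair"},
    {"sofa", "curtain"},
    {"rug", "curtain"},
    {"sofa", "easy chair", "curtain"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(x, y, transactions):
    """Fraction of all transactions that satisfy both X and Y."""
    return count(set(x) | set(y), transactions) / len(transactions)

def confidence(x, y, transactions):
    """Fraction of the transactions satisfying X that also satisfy Y."""
    n_x = count(x, transactions)
    if n_x == 0:
        return 0.0  # the rule never applies
    return count(set(x) | set(y), transactions) / n_x

rule_support = support({"sofa"}, {"easy chair"}, baskets)
rule_confidence = confidence({"sofa"}, {"easy chair"}, baskets)
```

In this toy data, three of the five baskets contain both items and
three of the four sofa baskets also contain an easy chair, so the rule
has a support of 3/5 and a confidence of 3/4.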
Such associations, once found, can help
producers to understand their customers and, as a
result, to develop appropriate marketing
strategies and cross-selling methods.
Many data mining tools support the task of
finding association rules within a given data set by
searching for correlations in the data automatically.
They test far more combinations of attributes than
an expert user could test manually. However, it is
important to first explore and understand the data
being analysed: only then can the right questions be
asked and a data mining method be applied in an
appropriate way. In particular, it is often necessary
to define the right derived attributes beforehand.
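What a derived attribute might look like can be shown with a small
Python sketch. The record fields, the season mapping (meteorological
seasons of the northern hemisphere) and the price thresholds are
hypothetical and serve only as an illustration.

```python
from datetime import date

SEASONS = ("winter", "spring", "summer", "autumn")

def season_of(day):
    """Derive a season from a date (Dec-Feb winter, Mar-May spring, ...)."""
    return SEASONS[(day.month % 12) // 3]

def price_band(price):
    """Derive a coarse price class; the thresholds are arbitrary."""
    if price < 50:
        return "low"
    if price < 200:
        return "medium"
    return "high"

def with_derived(record):
    """Return a copy of the record extended by the derived attributes."""
    enriched = dict(record)
    enriched["season"] = season_of(record["sold_on"])
    enriched["price_band"] = price_band(record["price"])
    return enriched

# Made-up sales records.
records = [
    {"item": "curtain", "sold_on": date(2007, 1, 15), "price": 39.0},
    {"item": "sofa", "sold_on": date(2007, 7, 3), "price": 799.0},
]
enriched = [with_derived(r) for r in records]
```

Rules over such derived attributes (for example season and price
band) can expose regularities that stay hidden in raw dates and
prices.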
The producers’ information need in the
AsIsKnown context is driven by the wish to better
understand their consumers’ behaviour, since they
have to be able to react to new trends and plan their
production accordingly. They cannot
specify precisely where to find the information or
which attributes have to be analysed to guide a search.
Holten (1997), who addresses the question of
adequate system support for unstructured decisions,
states that problems of this kind call for
data-driven information analysis processes. Hence
he proposes exploration-oriented interaction
strategies. InfoZoom (Spenke, 2000), the tool we use
(cf. example in section 2.2), is a flexible visual data
mining tool for individual and ad hoc analysis of
large amounts of data. It combines the required
functionality with the flexibility that domain
experts need.