TREND ANALYSIS BASED ON EXPLORATIVE

DATA AND TEXT MINING

A Decision Support System for the European Home Textile Industry

Andreas Becks and Jessica Huster

Fraunhofer-Institute for Applied Information Technology FIT, Schloss Birlinghoven, 53754 Sankt Augustin, Germany

Keywords: Text mining, data mining, association analysis, concept-drifts, ontology-based knowledge-flow system.

Abstract: Trend-related industries like the European home-textile industry have to quickly adapt to evolving product

trends and consumer behaviour in order to avoid economic risks generated by misproduction. Trend

indicators are manifold, reaching from changes in ordered products and consumer behaviour to ideas and

concepts published in magazines or presented at trade fairs. In this paper we report on the overall design of

the Trend Analyser, a decision support system that helps designers and product developers of textile

producers to perform market basket analyses as well as mining trend-relevant fashion magazines and other

publications by trend-setters. Our tool design brings together explorative text and data mining methods in an

ontology-based knowledge flow system, helping decision-makers to perform a better planning of their

production.

1 INTRODUCTION

Trend-related industries like the European home-

textile industry face a severe economic risk: While

preferences and consuming behaviour of consumers

do change very quickly, producers have to flexibly

adjust to these trends. If producers misinterpret or

even overlook trends, their production planning will

be faulty and as a consequence non-marketable

products will stick to the stocks while on the other

hand existing market potentials cannot be leveraged.

The situation is complicated by the fact that players

in this industry do only communicate with their

direct customers but do not have a common

knowledge base of product and ordering data, or

consumer preferences.

The European project AsIsKnown (Valtinat,

2006) creates such a knowledge base for the

European home textile industry and implements a

couple of services to support cross-sector knowledge

flow and trend detection. Industry partners,

expecting an added value for the whole sector, are

ready to exchange their product and ordering data

for additional services they get in return. From a

methodical perspective, AsIsKnown develops an

ontology-based decision support system with text

and data mining tools as core functionalities.

In this paper we report on the overall design of

AsIsKnown’s Trend Analyser. This expert module

analyses ordered products, consumer behaviour and

further trend indicators in the home textile industry,

helping the industry to detect current and future

trends by utilizing explorative text and data mining

methods.

The underlying research question is: How should

a decision support system that helps knowledge

workers in creative application domains to

effectively identify future trends be designed?

Therefore, we combine field-tested as well as novel

methods of interactive data analysis and adapt them

to the requirements of product designers and

marketing specialists. The tools designed in this

phase are then subject to an extensive field

evaluation.

Companies and especially market analysts have

to monitor particular fields for recent trends that

may impact the company. For them it is important to

detect emerging topics early and how they evolve

over time. Approaches reach from methods from

traditional information retrieval to classical machine

learning (Kontostathis, 2003, 2004). Designers in the

textile sector need not only to know about upcoming

main topics but about materials and colours and in

which contexts they are mentioned. Our proposed

system falls into the semi-automatic category of

253

Becks A. and Huster J. (2007).

TREND ANALYSIS BASED ON EXPLORATIVE DATA AND TEXT MINING - A Decision Support System for the European Home Textile Industry.

In Proceedings of the Ninth International Conference on Enterprise Information Systems - AIDSS, pages 253-258

DOI: 10.5220/0002379602530258

 SciTePress

systems for emerging trend detection. The designer

will be supported in using their experience and back

ground knowledge during trend detection.

In the next sections we describe each component

of our Trend Analyser in detail and how we are

going to realise them.

2 OVERALL SYSTEM DESIGN

Having access to various trend-relevant data sources

like digitalised fashion and trend magazines,

aggregated ordering data from all producers, and

click data from computer-based product catalogues

running at the points of sales, the Trend Analyser

addresses some major shortcomings of the current

way to assess trends, i.e. it enables (a) systematic

evaluation of colour families and material groups

which are mentioned in fashion and trend

magazines, and allows (b) market basket analyses on

sales and product data of the entire industry.

The required functionality of the Trend Analyser

thus falls into two categories:

(1) A colour and material filter that helps users

to analyse the frequencies of colours, colour

families, or material types from magazines and trend

books and assess the development of colour and

material statistics over time. Particularly important is

to look for concepts like colour, material, structure

or design of surface, recognise names of architects,

designers, etc. as they appear in the magazines,

recognise new terms describing colours or surface

structures, dominant colours in magazines or articles

and their development over time,

(2) Functionalities of association mining that

help to analyse frequent combinations of materials

and colours as well as product combinations that

consumers or designers like to try out. This

combination analysis of ordering data and consumer

behaviour stored in the AsIsKnown’s data

warehouse refers to different aspects of a market

basket analysis, addressing questions like ‘What

type of customer buys what?’

These two components for analysing trend-

relevant data are accessible via the Trend Analyser

portal. The portal will give role-based access to

several classes of users. In particular designers and

marketing staff will have functionality to perform

their individual analysis.

The following sections describe the solution

design of each component in more detail,

concentrating on the novel trend-analysis

functionality. Sections 2.1 and 2.2 depict the

functionality from the users point of view whereas

section 3 describes the methods used to realise the

presented functionality.

2.1 Detecting Terminological Drifts

Starting the colour and material filter, the first thing

to do for the expert user is to set up an analysis

matrix (cf. Figure 1). The analysis matrix defines the

groups of magazines or articles and the period and

aggregation level of time for which trend-relevant

concepts, represented in a domain ontology, in the

magazines shall be analysed. This is done in three

steps:

(1) Define groups of magazines to be analysed:

Figure 1: Analysis matrix with term context stars.

ICEIS 2007 - International Conference on Enterprise Information Systems

254

The expert user selects specific magazines or articles

to analyse. Each magazine group the user defines

will correspond to one row in the analysis matrix.

(2) Define period and aggregation level of time:

The expert user specifies the period of time to be

considered (start and end date) and the aggregation

level of the time axis of the analysis matrix (e.g.

monthly, quarterly, or yearly). The grouping of

analysis results in the columns of the matrix will be

done according to the aggregation level.

(3) Define contents of the matrix cells: Finally,

the expert user defines the concepts he wants to

analyse. That is, he asks for the colours, materials,

certain architects, etc. mentioned in the magazines.

Given a set of target concepts the expert user has

selected from the ontology, the Trend Analyser will

compute a set of term context stars, one for each

concept in each cell of the analysis matrix.

A term context star (cf. cell content of matrix in

Figure 1) is a graphical representation of a concept

and terms that appear in the context of this concept

in the considered magazines.

• The context is determined by grammatical rules

and refers to adjectives, nouns or other

components of phrases surrounding the concept.

Whereas the concepts are given, contextual

terms are automatically extracted from the

magazines’ articles by applying the grammatical

rules. In the following, these contextual terms

are called attributes of the concept.

• Term context stars are computed for each cell of

the analysis matrix, i.e. the considered magazine

articles are defined by a group of magazines and

a certain period of time.

• In the graphical representation the concept itself

appears in the centre of the term context star.

Attributes of the concept surround the concept

in form of bubbles. The relative size of each

attribute bubble corresponds to the number of

times that attribute appears with the considered

concept.

• The size of term context stars themselves is

relative to the number of articles in the

considered magazines (defined for the

corresponding cell in the analysis matrix) in

which the concept appears.

Looking at the term context stars presented in

Figure 1: Assume that in the considered magazines

the Trend Analyser has analysed the context of the

concepts “blue”, “pink”, “green” , “floor”, “brick”,

“glass”, and “pine” (as given by the user). In the

considered magazines, the concept “blue” frequently

appears in the contexts “pigeon blue”, a bit less

frequent as “baby blue” and “light blue” and still

some times as “Blue One” (a trade name) or “blue

world”. Looking at all concepts, “brick” did occur

most frequently, followed by “floor” and “blue”,

while there are still some occurrences of “pine”,

”green”, “floor “, “glass” and ”pink”.

Given a visualisation of term context stars in the

Trend Analyser, the expert user can click on

concepts or attributes to show all the articles in the

considered magazines that contain the respective

concept and attributes (highlighted in the articles).

The overall result of the analysis is shown in the

analysis matrix. Figure 1 gives an example of a

possible analysis result. It shows, for instance, that

“terracotta bricks” started up in 2003 in the

“Abitare” magazine (which in fact is a product

catalogue), became more and more popular until

2005 – and disappeared the year after.

2.2 Mining Association Rules

The second component, i.e. the association mining

tool of the Trend Analyser, presents the selected

attributes in form of a table similar to tables in

relational data bases. Attributes such as order

number, product category as well as further

properties of a product are presented in rows. The

values of each attribute are listed in the columns.

To derive a set of correlations that may give

answers to questions like: “Which type of customer

Figure 2: Compressed view on relevant attributes for analysis of sample retail sales ordering data.

TREND ANALYSIS BASED ON EXPLORATIVE DATA AND TEXT MINING - A Decision Support System for the

European Home Textile Industry

255

buys what?”, “What kind of products are bought

together” or “What are customers looking at before

they decide to buy a certain product?” the

association mining tool provides flexible interaction:

• Different visualised views on the data help to

gain an overview and detect dependencies on

the one hand or go into detail and focus on

certain attribute values on the other hand.

• Defining different functions over attributes.

The expert may for example compute the sum

or determine the average of all values of one

specific attribute.

The expert can gain an overview of the attributes

and their value distribution by using the so called

“compressed view” of the tool (cf. Figure 2). This

view causes that adjacent cells with the same value,

namely the attribute values presented in columns,

are combined. The width of each cell indicates the

number of objects with this specific value. Cells

with numeric values, too small to be labelled with

the related value, are represented through a

horizontal line. The level of that line reflects the

height of the value. At a glance one can see that

three shipping companies deliver the products (see

row “shipping company name”). The row “country”

shows that each of these companies deliver to

customers in Germany and USA. In that way the

expert may detect interesting attributes/correlations

which are worth looking at in detail.

In the following we give an example on how to

detect an association such as “Which two product

categories are combined most frequently in one

order?” by performing the two working steps of

defining additional functions and focusing on certain

attribute values.

In the first step the user defines a function to

determine the number of product categories in one

order. Since he is interested in the combination of

two product categories, he then focuses (namely

double clicks on the value) on orders where products

of two categories are combined. In that way all

orders are selected where products from two

different categories are combined. Visually the

selected part grows until this cell fits the width of

the screen. At the same time the value distributions

of the other attributes adapt visually to that selection.

To experience which two product categories are

combined, the expert introduces a new function. The

result is presented, when he finally sorts the values

according to this new function (cf. Figure 3).

After detecting such an association the expert

will still have to verify that this association

represents a meaningful causal relationship, since

associations do not imply causation.

3 METHODS AND TOOLS

In this section we present methods and tools that are

suitable to realise the functionality of the Trend

Analyser as depicted in the solution design (section

2). In order to realise the text analysis features of

the Trend Analyser we have selected and developed

the following set of methods and tools (cf. 3.1).

Section 3.2 concentrates on the appropriate

information visualisation system we selected for

association mining functionality.

3.1 Visual Text Mining

The idea of term context stars (cf. Figure 1)

distinguishes between terms and concepts. Concepts

have a direct connection with AsIsKnown’s domain

ontology which constitutes relatively stable

knowledge of the domain. Terminology trends in

fast moving industries, in contrast, are rather

dynamic, a priori unknown, and evolving from the

active use of these terms, concept combinations and

expressions. From the viewpoint of knowledge

engineering, such concept drifts thus cannot be

modelled in advance. On contrary, it is rather

interesting to detect the terminological development

and to match it with known concept models. Thus,

we have decided to model just the relatively stable

“anchors”, i.e. concepts like basic colours, materials,

or structures. Dynamic, fluctuating terminology is

then rather detected by text mining technology.

To do that, we use methods of shallow natural

Figure 3: Dairy products and beverages are bought most frequently in one order.

ICEIS 2007 - International Conference on Enterprise Information Systems

256

language processing. Magazines are first

linguistically pre-processed: The tokenized texts are

automatically annotated with part-of-speech tags that

indicate the grammatical categories of each word.

Using dictionaries, for each known term a matching

concept from the ontology is attached (word sense

tagging). These linguistic services are realised with

the CLaRK system (Simov, 2004).

Given a list of concepts that shall be examined in

the magazines, target fragments of texts are first

identified with help of the word sense tags, e.g. all

sentences containing the concept “brick”. We then

use partial grammars that describe the possible

positions of interesting terms in the context of a

concept we are interested in, e.g. all adjectives that

are related to the concept. Terms that match these

grammar rules are extracted from the texts.

The final task is to visualise the extraction

results. Modelling and visualising term distributions

and term contexts has attracted interest in research

fields such as information retrieval (Becks, 2001),

linguistics, and web-based communities. Heringer

(1998) has introduced a technique where lexical

fields are automatically computed by a context

analysis of certain keywords. A degree of affinity is

determined by measuring the contextual ‘closeness’

of terms to the keyword. The resulting lexical fields

can be graphically presented as stars where the

context words are circularly arranged around the

concept. The distance of each satellite to the concept

reflects the degree of affinity.

While Heringer’s idea focuses on the notion of

term affinity, another recent approach tackles the

issue of term frequency: In the Web 2.0 community

the concept of tag clouds has become popular. Tag

clouds (also known as word clouds) visualise the

frequency of tags that appear on a website (Hassan-

Montero, 2006). More frequently used tags are

emphasised by larger fonts or other ways of

graphical highlighting.

The notion of term context stars is basically a

mixture of Heringer’s star visualisation idea for

lexical fields and the keyword-scaling of tag clouds.

Its visualisation metaphor helps users to recognise

dominant concepts as well as term attributes in a text

corpus. Moreover, different term context stars of the

same concept can easily be compared regarding

frequencies of concepts and drift of term attributes

(cf. Figure 1). The visualisation functionality can be

implemented using standard graphical programming

libraries. Complex layout algorithms (spring

embedding or other graph drawing techniques) are

not necessary.

3.2 Visual Association Mining

Association Mining is a method to discover which

items co-occur frequently within a data-set. A

typical example is the market basket analysis. In this

process customer buying habits are analysed by

finding associations between different items that

customers place in their “shopping baskets” (Han,

2001).

Association rules are implications of the form X ⇒

Y, i.e. A

∧…∧A

Æ B

∧…∧B

where A

( i ∈ {1,

…, m}) and B

(j ∈ {1, …, m}) are attribute-value

pairs. The rule is interpreted as “database tuples

which satisfy the condition X are also likely to

satisfy the condition Y”.

If a producer, for instance, would like to

determine which products are likely to be purchased

together, the appropriate rule would be like the

following: buys(customer, “sofa”) ⇒

buys(customer, “easy chair”).

Such associations, once found, can help the

producers understand their customers and as a result

help them to develop appropriate marketing

strategies and cross selling methods.

Many data mining tools support the task of

finding association rules within a given data set by

searching for correlations in the data automatically.

They test a lot more combinations of attributes than

the expert user can do manually. However it is

important to explore and understand the data being

analysed since this is the first step before one is able

to ask the right questions and any data mining

method can be applied in an appropriate way. In

particular, it is often necessary to define the right

derived attributes before the data mining method can

be applied in an appropriate way.

The information need from the producers in the

AsIsKnown context is driven by the wish to better

understand their consumer’s behaviour since they

have to be able to react to new trends and plan their

production according to these trends. They cannot

specify precisely where to find the information and

which attributes have to be analysed to lead a search.

Holten (1997), who addresses the question of

adequate system support for unstructured decisions,

states that these kinds of problems require rather

data-driven information analysis processes. Hence

he proposes exploration-oriented interaction

strategies. InfoZoom (Spenke, 2000), the tool we use

(cf. example in section 2.2), is a flexible visual data

mining tool, for individual and ad hoc analysis of

huge data amounts. It combines the required

functionality on the one hand with the flexibility

necessary for the domain experts on the other hand.

TREND ANALYSIS BASED ON EXPLORATIVE DATA AND TEXT MINING - A Decision Support System for the

European Home Textile Industry

257

InfoZoom provides the user with individual

views on the data values and different interaction

possibilities. Depending on the working context and

the arising questions the user can access the relevant

data and perform individual analysis. The user can

look at the whole data at a glance as well as

exploring a specific part of the data in detail.

In this way, the user gets a feeling for the data,

detects interesting knowledge, and gains a deep

understanding of the data set. The user can access

the data in that way and depth as it is necessary and

required for his working context. Animated zoom

into interesting areas of the data table as well as the

possibility to define functions support the user in her

task. For example the user can compute the sum, or

derive attributes such as the maximum or the

average. On that way the user derives new and

important information for further work.

Correlations can be detected by sorting according

to different attributes and by zooming into

interesting areas of the table.

4 CONCLUSION AND OUTLOOK

The development of the Trend Analyser is an

ongoing work in the AsIsKnown project. A mock-up

of the Trend Analyser has already been evaluated in

a concept review workshop with designers, product

managers and marketing staff of a carpeting

producer. We collected informal feedback on the

system design and received assessments on the

expected usefulness of the tool. It turned out that the

explorative approach with a high degree of user

interaction is expected to establish trust in the

mining results and help users to derive ideas of

potential trend lines, taking into account also their

high degree of experience and implicit background

knowledge. A purely automatic computing of

mining results, on the contrary, would not be

accepted by this particular user group which is used

to a rather creative and weakly structured way of

working.

Of course, a more formal evaluation is still

necessary. This will be done based on a first

prototype of the Trend Analyser which is planned to

be used in a field study with a textile producer. We

will use observational methods and structured

interviews to assess the functional design, usability

and impact of the Trend Analyser.

ACKNOWLEDGEMENTS

AslsKnown (http://www.asisknown.org) is funded

within the Information Society Technologies (IST)

Priority of the Sixth Framework Programme (FP6)

of the European Commission.

REFERENCES

Becks, Andreas (2001). Visual knowledge management

with adaptable document maps GMD research series,

no.15

Han, Jiwei. Micheline Kamber (2001). Data mining.

Concepts and techniques, Morgan Kaufmann

Publishers Inc.

Hassan-Montero, Yusef, Victor Herrero-Solana (2006).

Improving Tag-Clouds as Visual Information

Retrieval Interfaces. Int. Conf. on Multidisciplinary

Information Sciences and Technologies, InSciT2006,

Mérida, Spain

Heringer, Hans Jürgen (1998). Das Höchste der Gefühle –

Empirische Studien zur distributiven Semantik. Verlag

Stauffenburg, Tübingen

Holten, R. (1997) Die drei Dimensionen des

Inhaltsaspektes von Führungsinformations-systemen,

Arbeitsberichte d. Inst. für Wirtschaftsinformatik,

Universität Münster, April.

Kontostathis, April, Leon M. Galitsky, William M.

Pottenger, Soma Roy, Daniel J. Phelps (2003) A

survey of Emerging Trend Detection in Textual Data

Mining, Survey of Text Mining, pp.185-224

Kontostathis, April, Lars E. Holzman, William M.

Pottenger (2004) Use of Term Clusters for Emerging

Trend Detection, Preprint

Simov, Kiril, Alexander Simov, Hristo Ganev, Krasimira

Ivanova, Ilko Grigorov (2004). The CLaRK System:

XML-based Corpora Development System for Rapid

Prototyping. In: Proceedings of LREC, Lisbon,

Portugal, pp. 235-238

Spenke, Michael, Christian Beilken (2000). InfoZoom –

Analysing Formula One racing results with an

interactive data mining and visualisation tool. Second

International Conference on Data Mining, 5-7 July,

Cambridge University, United Kingdom

Valtinat, Tobias, Wolfgang Backhaus, Klaus Henning

(2006). Non Invasive, Cross-Sector Development and

Management of Trends. Leading the Web in

Concurrent Engineering, P. Ghodous et al. (Eds.), IOS

Press

ICEIS 2007 - International Conference on Enterprise Information Systems

258