condition monitoring using sensors and probes. Discovery
of common repair sequences would also assist
in improving the repair process itself (Exclusive Ore
Inc., 2003). Suppose that product B regularly fails after
a failure of product A. Then the servicing company
can check the status of B each time A fails. Also,
if it is concluded that the failure of A is the cause of
B’s failure, then when repairing B, the parts of it that
are connected with A could have a higher probability
of malfunction and can be checked first.
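The "B fails after A" pattern can be illustrated with a minimal sketch. It assumes failure records are available as (site, product, day) tuples and that a fixed time window defines "after"; the record layout, the function name and the window are illustrative assumptions, not part of any system described here.

```python
from collections import Counter, defaultdict


def follow_up_failure_rates(events, window_days):
    """Estimate how often a failure of product a is followed by a failure of b.

    events: iterable of (site, product, day) failure records, day as a number.
    Returns {(a, b): share of failures of a that were followed by a failure
    of b on the same site within window_days}.
    """
    by_site = defaultdict(list)
    for site, product, day in events:
        by_site[site].append((day, product))

    failures = Counter()   # failures per product
    followed = Counter()   # (a, b) -> failures of a followed by a failure of b
    for records in by_site.values():
        records.sort()     # chronological order within one site
        for i, (day_a, a) in enumerate(records):
            failures[a] += 1
            seen = set()   # count each follower product once per failure of a
            for day_b, b in records[i + 1:]:
                if day_b - day_a > window_days:
                    break
                if b != a and b not in seen:
                    seen.add(b)
                    followed[(a, b)] += 1
    return {pair: n / failures[pair[0]] for pair, n in followed.items()}
```

A pair with a high rate, such as (A, B), would suggest checking B whenever A fails. Whether A actually causes the failure of B has to be established separately.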
Another use of data mining is the prediction of customer
attrition. There are two different models for
this task. One predicts whether the customer will
leave within a certain amount of time; the other predicts
how long the customer will stay with the company.
Data mining can also be applied to estimate a
customer’s prospective value, that is, the revenue that the
customer will bring during his remaining lifetime. This can
be used in identifying “good” and “bad” customers
(Berry and Linoff, 2004). Both models of attrition
prediction and other applications of data mining can
assist in corporate planning. Prediction of the number of
failures for automobiles has been applied successfully
at DaimlerChrysler (see section 2.1). The same methods
could be applied to general installed base data to predict
failure rates of various products for financial, logistical
and other planning.
Data quality control is another important task, particularly
when managing a large installed base data
warehouse with many different information sources.
With the help of data mining, it is possible to improve
the automation of this task. Two possible approaches in
this case are the discovery of numerical outliers and the
identification of cases that do not fall into the general patterns
identified by other data mining techniques. The latter
can also lead to the discovery of gaps in the data. Discovering
and treating outliers is also an important preprocessing
task before applying other data mining algorithms.
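As an illustration of numerical outlier discovery, the sketch below flags values that lie far outside the interquartile range. The text does not prescribe a particular method; the interquartile-range rule and the multiplier k = 1.5 are assumptions chosen for simplicity.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

Flagged values would still need manual review, since an extreme value can be either a data entry error or a genuine but rare case.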
3 DATA MINING ON ABB’S
INSTALLED BASE
ABB’s ServIS is based on a data warehouse that
stores data in an Oracle relational database. This
database consists of more than 200 tables with installed
base and administrative data. The information on
ABB products, materials, customers, services, etc.
that is included in the system can be used as a basis
for data mining. The amount of available data was
quite large, encompassing about 30,000 products or
700,000 equipment units. Nevertheless, its quality
was not always at the level desired for data mining
purposes. The problems included missing data such as NULLs
and “other” or “unknown” entries, as well as scarcity of
historical data and a lack of information that could be
useful for data mining purposes.
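The extent of such missing data can be profiled before any mining is attempted. The following is a minimal sketch that assumes the rows have already been exported as dictionaries; the placeholder spellings and the column names in the usage note are hypothetical and do not reflect the actual ServIS schema.

```python
# Assumed spellings of placeholder entries; adjust to the data at hand.
MISSING_MARKERS = {"", "other", "unknown"}


def missing_share(rows, columns):
    """Return {column: share of rows whose value is NULL or a placeholder}."""
    counts = {c: 0 for c in columns}
    for row in rows:
        for c in columns:
            value = row.get(c)
            if value is None or str(value).strip().lower() in MISSING_MARKERS:
                counts[c] += 1
    total = len(rows)
    return {c: (counts[c] / total if total else 0.0) for c in columns}
```

For example, calling `missing_share(rows, ["industry", "country"])` on an exported table would show which attributes are too sparse to be useful as model inputs.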
As ServIS is based on an Oracle 10g database,
the first choice of software for data mining was Oracle
Data Mining (ODM) (Oracle Corporation, 2010),
which is a built-in option of the database, together with
the Oracle Data Miner GUI. Additionally, the free Weka
software (Witten and Frank, 1999) was chosen as an
alternative to evaluate the results from ODM. The following
sections present the data mining models tested
on ABB’s installed base information.
3.1 Association Rules on ABB Products
This model aims to discover ABB products that tend
to be on the same site (plant, factory, etc.). Association
rules discovery is implemented using the
Apriori algorithm (Agrawal and Srikant, 1994). A similar
method is also used by Amazon to generate suggestions
for its customers (Linden et al., 2003). This algorithm
is available in most data mining packages, including
ODM.
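To make the notions of support and confidence concrete, the sketch below derives pairwise rules from a list of sites, each represented as a set of products. It is a simplified illustration limited to item pairs, not the full Apriori algorithm and not the ODM implementation; the function name and data layout are assumptions.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support, min_confidence):
    """Find rules lhs -> rhs between single products.

    baskets: list of sets of product names, one set per site.
    Returns a list of (lhs, rhs, support, confidence) tuples.
    """
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    # Apriori-style pruning: a pair can only be frequent if both items are.
    frequent = {i for i, c in item_counts.items() if c / n >= min_support}

    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket & frequent), 2):
            pair_counts[pair] += 1

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                         # share of sites with both
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]   # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules
```

With thresholds such as `pair_rules(baskets, 0.01, 0.10)`, a rule is kept only if both products occur together on at least 1% of sites and the right-hand product is present on at least 10% of the sites that have the left-hand product.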
After constructing a data table, the algorithm was executed
with different generalization levels of products.
To find the optimal level, as suggested by Berry
and Linoff (2004), we used a dynamic
approach: we take the item at the most specific
generalization level and compare the number of
its occurrences to a heuristically identified threshold
value. If this number is less than the threshold, we
consider the next less specific generalization level,
continuing the process until a generalization level
that satisfies the threshold condition is found. Experimental
threshold values of 50, 100, 200 and 400 were
used, based on the number of occurrences at the middle
generalization level, which had a maximum of 506
and an average of 23. Confidence and support thresholds
of 10% and 1%, respectively, were used for the models.
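The dynamic generalization procedure described above can be sketched as follows. The sketch assumes that each product is given as a path through the product hierarchy, from the most general to the most specific level, and that an item which never reaches the threshold falls back to the most general level; both the representation and the fallback are assumptions made for illustration.

```python
from collections import Counter


def generalize(items, threshold):
    """Replace each item by its most specific sufficiently frequent level.

    items: list of tuples of labels, ordered from the most general to the
    most specific level, e.g. ("group", "family", "model") (hypothetical).
    Returns, for each item, the longest prefix that occurs at least
    threshold times; the most general level is used as a fallback.
    """
    counts = Counter()
    for path in items:
        for depth in range(1, len(path) + 1):
            counts[path[:depth]] += 1

    result = []
    for path in items:
        depth = len(path)
        # Step to the next less specific level until the threshold is met.
        while depth > 1 and counts[path[:depth]] < threshold:
            depth -= 1
        result.append(path[:depth])
    return result
```

The generalized items can then be used in place of the original product identifiers when building the data table for association rule discovery.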
We obtained at most 14 rules from a single
model, namely the dynamic generalization model with
a threshold of 200 occurrences. This result can be
explained by the specific nature of ABB. Although
the number of products is quite large, 22,000 compared
with an average of 50,000 products in a large supermarket
(Nestle, 2002), ABB has less diversity of
products and fewer customers than
a supermarket. Customers of ABB that come
from the same industries tend to purchase a similar mix
of products. Combining the results from all models, we
get 9 rules with confidence levels higher than 75%
and support levels ranging from 1% to 3%.
This method is easily reproducible and reusable.