OPTIMIZED STRATEGIES FOR ARCHIVING MULTI-DIMENSIONAL PROCESS DATA

Building a Fault-diagnosis Database

Sebastian Feller¹, Yavor Todorov¹, Dirk Pauli¹ and Folker Beck²

¹ FCE Frankfurt Consulting Engineers GmbH, Frankfurter Strasse 5, 65239 Hochheim/Main, Germany

² John Deere Werke Zweibruecken, Homburger Strasse 117, 66482 Zweibruecken, Germany

Keywords:

Data compression, Time series analysis, Condition based maintenance.

Abstract:

In many real-world applications, such as condition monitoring of technical facilities or vehicles, the amount of data to process and analyze has steadily increased during the last decades. In this paper a novel approach to data compression is presented, namely the multivariate representative of the Perceptually Important Points algorithm. Furthermore, approaches are given on how multivariate data should be dealt with to preserve all relevant multivariate information during a lossy data compression. This involves an extensive analysis of the stochastic dependencies of the process data. On the one hand, the presented algorithm is able to compress multivariate time series; on the other hand, it can easily be extended to reflect a model of the original time series. It is shown that the suggested multivariate compression algorithm outperforms its univariate equivalent.

1 INTRODUCTION

The digitalization of sensor equipment and the integration of these sensors into communication networks have immensely increased the amount of data available for various kinds of processes. For utility companies these large data streams are of particular concern. For example, as a regulated industry, the power industries in most western countries are obliged to collect extensive information on their power production processes. Emissions and a diverse range of thermodynamical and mechanical process variables are usually collected at a rate of one value set per second. Since modern power plants can have value sets with over 5000 different readings, a direct evaluation of this incoming data flood is not possible, and data compression and organization methods become urgent. In general, data compression is of interest if the quantity of collected data is too large for the given performance in terms of processing time and storage.

This paper gives a summary of state-of-the-art algorithms used in modern data historians in section 2 and demonstrates their shortcomings, considering the requirements of typical condition monitoring software and methods of empirical fault analysis, using the example of 'Perceptually Important Points'. In section 4 a first simple example is given. In the following two sections datasets with a more complex structure are studied under the same premise. Based on these considerations an improved procedure for multivariate time series compression is suggested in section 7.

2 PROBLEM DESCRIPTION

The initial situation of how data is collected and what it is used for varies greatly from industry to industry. In the following the focus is put on the power producing industry, since companies in this industry already have extensive sensory equipment installed and are capable of collecting data in central nodes in each of their power plants. This is partially due to the aforementioned regulations enforced on these utility companies.

A primary purpose of data collection is of course to control the power plant based on this information. Additionally, some form of data processing and storage will be used. In the most basic variation of data processing all incoming data is stored on a digital storage device which is archived when it is full. This usually fulfills regulation requirements, but valuable information is lost.

With typical computational power steadily increasing and becoming ever more affordable at the same time, new plausible paths open up. The collected data can be used to feed condition monitoring software (e.g. refer to (Chevalier et al., 2009)), which is a key factor in reducing risks, as emerging damage can be detected long before it becomes a serious threat. This potential is frequently recognized by the operator of the equipment. Unfortunately the statistical and physical knowledge to process the data is not always present in the IT departments concerned. A variety of 'off-the-shelf' data historians are available which assist the data collection process, but the implemented algorithms are usually only designed to quickly store away data and do not take advanced data evaluation techniques into account, e.g. condition monitoring with anomaly detection algorithms such as Auto-Associative Kernel Regression, refer to (Chevalier et al., 2009).

Feller S., Todorov Y., Pauli D. and Beck F. OPTIMIZED STRATEGIES FOR ARCHIVING MULTI-DIMENSIONAL PROCESS DATA - Building a Fault-diagnosis Database. DOI: 10.5220/0003571803880393. In Proceedings of the 8th International Conference on Informatics in Control, Automation and Robotics (ICINCO-2011), pages 388-393. ISBN: 978-989-8425-74-4. Copyright © 2011 SCITEPRESS (Science and Technology Publications, Lda.)

A survey of the current state-of-the-art time series data compression algorithms, such as Swinging-Door-Compression, e.g. refer to (Fu, 2010), (Thornhill et al., 2004), (Bristol, 1990), Perceptually-Important-Points (PIP), e.g. refer to (Fu, 2010), (Chung et al., 2001), (Fu et al., 2001), compression based on Fast-Fourier-Transformation, e.g. refer to (Fu, 2010), (Stoffer, 1999), (Press et al., 2007), Wavelet-Transformations, e.g. refer to (Fu, 2010), (Chen et al., 2004), (Press et al., 2007), or Chebyshev polynomials (Hawkins III et al., 2003), (Eruhimov et al., 2008), shows that current techniques focus on univariate compression. These approaches neglect any correlations between different sensor readings, which leads to a suboptimal compression of the process data, as shown in the remainder of this paper.

3 NOVEL APPROACH ON MULTIVARIATE DATA COMPRESSION

The current paper concentrates on the development of a multivariate variant of the PIP algorithm. Hence, its univariate representative is introduced first.

The aim of the algorithm can be summarized as approximating a historical time series $T = \{x_i\}$ by piecewise linear functions, where $i \in \{1, \dots, n\}$ is the time index. The result is a set $P$ of the so-called perceptually important points. Figure 1 gives an example of an approximation. Note that each linear function is defined by two consecutive perceptually important points.

The algorithm is initialized with $P = \{x_1, x_n\}$, refer to figure 1. Next, the following 'important' point is determined as the element of $T \setminus P$ with the largest Euclidean distance to its corresponding approximation. As shown in figure 1, adding points to $P$ changes the approximation of $T$. The procedure of determining the next 'important' point terminates once a convergence criterion is fulfilled. This can be a certain compression ratio or a global mean squared error, to name two possible criteria.

Figure 1: Example of incrementally estimated PIPs ('PIP' algorithm). The first and last elements of the time series are used for initialization; the following 'important' points are determined via their distance to the corresponding linear function.
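The univariate procedure can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation used for the results in this paper: it assumes that the distance of a point is measured at equal time index (the absolute difference between the value and the piecewise linear approximation), and it uses a fixed number of points as the stopping criterion, which corresponds to a fixed compression ratio. The function name `pip_univariate` is hypothetical.

```python
def pip_univariate(series, n_points):
    """Return the sorted indices of n_points perceptually important points."""
    n = len(series)
    if n < 2:
        raise ValueError("series needs at least two values")
    selected = [0, n - 1]  # initialization with the first and last point

    while len(selected) < min(n_points, n):
        best_idx, best_dist = None, -1.0
        # Walk over every segment between two consecutive selected points.
        for left, right in zip(selected, selected[1:]):
            for i in range(left + 1, right):
                # Value of the piecewise linear approximation at index i.
                w = (i - left) / (right - left)
                approx = (1 - w) * series[left] + w * series[right]
                dist = abs(series[i] - approx)
                if dist > best_dist:
                    best_idx, best_dist = i, dist
        if best_idx is None:  # no unselected point is left
            break
        selected.append(best_idx)
        selected.sort()
    return selected


series = [0.0, 1.0, 4.0, 1.0, 0.0, -3.0, 0.0]
print(pip_univariate(series, 4))  # [0, 2, 5, 6]
```

In the example the two peaks at indices 2 and 5 are selected after the end points, since they deviate most from the current approximation. A mean squared error threshold could replace the point count as the stopping criterion without changing the selection step.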

In the following, the univariate PIP compression algorithm is extended to its multivariate representative. This influences the estimation of the next 'important' point as well as performance requirements and convergence criteria. As with the univariate representative, the first and last points of the time series are used for initialization. Figure 2 depicts a typical situation during the approximation procedure. The points at times 0, 1 and n have already been selected. The index $\tau^*$ of the next point added to the approximation is identified via

$$\tau^* = \operatorname*{arg\,max}_{\tau \in \{0, \dots, n\}} \left\| p(\tau) - p^*(\tau) \right\|,$$

where $p^*(t)$ is the linear approximation of the point $p(t)$ given the current selection of PIPs.
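The selection rule for the multivariate case can be sketched as follows. The sketch assumes that the time series is given as a list of equally long vectors, one per time index, that the norm is the Euclidean norm, and that a fixed number of points serves as the stopping criterion. The names `interpolate` and `pip_multivariate` are hypothetical.

```python
import math


def interpolate(points, left, right, i):
    """Linear approximation p*(i) between the selected points left and right."""
    w = (i - left) / (right - left)
    return [(1 - w) * a + w * b for a, b in zip(points[left], points[right])]


def pip_multivariate(points, n_points):
    """Return the sorted time indices of n_points multivariate PIPs."""
    n = len(points)
    if n < 2:
        raise ValueError("time series needs at least two points")
    selected = [0, n - 1]  # first and last point initialize the selection

    while len(selected) < min(n_points, n):
        best_idx, best_dist = None, -1.0
        for left, right in zip(selected, selected[1:]):
            for i in range(left + 1, right):
                # || p(i) - p*(i) ||, evaluated over all dimensions at once
                d = math.dist(points[i], interpolate(points, left, right, i))
                if d > best_dist:
                    best_idx, best_dist = i, d
        if best_idx is None:
            break
        selected.append(best_idx)
        selected.sort()
    return selected


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (0.0, 0.0), (0.0, 0.0)]
print(pip_multivariate(points, 3))  # [0, 2, 4]
```

Because a single time index is selected for all dimensions, the stored points of the different signals share their time stamps by construction.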

Figure 2: Snapshot of the approximation procedure of the multivariate PIPs. The first and last elements of the time series are used for initialization. The following points are determined via their Euclidean distance $d_\tau$ to the corresponding linear approximation. (Labels in the figure: axes $x$, $y$, $t$; selected points $p(t=0)$, $p(t=1)$, $p(t=n)$; candidate point $p(t=\tau)$ and its approximation $p^*(t=\tau)$.)

4 AN INTRODUCTORY EXAMPLE

Beginning with artificial data, statistical properties of the univariate and multivariate compression algorithms are compared. A test on artificial data is shown in figures 3 and 4. The compression algorithms used in this example are the univariate and the multivariate perceptually important points algorithm, e.g. compare (Fu, 2010), (Chung et al., 2001), and (Fu et al., 2001). Considering the two-dimensional goodness of fit, measured by the mean squared error (fig. 4), it can be seen that the multivariate algorithm outperforms the univariate equivalent. To understand this, one has to bear in mind that compressing a two-dimensional or any higher-dimensional time series in a univariate fashion generally results in PIPs not being aligned in the time domain. This can lead to an inferior compression.

Figure 3: Two-dimensional time series $f(t) = (x, y)^T = (\sin(t), \cos(t))^T$ consisting of a sine, $x(t) = \sin(t)$, and a cosine, $y(t) = \cos(t)$, dependent on a parameter $t$ with equal phase, frequency and amplitude. The result in two-dimensional space is a helix.
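The misalignment of univariate PIPs in the time domain can be made visible with the sine and cosine signals of figure 3. The following sketch compresses each signal separately and compares the selected time indices. The sampling (201 points over two periods), the number of points and the helper `pip_indices` are assumptions made for illustration only.

```python
import math


def pip_indices(series, n_points):
    """Univariate PIP selection using the distance at equal time index."""
    selected = [0, len(series) - 1]
    while len(selected) < n_points:
        candidates = []
        for left, right in zip(selected, selected[1:]):
            for i in range(left + 1, right):
                w = (i - left) / (right - left)
                approx = (1 - w) * series[left] + w * series[right]
                candidates.append((abs(series[i] - approx), i))
        if not candidates:
            break
        selected.append(max(candidates)[1])  # index with the largest distance
        selected.sort()
    return selected


t = [4 * math.pi * k / 200 for k in range(201)]
x = [math.sin(v) for v in t]
y = [math.cos(v) for v in t]

ix = pip_indices(x, 6)  # time indices kept for the sine
iy = pip_indices(y, 6)  # time indices kept for the cosine
union = sorted(set(ix) | set(iy))

print("sine PIPs:  ", ix)
print("cosine PIPs:", iy)
print("distinct time stamps to store:", len(union))
```

The sine is compressed around its extrema and the cosine around its own, so the two index sets differ and more distinct time stamps have to be stored than points were selected per signal.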

Figure 4: Comparison between univariate (orange line) and multivariate (green line) compression of the artificial data shown in figure 3. The figure shows the mean squared approximation error (vertical axis, 0.0 to 0.6) versus the number of points used to approximate the helix (horizontal axis, 0 to 20). In this case the multivariate algorithm outperforms the univariate equivalent.

5 COMPRESSION RATIO VS. COMPRESSION QUALITY

The analysis of the multivariate PIP algorithm is continued by considering three artificial datasets. The artificial datasets are chosen so that the statistical properties, especially in terms of stochastic dependence, are fully known. The first dataset consists of ten independent Ornstein-Uhlenbeck processes (Gillespie, 1996), (Uhlenbeck and Ornstein, 1930), each given by the equation

$$dX_t = \theta(\mu - X_t)\,dt + \sigma\,dW_t,$$

where $dW_t$ is the increment of a Wiener process. The processes are initialized by random values distributed according to $X_0 \sim \mathcal{N}(0, \sigma^2)$. Ornstein-Uhlenbeck processes are chosen since they closely resemble the dynamics of a real system (Feller, 2009). The two additional datasets are also generated via Ornstein-Uhlenbeck processes. In the case of the second and third datasets the processes are no longer independent.
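A dataset of independent Ornstein-Uhlenbeck processes can be generated, for example, with a simple Euler-Maruyama discretization. This is a sketch under assumptions: the step size, the parameter values and the function name `simulate_ou` are chosen for illustration, and the Euler-Maruyama scheme is used here for brevity instead of the exact update given by (Gillespie, 1996).

```python
import math
import random


def simulate_ou(theta, mu, sigma, n_steps, dt=0.01, seed=None):
    """Simulate dX = theta * (mu - X) dt + sigma dW with Euler-Maruyama."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, sigma)  # X_0 ~ N(0, sigma^2)
    path = [x]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment ~ N(0, dt)
        x += theta * (mu - x) * dt + sigma * dw
        path.append(x)
    return path


# Ten independent processes, as in the first artificial dataset.
dataset = [
    simulate_ou(theta=1.0, mu=0.0, sigma=0.5, n_steps=1000, seed=k)
    for k in range(10)
]
print(len(dataset), len(dataset[0]))  # 10 1001
```

Each process uses its own random number generator, so the ten signals are independent by construction.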

A correlation is introduced by a system of differential equations that drive the underlying dynamics of the observed process and an observer equation. The system of differential equations is of the form

$$d\vec{X}_t = \Theta(\vec{\mu} - \Psi \vec{X}_t)\,dt + \sigma_X\,d\vec{W}_t,$$

where $\Psi$ is a symmetric matrix and $\Theta$ is a diagonal matrix with the elements $\theta_i$. For the first dataset $\Psi$ is an identity matrix. For the second example $\Psi$ has block diagonal structure and for the third random correlations are chosen on initialization. The observer equation is given by

$$\vec{Y}_t = A \cdot \vec{X}_t + \sigma_Y\,d\vec{W}_t,$$

where $\vec{Y}_t$ is the observed signal vector. For the first dataset $\vec{Y}_t \equiv \vec{X}_t$. For the second dataset $\vec{X}_t$ has 6 dimensions and for the third 2 dimensions. The dimension of $\vec{Y}_t$ is always 10. The dimensions of $A$ vary correspondingly. Figure 5 shows an example of the non-linear correlation between the parameters of each dataset type. The correlations were calculated from one sample. In this figure green colors depict a low correlation and red colors stand for a high correlation.
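The coupled system and the observer equation can be simulated along the same lines. The following sketch makes several assumptions that are not specified in the text: an Euler-Maruyama discretization, observation noise modelled as independent Gaussian noise with standard deviation $\sigma_Y$, and randomly drawn entries of $A$. All function and variable names are hypothetical.

```python
import math
import random


def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def simulate_observed(theta, mu, psi, a, sigma_x, sigma_y,
                      n_steps, dt=0.01, seed=0):
    """Simulate the hidden state X and return the observed vectors Y.

    theta holds the diagonal elements of Theta, psi is the symmetric
    coupling matrix Psi and a is the observer matrix A.
    """
    rng = random.Random(seed)
    dim = len(mu)
    x = [rng.gauss(0.0, sigma_x) for _ in range(dim)]
    observations = []
    for _ in range(n_steps):
        coupled = matvec(psi, x)  # Psi * X_t
        x = [
            x[i] + theta[i] * (mu[i] - coupled[i]) * dt
            + sigma_x * rng.gauss(0.0, math.sqrt(dt))
            for i in range(dim)
        ]
        # Observer equation: Y = A * X plus Gaussian observation noise.
        y = [v + sigma_y * rng.gauss(0.0, 1.0) for v in matvec(a, x)]
        observations.append(y)
    return observations


# Setup resembling the third dataset: 2 hidden and 10 observed dimensions.
rng = random.Random(1)
psi = [[1.0, 0.3], [0.3, 1.0]]
a = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(10)]
ys = simulate_observed([1.0, 0.5], [0.0, 0.0], psi, a,
                       sigma_x=0.5, sigma_y=0.05, n_steps=500)
print(len(ys), len(ys[0]))  # 500 10
```

Since all ten observed signals are linear combinations of only two hidden processes, they are strongly dependent on each other.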

In figure 6 the compression results for the three datasets are shown. In the graphs the logarithm of the MSE is shown versus the number of dimensions compressed and the number of PIPs selected for linear approximation. In order to smooth out any random effects the analysis was repeated with 100 different datasets each.

Figure 5: Examples for the Spearman rank correlation for each of the designed datasets shown in a correlation matrix. Green colors depict a low correlation and red colors stand for a high correlation between the parameter pairs. The values printed in the three panels are:

SemiDependent
1.0 0.7 0.6 0.5 0.6 0.9 0.5 0.7 0.9
0.7 1.0 0.7 0.7 0.7 0.7 0.7 0.7 0.7
0.6 0.7 1.0 0.9 0.8 0.7 0.5 0.5 0.7
0.5 0.7 0.9 1.0 0.8 0.7 0.5 0.3 0.6
0.6 0.7 0.8 0.8 1.0 0.5 0.8 0.6 0.8
0.9 0.7 0.7 0.7 0.5 1.0 0.4 0.4 0.7
0.5 0.7 0.5 0.5 0.8 0.4 1.0 0.5 0.7
0.7 0.7 0.5 0.3 0.6 0.4 0.5 1.0 0.7
0.9 0.7 0.7 0.6 0.8 0.7 0.7 0.7 1.0

Dependent
1.0 0.7 0.7 1.0 0.7 0.6 1.0 0.8 0.9
0.7 1.0 1.0 0.5 1.0 1.0 0.8 1.0 0.9
0.7 1.0 1.0 0.4 1.0 1.0 0.7 1.0 0.9
1.0 0.5 0.4 1.0 0.5 0.3 0.9 0.6 0.8
0.7 1.0 1.0 0.5 1.0 1.0 0.8 1.0 0.9
0.6 1.0 1.0 0.3 1.0 1.0 0.6 0.9 0.8
1.0 0.8 0.7 0.9 0.8 0.6 1.0 0.8 0.9
0.8 1.0 1.0 0.6 1.0 0.9 0.8 1.0 1.0
0.9 0.9 0.9 0.8 0.9 0.8 0.9 1.0 1.0

Independent
1.0 0.4 0.1 0.4 -0.4 0.2 0.2 0.4 -0.1
0.4 1.0 0.3 0.0 -0.3 0.3 0.2 0.2 0.3
0.1 0.3 1.0 0.0 -0.5 0.0 0.3 0.2 0.0
0.4 0.0 0.0 1.0 0.0 0.2 0.1 0.4 -0.3
-0.4 -0.3 -0.5 0.0 1.0 0.1 -0.5 -0.2 0.0
0.2 0.3 0.0 0.2 0.1 1.0 0.0 -0.1 0.0
0.2 0.2 0.3 0.1 -0.5 0.0 1.0 0.1 0.2
0.4 0.2 0.2 0.4 -0.2 -0.1 0.1 1.0 -0.3
-0.1 0.3 0.0 -0.3 0.0 0.0 0.2 -0.3 1.0
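A Spearman rank correlation matrix of the kind shown in figure 5 can be computed by replacing each signal with its ranks and then applying the ordinary (Pearson) correlation. The sketch below uses only the Python standard library; tied values receive the mean of their rank positions. The function names are hypothetical and a constant signal is not handled, since its rank variance would be zero.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = mean_rank
        pos = end + 1
    return result


def pearson(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def spearman_matrix(signals):
    """Spearman rank correlation between all pairs of signals."""
    ranked = [ranks(s) for s in signals]
    return [[pearson(r1, r2) for r2 in ranked] for r1 in ranked]


signals = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.0, 4.0, 9.0, 16.0, 25.0],  # monotone, but non-linear in signal 0
    [5.0, 4.0, 3.0, 2.0, 1.0],
]
for row in spearman_matrix(signals):
    print([round(c, 2) for c in row])
```

The second signal is a non-linear but monotone function of the first, so the rank correlation of this pair is 1, which illustrates why a rank-based measure is suited to non-linear dependencies.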

The results for the compression of one dimension are similar for all datasets.

From figure 6 it can be seen that with an increasing number of dimensions the number of PIPs required to achieve the same goodness of fit is exponentially larger in the case of the independent processes.

6 CONDITION MONITORING CASE STUDY ON TWO REAL DATASETS

When considering datasets from real systems it is important to keep the findings of the previous section in mind. It is crucial to compress datasets in a multivariate fashion only if the individual signals have significant stochastic dependencies on each other. If these stochastic dependencies do not exist across the whole dataset, it is possible to isolate groups of highly dependent signals. In this manner a dataset can be split into a number of subsets, each of which contains the required amount of data. These groups of dependent signals are also very beneficial for the application of condition monitoring software (Feller and Chevalier, 2010).
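One simple way to isolate groups of dependent signals is to link two signals whenever the absolute value of their correlation exceeds a threshold and to collect the connected groups. The following sketch illustrates this idea only; it is not the disaggregation method of (Feller and Chevalier, 2010), and the threshold of 0.7 as well as the function name `dependent_groups` are assumptions.

```python
def dependent_groups(corr, threshold=0.7):
    """Group signal indices that are linked by |correlation| >= threshold."""
    unvisited = set(range(len(corr)))
    groups = []
    while unvisited:
        start = min(unvisited)
        unvisited.remove(start)
        group, stack = [start], [start]
        while stack:
            i = stack.pop()
            # All not yet grouped signals that depend strongly on signal i.
            linked = [j for j in unvisited if abs(corr[i][j]) >= threshold]
            for j in linked:
                unvisited.remove(j)
                group.append(j)
                stack.append(j)
        groups.append(sorted(group))
    return groups


corr = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]
print(dependent_groups(corr))  # [[0, 1], [2, 3]]
```

Each resulting group can then be compressed on its own with the multivariate algorithm.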

Figure 6: The three graphs (panels SemiDependent, Dependent and Independent) show the logarithm of the MSE versus the number of dimensions compressed and the number of PIPs selected for linear approximation. The topmost graph results from the compression of artificial dataset 2, which has a medium dependency between different dimensions. The center graph results from the third artificial dataset. Finally, the bottom graph results from the first dataset, with the lowest dependency. The colored lines in each graph depict levels of equal goodness of fit. For each graph the same scales and view were used. To smooth out any random effects the analysis was repeated 100 times and the results were averaged.

The multivariate extension of the perceptually important points algorithm was applied to two case study datasets. The first dataset originates from a gas turbine and the second comes from an agricultural vehicle. The dataset from the gas turbine consists of 5000 samples, each containing 120 different parameters. The parameters include mechanical and thermodynamical variables. The dataset from the agricultural vehicle contains 3750 samples, each with 130 different parameters. The parameters primarily comprise mechanical variables, such as vibrations. Figure 8 summarizes the compression progress for both datasets. The figure shows the mean squared error as well as the maximum error. The maximum error is simply the largest Euclidean distance between the current approximation and any point. Both datasets can be approximated very well with the PIP algorithm, even at high compression ratios.
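Both error measures can be computed from the original points and the selected PIP indices. The sketch below assumes that the mean squared error is the mean of the squared Euclidean distances between each point and its approximation; a per-component average would differ by a constant factor. The selection is expected to be sorted and to contain the first and last index. The function names are hypothetical.

```python
import math


def reconstruct(points, selected):
    """Piecewise linear approximation of all points from the selected PIPs."""
    approx = []
    for left, right in zip(selected, selected[1:]):
        for i in range(left, right):
            w = (i - left) / (right - left)
            approx.append([(1 - w) * a + w * b
                           for a, b in zip(points[left], points[right])])
    approx.append(list(points[selected[-1]]))  # last point is always kept
    return approx


def compression_errors(points, selected):
    """Return (mean squared error, maximum error) of the approximation."""
    approx = reconstruct(points, selected)
    distances = [math.dist(p, q) for p, q in zip(points, approx)]
    mse = sum(d * d for d in distances) / len(distances)
    return mse, max(distances)


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (0.0, 0.0), (0.0, 0.0)]
selected = [0, 2, 4]
mse, max_err = compression_errors(points, selected)
print(round(mse, 3), round(max_err, 3))  # 1.1 1.803
print("compression ratio 1 :", len(points) / len(selected))
```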

Figure 7 shows the application of the multivariate PIP algorithm in combination with a data-driven condition monitoring algorithm. The algorithm used is based on an autoassociative kernel regression (AAKR) algorithm (Hines and Garvey, 2006). For evaluation purposes the original datasets were each split into two equally large sets. The first part served as training data for the data-driven algorithm and was compressed via PIP prior to training. The second part of the dataset was used as validation data for the anomaly detection. Figure 8 was derived from the first halves and figure 7 was derived from the second halves of the datasets.
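The principle of AAKR can be sketched as follows: an observed state is reconstructed as a kernel-weighted average of the stored training vectors (here, the selected PIPs), and the distance between the observation and its reconstruction serves as the anomaly indicator. The Gaussian kernel, the bandwidth value and the function names are assumptions made for this sketch; the configuration used for figure 7 is not specified in the text.

```python
import math


def aakr_reconstruct(memory, query, bandwidth=1.0):
    """Kernel-weighted average of the memory vectors around the query."""
    weights = [
        math.exp(-math.dist(m, query) ** 2 / (2 * bandwidth ** 2))
        for m in memory
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from all memory vectors")
    return [
        sum(w * m[k] for w, m in zip(weights, memory)) / total
        for k in range(len(query))
    ]


def anomaly_score(memory, query, bandwidth=1.0):
    """Euclidean distance between the query and its AAKR reconstruction."""
    return math.dist(query, aakr_reconstruct(memory, query, bandwidth))


# Memory vectors, e.g. the PIPs selected from the training half.
memory = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
print(anomaly_score(memory, (1.0, 1.0), bandwidth=0.5))  # close to zero
print(anomaly_score(memory, (1.0, 3.0), bandwidth=0.5))  # clearly larger
```

A state that resembles the training data is reproduced almost exactly, whereas a state far from all stored vectors yields a large residual.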

Figure 7: The two graphs (panels Gas Turbine and Agricultural Vehicle; horizontal axis: # PIPs; vertical axes: MSE in units of 10^-6 and Max Error) show the mean squared error (MSE) as well as the maximum error for anomaly detection on the validation set for each dataset versus the number of PIPs used. The original datasets were split into two halves. The first halves were compressed via the multivariate PIP algorithm and then used as training data for the AAKR algorithm. The second halves of the original datasets, called validation sets, were then evaluated with the trained AAKR algorithms. For the agricultural vehicle both error types quickly drop to very low levels. This suggests that the training set contains only a few different states. A compression ratio of 1 : 4 seems an acceptable choice for this type of system. The gas turbine shows a similar behavior. Here a compression ratio of 1 : 3 seems to be an adequate choice.

Figure 8: The two graphs (panels Gas Turbine and Agricultural Vehicle; horizontal axis: # PIPs; vertical axis: MSE) show the mean squared error (MSE) of compression versus the number of PIPs selected by the algorithm for the two training sets. The original datasets were split into two halves. The first halves were compressed via the multivariate PIP algorithm and then used as training data for the AAKR algorithm. The second halves of the original datasets, called validation sets, were then evaluated with the trained AAKR algorithms. For the first training set (agricultural vehicle) and the second training set (gas turbine) the approximation error quickly converges to zero. This suggests that both datasets are well suited for compression through a multivariate compression algorithm.

7 FURTHER CONSIDERATIONS

In the previous section a lossy compression of multivariate datasets using the perceptually important points algorithm was considered. As this algorithm contains no optimization in terms of preserving statistical properties such as the mean and variance of the original dataset, additional steps have to be taken to preserve these. Information about the multivariate statistics can be preserved e.g. by combining the lossy compression with a kernel density estimation technique (Jones et al., 1996). In this configuration the dataset is first compressed through the PIP algorithm. In a second step a kernel density estimation (KDE) is applied, with the selected PIPs utilized as centers for the kernels. The kernel parameters are then adjusted to fit the original dataset. In a decompression stage the likelihood of the current state being in each PIP is calculated and stochastic properties are simulated through the weighted density estimation.
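The combination of PIP centers and kernel density estimation can be sketched as follows. The sketch rests on several assumptions that go beyond the description above: isotropic Gaussian kernels with equal weights, a single shared bandwidth chosen from a list of candidates by maximizing the log-likelihood of the original data, and normalized kernel values as the likelihood of a state belonging to each PIP. All names are hypothetical.

```python
import math


def log_kernel(x, center, h):
    """Log of an isotropic Gaussian kernel with bandwidth h."""
    d = len(x)
    sq = sum((a - b) ** 2 for a, b in zip(x, center))
    return -0.5 * sq / h ** 2 - d * math.log(h) - 0.5 * d * math.log(2 * math.pi)


def log_density(x, centers, h):
    """Log of the equally weighted kernel mixture, via log-sum-exp."""
    logs = [log_kernel(x, c, h) for c in centers]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs)) - math.log(len(centers))


def fit_bandwidth(data, centers, candidates):
    """Pick the candidate bandwidth with the highest data log-likelihood."""
    return max(candidates,
               key=lambda h: sum(log_density(x, centers, h) for x in data))


def responsibilities(x, centers, h):
    """Normalized likelihood of state x belonging to each center."""
    logs = [log_kernel(x, c, h) for c in centers]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


data = [(0.0,), (0.1,), (-0.1,), (5.0,), (5.1,), (4.9,)]  # original dataset
centers = [(0.0,), (5.0,)]                                # selected PIPs
h = fit_bandwidth(data, centers, candidates=[0.01, 0.1, 1.0, 10.0])
print(h)                                      # 0.1
print(responsibilities((0.2,), centers, h))   # close to [1.0, 0.0]
```

The fitted bandwidth carries the spread of the original data around each stored point, which is the statistical information the PIP selection alone does not retain.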

With this procedure the compression reflects a model of the original time series. It can be ensured that relevant statistical properties of the process data are not lost during compression, although the exact realization of the original dataset is lost. The combination of these two approximation algorithms conserves the time dependencies as well as the statistical properties of the original process.

8 CONCLUSIONS

It was shown that datasets exhibiting strong stochastic dependencies can be efficiently compressed by a multivariate compression algorithm. Using the example of a simple artificial dataset it was demonstrated that, especially in the domain of high compression ratios, the multivariate compression algorithm outperforms its univariate equivalent.

In extension to condition monitoring, utility companies are beginning to build fault diagnosis databases to diagnose upcoming critical events through empirical fault diagnostic algorithms, refer to (Feller et al., 2010). These efforts require optimized long-term compression techniques which are able to separate relevant from irrelevant information in high-dimensional process data. The introduced multivariate compression algorithm is able to provide the necessary features.

REFERENCES

Bristol, E. (1990). Swinging door trending: Adaptive trend recording. In ISA National Conference Proceedings, volume 45, pages 749–753.

Chen, H., Li, J., and Mohapatra, P. (2004). RACE: Time series compression with rate adaptivity and error bound for sensor networks. In Mobile Ad-hoc and Sensor Systems, 2004 IEEE International Conference on, pages 124–133. IEEE.

Chevalier, R., Provost, D., and Seraoui, R. (2009). Assessment of Statistical and Classification Models For Monitoring EDFs Assets. In Sixth American Nuclear Society International Topical Meeting on Nuclear Plant Instrumentation.

Chung, F., Fu, T., Luk, R., and Ng, V. (2001). Flexible time series pattern matching based on perceptually important points. In International Joint Conference on Artificial Intelligence Workshop on Learning from Temporal and Spatial Data, pages 1–7.

Eruhimov, V., Martyanov, V., Raulefs, P., and Tuv, E. (2008). Supervised compression of multivariate time series data. Relation, 10(1.125):5395.

Feller, S. (2009). Parameteridentifikation bei einem geregelten multidimensionalen stochastischen prozess am beispiel einer reaktorkühlpumpe. Diplomarbeit.

Feller, S. and Chevalier, R. (2010). Parameter Disaggregation for High Dimensional Time Series Data on the Example of a Gas Turbine. In 38th ESReDA Seminar, Pécs, Hungary, May 4-5, 2010.

Feller, S., Chevalier, R., Paul, N., and Pauli, D. (2010). Classification Methods for Failure Mode Diagnosis on the Example of Synthetic Data and RCP Leak Flow Data. In EPRI Technical Report.

Fu, T. (2010). A review on time series data mining. Engineering Applications of Artificial Intelligence, pages 164–181.

Fu, T., Chung, F., Ng, V., and Luk, R. (2001). Pattern discovery from stock time series using self-organizing maps. In The 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Workshop on Temporal Data Mining, pages 26–29. Citeseer.

Gillespie, D. T. (1996). Exact numerical simulation of the ornstein-uhlenbeck process and its integral. Phys. Rev. E, 54(2):2084–2091.

Hawkins III, S., Darlington, E., Cheng, A., and Hayes, J. (2003). A new compression algorithm for spectral and time-series data. Acta Astronautica, 52(2-6):487–492.

Hines, J. W. and Garvey, D. R. (2006). Development and Application of Fault Detectability Performance Metrics for Instrument Calibration Verification and Anomaly Detection. Journal of Pattern Recognition Research, 1(1).

Jones, M. C., Marron, J. S., and Sheather, S. J. (1996). A brief survey of bandwidth selection for density estimation. Journal of the American Statistical Association, 91:401–407.

Press, W., Teukolsky, S., Vetterling, W., and Flannery, B. (2007). Numerical Recipes: The Art of Scientific Computing. Cambridge University Press.

Stoffer, D. (1999). Detecting Common Signals in Multiple Time Series Using the Spectral Envelope. Journal of the American Statistical Association, 94(448).

Thornhill, N., Shoukat Choudhury, M., and Shah, S. (2004). The impact of compression on data-driven process analyses. Journal of Process Control, 14(4):389–398.

Uhlenbeck, G. E. and Ornstein, L. S. (1930). On the theory of the brownian motion. Phys. Rev., 36(5):823–841.
