domain independent approach to data integration, but
require the mappings to be manually engineered.
Schema-less NoSQL database management systems such as Bigtable (Chang et al., 2008) or semi-structured data models (Acharya et al., 2008) efficiently store NULL values, thus mitigating the drawback of domain-oriented data schemas of containing many NULL values compared to generic schemas like EAV. However, NoSQL is targeted towards semi-structured mass data and is not particularly suited to strongly structured, relationship-heavy data such as master data. This often leads to heterogeneous IT infrastructures comprising both NoSQL and traditional relational database systems. Thus, it is still desirable for many companies to exclusively use a relational database management system.
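To make this NULL-value trade-off concrete, the following minimal sketch (all table and column names are hypothetical and not part of the cited approaches) contrasts a domain-oriented table with a generic EAV table:

-- Domain-oriented schema: one column per attribute,
-- NULLs for attributes a product variant does not have.
CREATE TABLE product_domain (
  product_id INTEGER PRIMARY KEY,
  weight_kg  NUMERIC,        -- NULL for variants without a weight attribute
  voltage_v  NUMERIC,        -- NULL for variants without a voltage attribute
  colour     VARCHAR(30)     -- NULL for variants without a colour attribute
);

-- Generic EAV schema: one row per existing attribute value, no NULLs needed.
CREATE TABLE product_eav (
  product_id INTEGER,
  attribute  VARCHAR(50),
  value      VARCHAR(200),
  PRIMARY KEY (product_id, attribute)
);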
Business intelligence development tools for ETL
(Stumptner et al., 2012) typically use available meta-
data, e.g., constraints and foreign keys, to create tem-
plates which need to be further parameterized by the
domain experts. In our approach, domain knowledge is specified by end users beforehand; our generator then uses this information to create the ETL processes without further adjustments needed by the end user.
Thus, the domain metadata is available for all ETL
processes to be developed in the future.
(Muñoz et al., 2009) presents an approach to the automatic generation of ETL processes. It is based on the
Model Driven Architecture (MDA) framework and
generates Platform Specific Models (PSM) from Platform Independent Models (PIM) using Query/View/Transformation (QVT). One PIM describes a sin-
gle ETL process and is completely implementation-
independent. The PIM is the main source of doc-
umentation for the ETL process. In (Atigui et al.,
2012) a framework to automatically integrate data
warehouse and ETL design within the MDA is intro-
duced. It is based on the Object Constraint Language
(OCL). In the paper at hand, the descriptor table is
closely related to the PIM, as it describes the ETL
process in a platform independent way using classes,
attributes and filter criteria, even though our example
implementation is based on the relational data model.
The descriptor table serves as documentation of the ETL process, with the possibility of transforming its condensed representation into a more human-readable format. Instead of platform specific models, our approach uses SQL as a common
language in the data warehousing world.
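As a minimal sketch of this idea (the descriptor column values, the source table and the staging table are hypothetical), a single descriptor row could be rendered by the generator into plain SQL:

-- Hypothetical descriptor row:
--   class = 'Motor', attribute = 'nominal_power', filter = "status = 'active'"
-- Possible generated staging statement:
INSERT INTO stage_eav (entity_id, attribute, value)
SELECT m.motor_id, 'nominal_power', CAST(m.nominal_power AS VARCHAR(200))
FROM   motor m
WHERE  m.status = 'active';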
In (Skoutas and Simitsis, 2006), ontologies are
used to specify structure and semantics of multiple
data source schemata as well as the data warehouse
schema. Using reasoning, conceptual ETL processes
are then inferred automatically, which specify the
transformation of one or more source schemata to the
data warehouse schema. The main motivation for us-
ing ontologies is to overcome structural and semantic
heterogeneity. Comparing this to our approach, we
use a semi-generic data model for different products
in a product line, where the non-generic, i.e., product
specific parts are explicitly mapped via the descrip-
tor table. Thus, because of our generic approach, we
have no need for an inferred mapping of schemata.
(Skoutas and Simitsis, 2006) is only concerned with the generation of conceptual ETL processes; a subsequent transformation into a platform dependent implementation is still required.
(Khedri and Khosravi, 2013) proposes and imple-
ments a delta-oriented approach to handling variabil-
ity in database schemata for software product lines.
They start out with a core schema containing mandatory features, which is then modified using delta scripts depending on which optional or alternative features are selected for a specific product. In contrast, our ap-
proach uses a strict separation of database objects
which are common to all products, i.e., master data,
and product specific parts of the database schema.
3 ARCHITECTURE
Figure 2 gives an overview of the data analysis ar-
chitecture presented in this paper. Data is trans-
formed from the operational database to the analysis database, passing through three different stages. The operational database is a relational database, while the analysis database is implemented as an OLAP (on-line analytical processing) database (Chaudhuri et al., 2011). The first stage is responsible for dealing with activities specific to the operational database. The second stage executes activities that do not depend on the source or target database. The last stage is responsible for dealing with specifics of the analysis database. Thus, each stage is responsible for executing an arbitrary number of activities falling into one of three groups: operational database specific, independent, and analysis database specific. When a new type of activity is needed, it
can be implemented as a template for instantiation
and reuse. Operational as well as analysis database
specific activities are generated based on a descriptor
table and use an interval definition table.
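The following minimal sketch illustrates the three stages in plain SQL, assuming hypothetical table names and a generic attribute-value staging structure:

-- Stage 1 (operational database specific): stage product-specific data
-- in a generic attribute-value form; generated from the descriptor table.
INSERT INTO stage_eav (entity_id, attribute, value)
SELECT o.order_id, 'order_quantity', CAST(o.quantity AS VARCHAR(200))
FROM   operational_orders o;

-- Stage 2 (source/target independent): e.g., a generic cleansing activity
-- that operates only on the staging structure (illustrative assumption).
DELETE FROM stage_eav WHERE value IS NULL;

-- Stage 3 (analysis database specific): load the staged data into the
-- analysis (OLAP) schema.
INSERT INTO analysis_fact (entity_id, attribute, value)
SELECT entity_id, attribute, value FROM stage_eav;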
The main task of the operational database spe-
cific activities is to retrieve data from the operational
database representing the specific product variation of
interest and transform it into a domain independent
data structure, in our case into an EAV model. To do
so, at least a change data capture activity and a staging activity must be implemented. The change data capture ac-