Data Processing Modeling in Decision Support Systems

Concepción M. Gascueña

and Rafael Guadalupe

Department of Computing, Polytechnic of Madrid University, Carretera de Valencia Km7, 28031 Madrid, Spain

Department of Topographic, Polytechnic of Madrid University, Carretera de Valencia Km7, 28031 Madrid, Spain

Keywords: Multidimensional Models, Data Processing in Multidimensional Databases, Data Processing in Data

Warehouses, Data Processing in Decision Support Systems, Virtual factEntity.

Abstract: Advances in information technology in general, and in databases in particular, are making data storage devices cheaper and data processing faster. As a result, organizations tend to store large volumes of data that hold great potential information. Decision Support Systems (DSS) try to use the stored data to obtain valuable information for organizations. In this paper, we use both data models and use cases to represent the functionality of data processing in DSS, following Software Engineering processes. We propose a methodology for developing DSS in the Analysis phase, with respect to data processing modeling. As a starting point, we use a data model adapted to the semantics of multidimensional databases or data warehouses (DW). We also take an algorithm that provides all the possible ways to automatically cross the data of a multidimensional model. Building on these, we propose use case diagrams and descriptions that can be considered patterns representing the DSS functionality for processing the data of the DW on which the DSS is based. We highlight the reusability and automation benefits that can be achieved, and we think this study can serve as a guide in the development of DSS.

1 INTRODUCTION

One of the challenges of Software Engineering (SE) is to propose rules, processes, guidelines and models that address software development quickly, efficiently, and in a specific and unambiguous manner, resulting in a quality product. Methodologies are proposed continually, with varying degrees of complexity and agility, to lead teams in a certain direction during the software development process, also referred to as the software life cycle. In recent years, SE has acquired great importance and, increasingly, fewer software developments are undertaken without prior planning. In SE, use cases (CU) are considered by most members of the scientific community to be a technique, not necessarily object-oriented, that allows us to model the functionality of a software system at a high level of abstraction, regardless of the programming paradigm in which the system will be implemented.

Decision Support Systems (DSS) are based upon historical databases containing large amounts of data. They try to extract information by processing the data in a certain way, allowing managers to make decisions and predict future trends: "Predicting the future by studying the past." However, DSS are not always based on databases built for this purpose; sometimes they use transactional databases, which we do not consider efficient. We believe that DSS must be based on data warehouses (DW) or multidimensional databases (MMDB), and must follow specific multidimensional (MM) data models that reflect the multidimensional semantics and lead to analysis from the earliest stages of system development. In this work we use MM models and CU to model data processing in DSS.

This paper is structured as follows: Section 2

includes a study on related works in MMDB and on

the representation of functionality in the

development of Software Systems. In Section 3, we

present our proposal. Section 4 includes an example

using our proposal. In Section 5, some conclusions

and future work are offered.

2 RELATED WORKS

Most DSS development proposals are mainly

concerned with the database on which they are built


M. Gascueña C. and Guadalupe R..

Data Processing Modeling in Decision Support Systems.

DOI: 10.5220/0003982601330138

In Proceedings of the 14th International Conference on Enterprise Information Systems (ICEIS-2012), pages 133-138

ISBN: 978-989-8565-10-5

© 2012 SCITEPRESS (Science and Technology Publications, Lda.)

upon (Kimball, 1996), (Inmon, 2002), (Mazón, 2006). To develop this DB, data models have been presented, as in (Tryfona, 2003), (Torlone, 2003), (Malinowski, 2004), (Luján-Mora, 2006), (Gascueña, 2006). Some authors propose using transactional database models, as in (Malinowski, 2004), (Tryfona, 2003); other authors propose using specific models that treat the MM semantics explicitly, as in (Kimball, 1996), (Torlone, 2003), (Gascueña, 2008c). In recent years, the importance given to MM models has increased, and there are even some proposals that try to represent spatio-temporal data behavior within them, as in (Malinowski, 2005), (Parent, 2006), (Gascueña, 2008a), (Bimonte, 2008). This underlines the value that the scientific community gives to the MM models used in the development of the DW or MMDB. Regarding the processing of data, there are some works, such as (Gascueña, 2008b), in which an analysis is performed that separates the concepts of basic data and derived data. That work uses models to represent both data types and proposes an algorithm responsible for automatically obtaining the derived data from the DW. However, there are few proposals regarding the data processing functionalities of DSS.

The CU is the most widely employed technique for modeling the functionalities of software systems. However, CU are almost always used in a particular way for each system; they are "tailored" to the applications that they model. We think it would be desirable to propose CU "patterns" that could be reused by most systems that need the same functionalities. There are some initiatives that tackle generalized problems, such as (Guttorm, 2005), which proposes using CU to represent the potential threats that a system could face, modeling both the functionality and the threats of systems; the authors call these misuse cases. In (Kantorowitz, 2003) a CU-oriented framework is proposed for automatically building graphical user interfaces (GUI); the authors also attempt to reuse these CU in different applications. In (Luján-Mora, 2006) the MM semantics are specified using class diagrams, and new artifacts aimed at collecting such semantics are proposed. The authors include an example of how to specify two data requirements by means of two CU, but the proposed CU are entirely dependent upon the requirements discussed. In this paper we propose a general, reusable CU, a “pattern”, which may be used as a guide in the development of DSS for modeling the data processing functionality.

3 PROPOSAL

We frame this paper within the Analysis phase of the software life cycle in Software Engineering. We use data models and CU to propose a guide for the development of DSS. On the one hand, we propose appropriate conceptual MM data models that reflect the basic starting data required to develop a DW. On the other hand, we use CU to represent the data processing functionality of any DSS, which will allow us to obtain derived data dynamically and automatically. The MM data models used in this study were presented in (Gascueña, 2006) and completed in (Gascueña, 2008a). To obtain derived data dynamically, we use the algorithm presented in (Gascueña, 2008b).

3.1 Data Models

In this section we offer a brief introduction to the conceptual MM model named FactEntity (FE), to make our proposal easier to understand.

MM models should represent analysis-oriented data from the earliest stages of DSS development. They try to represent a fact, the object of study, from different perspectives or dimensions and with different levels of detail or granularities. Levels are obtained by grouping basic data according to different criteria. Different criteria form different hierarchies: a hierarchy contains a set of levels grouped according to one criterion. A dimension can have multiple hierarchies. A fact consists of a set of fact measures.

The FE model distinguishes between basic data (existing data) and data obtained by processing the basic data according to the analysis criteria, also called derived data. Facts and dimensions are combined to obtain the so-called factEntities, which can be basic or virtual. Basic factEntities (BfE) are obtained from the dimensional levels of minimum granularity (leaf levels) and the basic fact measures. Virtual factEntities (VfE) are obtained by processing the basic data. Every factEntity complies with the rule that it contains a single level of each dimension and a set of fact measures, although this set may sometimes be empty. Figure 1 shows the constructors, elements, relationships and functions used by the FE model to represent the MM semantics.

Figure 1: Basic FE model completed with the functions that will apply on fact measures when the Rollup is run.

Hierarchies are classified according to the effect that their Rollup path (moving from a lower to a higher level) has on the fact measures:

• Dynamic hierarchy (its route involves changes in the fact measures).
• Static hierarchy (its route does not involve changes in the fact measures).
• Hybrid hierarchy (a mixture of the two previous types).

As shown in Figure 1, the Static and Hybrid hierarchies represent spatial characteristics. The BfE contains representatives of the dimensional leaf levels and the fact measures. The diagram also represents both the functions to be applied to reach higher levels in the hierarchy (this is of special interest when changing spatial granularities) and the analysis functions to be applied to the fact measures once the rollup between the dimensional levels has been performed (this is necessary in order to process the basic data and obtain derived data).
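The FE constructs described in this section can be pictured as plain data structures. The following is only a minimal sketch, not the authors' implementation: the class names, the `kind` field, the hierarchy names "calendar" and "taxonomy", and the choice of "dynamic" for both hierarchies are assumptions made for illustration. The dimension, level and measure names are borrowed from the example in Section 4.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Level:
    """One granularity of a dimension, e.g. 'week' or 'city'."""
    name: str


@dataclass
class Hierarchy:
    """Levels grouped by one criterion, ordered from leaf to coarsest."""
    name: str
    kind: str            # "dynamic", "static" or "hybrid"
    levels: List[Level]


@dataclass
class Dimension:
    name: str
    hierarchies: List[Hierarchy]

    @property
    def leaf(self) -> Level:
        # Assumption: every hierarchy of a dimension starts at the same leaf level.
        return self.hierarchies[0].levels[0]


@dataclass
class FactEntity:
    name: str
    levels: Dict[str, Level]                            # dimension name -> its single level
    measures: List[str] = field(default_factory=list)   # may be empty
    basic: bool = False


def basic_fact_entity(name: str, dimensions: List[Dimension], measures: List[str]) -> FactEntity:
    """Build the BfE from the leaf level of every dimension plus the basic measures."""
    return FactEntity(name, {d.name: d.leaf for d in dimensions}, list(measures), basic=True)


# Hypothetical usage with two of the dimensions of the plague study.
time_dim = Dimension("Time", [Hierarchy("calendar", "dynamic", [Level("week"), Level("year")])])
plague_dim = Dimension("Plague", [Hierarchy("taxonomy", "dynamic",
                                            [Level("plague type"), Level("family"), Level("order")])])
bfe = basic_fact_entity("Plague Evolution", [time_dim, plague_dim], ["Expanse%", "Killed%"])
print({dim: level.name for dim, level in bfe.levels.items()})
```

Keeping a single level per dimension inside `FactEntity` mirrors the rule stated above; a VfE would use the same structure with `basic=False` and non-leaf levels.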

3.2 Use Cases

In this proposal we present a generic CU model aimed at capturing the DSS functionalities for processing basic data. It is intended as a guide for developers and analysts of these systems.

3.2.1 CU Diagram

Figure 2 shows the To Generate Virtual factEntities diagram, which represents a main CU named Generate Virtual factEntity (VfE_CU) and four associated CU: Create Table, Create Materialized View, Create View and Other. All of them carry the <<extend>> label. This gives the functionality the ability to store the VfE both inside and outside the DW, and in several different ways, leaving the final choice to the user (analyst or manager).

Figure 2: CU diagram: To Generate Virtual FactEntities.

This shows how the VfE storage could be chosen in the

analysis.
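The four extending CU of Figure 2 can be read as a choice among storage strategies. The sketch below is an illustration under stated assumptions, not part of the proposal: the function name `storage_statement`, the option keys and the generic SQL-like templates are hypothetical, and the exact statement syntax varies between database managers.

```python
from typing import Optional

# Hypothetical templates; real DBMS syntax may differ (for example, not every
# system supports CREATE TABLE ... AS or materialized views).
_TEMPLATES = {
    "table": "CREATE TABLE {name} AS {query}",
    "materialized_view": "CREATE MATERIALIZED VIEW {name} AS {query}",
    "view": "CREATE VIEW {name} AS {query}",
}


def storage_statement(option: str, vfe_name: str, select_sql: str) -> Optional[str]:
    """Return the statement that stores a VfE inside the DW, or None for 'other'."""
    if option == "other":
        # Storage outside the DW (e.g. an export) is left to the application.
        return None
    if option not in _TEMPLATES:
        raise ValueError(f"unknown storage option: {option!r}")
    return _TEMPLATES[option].format(name=vfe_name, query=select_sql)


print(storage_statement("view", "vfe_year_city", "SELECT 1"))
```

Modeling the options as a lookup keeps the main flow of the VfE_CU unchanged when a new storage alternative is added, which is the intent of the <<extend>> relationships.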

3.2.2 CU Description

Tables 1, 2 and 3 contain the VfE_CU description. This is a generic CU that defines the minimum functionality that any DSS requires for processing and obtaining derived data from a DW. To develop this CU we have used the data model of Figure 1. Table 1 contains the principal scenario, or typical course of events. The head of this CU has been omitted since it is not relevant for this work.

The VfE_CU performs the following tasks. First, the user requests the generation of VfE, and the system asks for the order in which the dimensions will be crossed to get all possible VfE. Second, the system calls the Gascueña algorithm, which obtains all the possible forms of data crossing between dimensions and fact measures. Third, the system presents the user with a listing of the obtained VfE. Fourth, the user chooses to generate a certain VfE (this action obtains and processes the basic data of the DW according to the structure of the chosen VfE). Fifth, the system obtains and presents the data and requests an option for the data treatment, as shown in the diagram in Figure 2. Sixth, the system allows the execution of as many VfE as the user needs. The system also allows other VfE listings to be obtained, taking the dimensions in a different order, as many times as the user wants. All this is explained in detail in Table 1, which has 16 steps. Tables 2 and 3 show the alternatives to the typical course of the VfE_CU that we consider most important.

Table 1: Typical course of events of Generate VfE_CU.

Table 2: Alternative courses of events, which contemplate various options for storing the structures and data of VfE.

Table 2 describes the alternatives to the so-called “Create table CU” (step 8 of the typical course of events). There are various options: Create materialized views CU, Create views CU and Others CU. Table 3 describes the alternatives for running additional VfE (option: Yes, step 13 of the typical course of events) and the alternatives for obtaining new lists of VfE by choosing dimensions in different orders (option: Yes, step 15 of the typical course of events). Both the typical course and the alternative courses may contain more options, but they have not been considered here since they do not add value to our discussion.

Table 3: Alternative courses of events that show the ability to implement different VfE and the ability to obtain new lists of VfE by choosing dimensions in different orders.

3.2.3 Gascueña Algorithm

Let us briefly define the Gascueña algorithm; for further details please refer to (Gascueña, 2008c). We describe it in three stages.

First: Given a set of n dimensions, we obtain all possible combinations, in groups of 1, 2,...,n-1 and n dimensions. We apply the following formula (1):

$$[D_i, \ldots, D_p] \;/\; \forall i \in [1,\ldots,n] \;\wedge\; \forall p \in [i+1,\ldots,n] \;\wedge\; (p > i \ \text{OR}\ p = \emptyset). \qquad (1)$$
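The first stage can be sketched with the standard library. This is a minimal illustration that follows the prose description (all groups of 1 to n dimensions); the function name `dimension_subgroups` is hypothetical, and the dimension names are taken from the example in Section 4.

```python
from itertools import combinations


def dimension_subgroups(dimensions):
    """Return every non-empty group of dimensions, from size 1 up to size n."""
    dims = list(dimensions)
    return [group
            for size in range(1, len(dims) + 1)
            for group in combinations(dims, size)]


groups = dimension_subgroups(["Time", "Plague", "Technique", "Location Space"])
print(len(groups))  # 15, i.e. 2**4 - 1 non-empty groups
```

For n dimensions this yields 2^n - 1 groups, so the number of candidate structures grows quickly with the number of dimensions.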

Second: The Cartesian product is applied on

each of the previous subgroups, taking into account

that in some application domains, the order in which

we choose the elements to make up the subgroup

will be significant.
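One possible reading of the second stage is that, within each subgroup, the levels of the chosen dimensions are crossed so that each result holds a single level per dimension. The sketch below assumes that reading; `level_crossings` and its `ordered` flag are hypothetical names, and the flag models the domains in which the order of the dimensions is significant.

```python
from itertools import permutations, product


def level_crossings(subgroup, levels_by_dimension, ordered=False):
    """Cross the levels of the dimensions in one subgroup.

    Each result is a tuple of (dimension, level) pairs. With ordered=True,
    every arrangement of the subgroup is treated as a distinct crossing.
    """
    arrangements = permutations(subgroup) if ordered else [tuple(subgroup)]
    crossings = []
    for arrangement in arrangements:
        level_lists = [levels_by_dimension[d] for d in arrangement]
        for levels in product(*level_lists):
            crossings.append(tuple(zip(arrangement, levels)))
    return crossings


levels = {
    "Time": ["week", "year"],
    "Plague": ["plague type", "family", "order"],
}
print(len(level_crossings(("Time", "Plague"), levels)))                # 6
print(len(level_crossings(("Time", "Plague"), levels, ordered=True)))  # 12
```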

Third: The Virtual factEntities are obtained by

adding to the Cartesian subgroups obtained in the

previous step the respective fact measures. We then

apply the following formula (2):

$$VfE = ([D_i \times \ldots \times D_p], \{G(me)\}) - (BfE). \qquad (2)$$

Where $(D_i \times \ldots \times D_p)$ represents the Cartesian product, and $\{G(me)\}$ is the set of functions $G$ compatible with the basic fact measure $me$. The Basic factEntity (BfE) is excluded.
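The three stages can be combined into one short sketch. This is an illustration under assumptions, not the published algorithm: `virtual_fact_entities` is a hypothetical name, the first level of each list is assumed to be the leaf level, the function lists attached to each measure are invented for the example, and the geometric granularities of the spatial dimension are ignored. Dimension, level and measure names come from the example in Section 4.

```python
from itertools import combinations, product


def virtual_fact_entities(levels_by_dimension, functions_by_measure):
    """List every VfE structure: level crossings plus measure functions, minus the BfE."""
    dims = list(levels_by_dimension)
    # The BfE crosses the leaf (first) level of every dimension.
    bfe = tuple((d, levels_by_dimension[d][0]) for d in dims)
    vfes = []
    for size in range(1, len(dims) + 1):                      # stage 1: subgroups
        for subgroup in combinations(dims, size):
            level_lists = [levels_by_dimension[d] for d in subgroup]
            for levels in product(*level_lists):              # stage 2: Cartesian product
                crossing = tuple(zip(subgroup, levels))
                if crossing == bfe:                           # stage 3: exclude the BfE
                    continue
                vfes.append((crossing, functions_by_measure))
    return vfes


levels = {
    "Time": ["week", "year"],
    "Plague": ["plague type", "family", "order"],
    "Technique": ["technique type"],
    "Location Space": ["plot", "city"],
}
functions = {"Expanse%": ["AVG", "MAX"], "Killed%": ["AVG"]}  # hypothetical choices
vfes = virtual_fact_entities(levels, functions)
print(len(vfes))  # 70
```

With these level counts the subgroups produce 71 crossings in total, and removing the single BfE leaves 70 candidate VfE.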

4 APPLICATIONS

Next we develop a practical example in which we apply our proposal.

We want to study the damage caused by insect plagues to agriculture in certain zones of the Earth over time. The spatial area is divided into plots, and these are grouped into cities. It is necessary to store the % of extension of each plague on each plot at a given moment in time. The plagues are exterminated, or an attempt is made to exterminate them, through the use of different technologies. The study requires storing the existing technologies and their effectiveness in the treatment of infected plots. The effectiveness is measured by the % of deaths caused by the treatment. The evolution of plagues on each plot is checked weekly. The spatial areas will be represented by spatial data with geometric shapes, such as surfaces, lines and points, which can be used interchangeably. The % extension of plague and the % of deaths will be studied from different perspectives and levels of detail: Time: week, year; Zones: plot, city; Technique: technique type; Plague: plague type, family and order.

To offer a solution to this study we propose building a DSS that allows us to analyze the effectiveness of anti-plague treatments and helps us make the best decisions regarding the treatment of newly emerging plagues. The DSS will consist of an MMDB or DW complete with spatial treatment. Furthermore, the system allows the data of the DW to be processed on demand, easily and quickly. Figure 3 shows the Basic FE model proposed as a solution for storing the input data.

Figure 3: Basic FE model for Plagues Study.

We have identified the following dimensions: Time, Plague, Technique and Location Space. The Time dimension has two granularities: week and year. The Plague dimension has three granularities: plague type, family and order. The Location Space dimension has two semantic granularities, plot and city, and three geometric granularities (spatial representation): surface, line and point. This dimension also forms a dynamic hierarchy, a static hierarchy and three hybrid hierarchies. The “Plague Evolution” basic factEntity contains the primary keys inherited from the leaf levels of the dimensions (underlined in the diagram). The week level has two relationships (start, final) with the BfE. The fact under consideration contains two fact measures: Expanse% and Killed%. In the diagram, we can also observe the functions used to create higher levels, of both the geometric and semantic granularities, within the spatial dimension. Figure 4 shows how the Basic FE model is completed with information regarding the functions to be used for the analysis once the Rollup is made.

Figure 4: FE conceptual multidimensional model,

prepared for processing data by “Plagues Study DSS”.
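To make the role of the analysis functions concrete, the following sketch rolls a fact measure up from the week level to the year level. It is only an illustration: the `rollup` helper, the row layout, the field name `expanse_pct`, the week labels and the use of the mean are assumptions, and the data values are invented.

```python
from collections import defaultdict
from statistics import mean


def rollup(rows, to_parent, measure, func):
    """Group rows by their parent level and apply an analysis function to a measure."""
    groups = defaultdict(list)
    for row in rows:
        groups[to_parent(row)].append(row[measure])
    return {parent: func(values) for parent, values in groups.items()}


# Made-up basic data: weekly extension of one plague on one plot.
rows = [
    {"week": "2011-W01", "plot": "P1", "expanse_pct": 10.0},
    {"week": "2011-W02", "plot": "P1", "expanse_pct": 20.0},
    {"week": "2012-W01", "plot": "P1", "expanse_pct": 30.0},
]

# Rollup week -> year, averaging the measure within each year.
by_year = rollup(rows, lambda r: r["week"][:4], "expanse_pct", mean)
print(by_year)  # {'2011': 15.0, '2012': 30.0}
```

Passing the function as a parameter reflects the idea that the model, not the code, records which functions are compatible with each fact measure.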

At this point we could have included the CU models presented in Figure 2 and Tables 1, 2 and 3, adapted to our example. However, if we study these models in detail, we note that it is not necessary to include anything new in the descriptions and diagram of the VfE_CU. The proposed CU model is valid as it stands to represent the minimum functionality required to process the derived data in this example.

5 CONCLUSIONS AND FUTURE

RESEARCH

In this paper we have proposed a methodology that is intended as a generalized guide for the development of DSS following Software Engineering guidelines. Our proposal is framed within the Analysis phase of the software development life cycle. We have used MM data models and CU to lead the development. On the one hand, we offer the foundations for building a DB that collects MM semantics (to create the DW, the main part of a DSS). On the other hand, we model the data processing, defining the desired functionality through a CU model. We explain our proposal in three steps. First, we propose building a conceptual multidimensional data model with the structure required to store the basic or starting data in a DW. The model takes into account the analysis requirements. Second, the basic data model obtained in the previous step is completed with the operations and functions that we want to use in the data analysis. This new model presents all the elements needed for processing the data, allowing us to obtain new data structures for the derived data. Third, the data processing functionality is modeled by a CU. In particular, the Virtual factEntity CU is defined and developed. The VfE_CU details the minimum sequence of events required for processing the basic data. The VfE_CU uses an algorithm that interacts with the data models, collecting the information represented in them, to generate all the possible VfE automatically and on demand. The steps outlined above have a high level of abstraction and are independent of their implementation. We believe that the proposed CU can serve as a basic pattern in the development of DSS, which may later be completed and adapted to each particular situation if necessary. Finally, we have presented an example in which we develop a case study using our proposal.

Our future research is aimed at discovering other general behavioral patterns that could guide the development of DSS. In addition, we are interested in developing a tool that would allow us to describe the FE data models and the VfE_CU and transform them automatically into real systems. The FE model transformation will implement the models in commercial database management systems under different paradigms: Relational, Object-Relational or Object-Oriented. The VfE_CU transformation will allow us to implement a basic interface with the features described in this proposal, while also allowing a choice among the most popular programming languages.

REFERENCES

Bimonte S., Tchounikine A., Bertolotto M., 2008. Integration

of Geographic Information into Multidimensional

Models. ICCSA 2008: International, 2008.

Gascueña C. M., Cuadra D., Martínez P., 2006. A

Multidimensional Approach to the Representation of

the Spatiotemporal Multigranularity. ICEIS 2006.

Gascueña C. M., Guadalupe R., 2008a. Some Types of

Spatio-Temporal Granularities in a Conceptual

Multidimensional Model. 7th International

Conference, APLIMAT Bratislava, Slovak.

Gascueña C. 2008b. Proposal of a Conceptual Model for the Representation of Spatio Temporal Multigranularity in Multidimensional Databases. PhD Thesis. Polytechnic University of Madrid, Spain.

Gascueña C. M., Guadalupe R., 2008c. A Study of the

Spatial Representation in Multidimensional Models,

ICEIS 2008.

Guttorm Sindre, Andreas L. Opdahl, 2005. Eliciting security requirements with misuse cases. Requirements Engineering, Issue 10, pp 34–44.

Inmon, W. 2002. Building The Data Warehouse. John Wiley & Sons.

Kantorowitz E., Lyakas A., Myasqobsky A.. 2003. A Use

Case-Oriented User Interface Framework. Software.

SwSTE '03. IEEE International Conference on.

Kimball R. 1996. The Data Warehouse Toolkit. John Wiley & Sons Ed.

Luján-Mora S., Trujillo J., Song Il-Yeol. 2006. A UML

profile for multidimensional modeling in data

warehouses. DKE, 59(3), p. 725–769.

Malinowski, E. and Zimanyi, E., 2004. Representing

Spatiality in a Conceptual Multidimensional Model.

Proc. of the 12th annual ACM international workshop

on GIS. Washington, DC, USA.

Malinowski E., Zimanyi E., 2005. Spatial Hierarchies and

Topological Relationships in the Spatial MultiDimER

model. Lecture Notes in Computer Science, page 17,

Volume 356.

Mazón J. N., Pardillo J., Meliá S., Trujillo J., 2006.

Modelado Multidimensional de almacenes de datos

con MDA. XI JISBD 2006.

Parent C., Spaccapietra S., Zimanyi E., 2006. The

MurMur project: Modeling and querying multi-

representation spatio-temporal databases. Information

Systems, Volume 31, Issue 8, Pages 733-769.

Torlone R., 2003. Conceptual Multidimensional Models.

In Multidimensional databases: problems and

solutions, pages 69-90, Idea Group Publishing,

Hershey, PA, USA.

Tryfona, N., Price, R., Jensen, C. S., 2003. Conceptual

Models for Spatio-temporal Applications. In M.

Koubarakis et al. (Eds.), Spatio-Temporal DB: The

CHOROCHRONOS pg. 79-11. Berlin, Heidelberg.

ICEIS2012-14thInternationalConferenceonEnterpriseInformationSystems

138