Double TotalAmount = AGGREGATE(Items, Order, TotalPrice, SUM)
The system breaks the records of the table Items
into groups according to the values returned by the
column Order. It then sums the values of the column
TotalPrice within each individual group. All these
computations are performed in one pass through the
fact table.
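The single-pass grouping and summation described above can be sketched in Python. This is only an illustration and not the ConceptMix implementation: the function aggregate_sum, the dictionary-based row representation and the sample values are all assumptions made for the example.

```python
from collections import defaultdict


def aggregate_sum(fact_rows, group_column, measure_column):
    """Sum measure_column per distinct group_column value in one pass.

    Hypothetical helper: it mimics AGGREGATE(Items, Order, TotalPrice, SUM)
    on a list of dict rows, not on the real column-oriented engine.
    """
    totals = defaultdict(float)
    for row in fact_rows:  # one pass through the fact table
        totals[row[group_column]] += row[measure_column]
    return dict(totals)


# Assumed sample fact table: each item row references its order.
items = [
    {"Order": 1, "TotalPrice": 10.0},
    {"Order": 1, "TotalPrice": 5.5},
    {"Order": 2, "TotalPrice": 7.0},
]

total_amount = aggregate_sum(items, "Order", "TotalPrice")
```

In this sketch each key of total_amount stands for one group (one order), and its value plays the role of the aggregated column TotalAmount for that group.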
This definition uses existing columns (the measure
and the grouping column), which must be defined
before the new aggregated column. For example,
TotalPrice in the above expression is a derived
column. However, these columns can also be defined
in the context of the aggregation function itself.
An aggregated column can in turn be part of other
expressions. An alternative way to define
aggregation is to use de-projection (Savinov, 2012a).
Case columns. The main purpose of these
columns is to group records of the table by assigning
an explicitly specified value depending on some
condition evaluated for the current record. This
roughly corresponds to SQL CASE expressions, but
here it is used to define new functions by specifying
an output for each condition that may be satisfied.
For example, to partition all products into several
groups by price, we specify price intervals
(conditions) and the corresponding output values of
this column.
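A case column of this kind can be sketched in Python as a function that maps each record to the output of the first satisfied condition. The function price_group, the interval boundaries, the labels and the sample products below are hypothetical and chosen only for illustration; they are not taken from ConceptMix.

```python
def price_group(price, intervals, default="other"):
    """Return the label of the first interval [lower, upper) containing price."""
    for lower, upper, label in intervals:
        if lower <= price < upper:  # condition evaluated for the current record
            return label
    return default  # no condition was satisfied


# Assumed price intervals (conditions) and their output values.
PRICE_INTERVALS = [
    (0.0, 10.0, "low"),
    (10.0, 100.0, "medium"),
    (100.0, float("inf"), "high"),
]

# Assumed sample product table.
products = [
    {"Name": "pen", "Price": 2.5},
    {"Name": "lamp", "Price": 40.0},
    {"Name": "desk", "Price": 250.0},
]

# The new column: one output value per product record.
groups = [price_group(p["Price"], PRICE_INTERVALS) for p in products]
```

The resulting column can then serve as a grouping column in the same way as any other column of the table.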
4 CONCLUSIONS
In this paper, we presented a conceptual vision for a
next generation analytical data integration system by
rethinking the main principles behind such systems.
We described how these general principles are
implemented in ConceptMix, a self-service tool for
analytical data integration intended for solving a
wide range of typical data wrangling tasks which
precede the visual analysis step.
In the future, we are going to extend this technology
by developing a powerful assistance engine which
will leverage the semantic properties of COM. This
includes recommendations for schema mappings,
relationships, aggregations, imports and others.
Another novel function to be added is selection
propagation, which leverages the inference
capabilities of COM (Savinov, 2012b; 2006). We will
also develop an optimizer for translating
expressions into efficient code for execution in
the column-oriented data processing engine.
REFERENCES
Abelló, A., Darmont, J., Etcheverry, L., Golfarelli, M.,
Mazón, J.-N., Naumann, F., Pedersen, T.B., Rizzi, S.,
Trujillo, J., Vassiliadis, P., Vossen, G., 2013. Fusion
Cubes: Towards Self-Service Business Intelligence.
IJDWM 9(2), 66-88.
Atzeni, P., Jensen, C.S., Orsi, G., Ram, S., Tanca, L.,
Torlone, R., 2013. The relational model is dead, SQL
is dead, and I don’t feel so good myself. ACM
SIGMOD Record, 42(2), 64-68.
Chaudhuri, S., Dayal, U., Narasayya, V., 2011. An
overview of Business Intelligence technology.
Communications of the ACM, 54(8), 88-98.
Gonzalez, H., Halevy, A., Jensen, C., Langen, A.,
Madhavan, J., Shapley, R., Shen, W., 2010. Google
Fusion Tables: Data Management, Integration and
Collaboration in the Cloud. Proc. ACM Symposium on
Cloud Computing (SOCC 2010), 175-180.
Hanrahan, P., 2012. Analytic database technologies for a
new kind of user: the data enthusiast. Proc. SIGMOD
2012, 577-578.
Idreos, S., Liarou, E., 2013. dbTouch: Analytics at your
Fingertips. Proc. 6th Biennial Conference on
Innovative Data Systems Research (CIDR’13).
Kandel, S., Paepcke, A., Hellerstein, J., Heer, J., 2011.
Wrangler: Interactive Visual Specification of Data
Transformation Scripts. Proc. ACM Human Factors in
Computing Systems (CHI), 3363-3372.
Löser, A., Hueske, F., Markl, V., 2008. Situational
business intelligence. Proc. Business Intelligence for
the Real-Time Enterprise (BIRTE), 1-11.
Morton, K., Bunker, R., Mackinlay, J., Morton, R., Stolte,
C., 2012. Dynamic Workload-Driven Data Integration
in Tableau. Proc. SIGMOD 2012, 807-816.
Morton, K., Balazinska, M., Grossman, D., Mackinlay, J.,
2014. Support the Data Enthusiast: Challenges for
Next-Generation Data-Analysis Systems. Proc. VLDB
Endowment 7(6), 453-456.
Russo, M., Ferrari, A., Webb, C., 2012. Microsoft SQL
Server 2012 Analysis Services: The BISM Tabular
Model. Microsoft Press.
Savinov, A., 2014a. Concept-oriented query language. In
J. Wang (Ed.), Encyclopedia of Business Analytics and
Optimization. IGI Global, 512-522.
Savinov, A., 2014b. Concept-oriented model. In J. Wang
(Ed.), Encyclopedia of Business Analytics and
Optimization. IGI Global, 502-511.
Savinov, A., 2012a. References and arrow notation instead
of join operation in query languages. Computer
Science Journal of Moldova (CSJM), 20(3), 313-333.
Savinov, A., 2012b. Inference in hierarchical
multidimensional space. In Proc. International
Conference on Data Technologies and Applications
(DATA 2012), 70-76.
Savinov, A., 2012c. Concept-oriented model: Classes,
hierarchies and references revisited. Journal of
Emerging Trends in Computing and Information
Sciences, 3(4), 456-470.