over FactDate from 21% to 31% as the data volume increased. Moreover, Figure 7(b) shows that, on four-dimensional queries, this improvement reaches up to 54%. We conclude that, when processing queries that access three or four CFs, FactDate and CNSSB are not suitable for Scenario 2.
[Figure 7: Processing time of high-dimensional queries, measured as elapsed time (in seconds) over scale factors (SF) 10, 20, 40, and 80 for the FactDate, CNSSB, and SameCF designs. (a) Three-dimensional. (b) Four-dimensional.]
7 CONCLUSIONS
In this paper, we analyzed three physical DW designs, called CNSSB, SameCF, and FactDate. We considered two different enterprise scenarios, each determining OLAP queries with different numbers of dimensions, and observed how the arrangement of attributes over CFs in each design influences OLAP query performance. The results of our experiments showed that storing all data in a single CF provided better performance for high-dimensional queries; in this scenario, SameCF was the most appropriate design to deploy. Conversely, storing dimensions in different CFs benefited low-dimensional queries; in this scenario, FactDate and CNSSB were more appropriate. Furthermore, when processing one-dimensional queries that required data from the dimension Date, the FactDate design provided the best performance. Since data warehousing is characterized by mostly read-only operations, the organization of attributes into CFs is an important issue to take into account when comparing NoSQL column-oriented databases.
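To make the three attribute arrangements concrete, the sketch below creates the corresponding table layouts with the HBase 1.x Java client. It is a minimal illustration only: HBase is assumed here as the column-oriented store, and the table and CF names (ssb_samecf, ssb_cnssb, ssb_factdate, and so on) are hypothetical; the CF layouts follow the design descriptions above, with dimension names taken from the Star Schema Benchmark.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CfDesignSketch {

    // Dimension names as in the Star Schema Benchmark (SSB).
    private static final String[] DIMS = {"customer", "supplier", "part", "date"};

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {

            // SameCF: all fact measures and dimension attributes in a single CF,
            // so a high-dimensional query scans only one CF.
            HTableDescriptor sameCf = new HTableDescriptor(TableName.valueOf("ssb_samecf"));
            sameCf.addFamily(new HColumnDescriptor("all"));
            admin.createTable(sameCf);

            // CNSSB: one CF for the fact measures plus one CF per dimension,
            // so a low-dimensional query reads only the CFs it needs.
            HTableDescriptor cnssb = new HTableDescriptor(TableName.valueOf("ssb_cnssb"));
            cnssb.addFamily(new HColumnDescriptor("fact"));
            for (String dim : DIMS) {
                cnssb.addFamily(new HColumnDescriptor(dim));
            }
            admin.createTable(cnssb);

            // FactDate: fact measures and the Date dimension share one CF
            // (favoring one-dimensional queries over Date); the remaining
            // dimensions each get their own CF.
            HTableDescriptor factDate = new HTableDescriptor(TableName.valueOf("ssb_factdate"));
            factDate.addFamily(new HColumnDescriptor("factdate"));
            for (String dim : new String[] {"customer", "supplier", "part"}) {
                factDate.addFamily(new HColumnDescriptor(dim));
            }
            admin.createTable(factDate);
        }
    }
}

Under this sketch, a one-dimensional query over Date reads a single CF in FactDate, whereas in CNSSB it must read both the fact CF and the date CF; in SameCF, every query, regardless of dimensionality, scans the same single CF.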
By following this guideline, a company is able to choose the physical schema design that best suits the most frequent OLAP queries issued against its data warehousing application. Regarding benchmarks, we conclude that their workloads must model different physical designs in order to provide a more accurate evaluation focused on the company's interests.
ACKNOWLEDGEMENTS
This work has been supported by the following Brazilian research agencies: FAPESP (Grant: 2014/12233-2), FINEP, CAPES, and CNPq.