to support Travel Mobility. In this paper we focus on the effective analysis of tourist data by means of On-Line Analytical Processing (OLAP) tools designed to support the analysis of Big Data gathered from several sources. In particular, we define an end-to-end framework that assists decision makers from the pre-processing phase (which can prove really hard for heterogeneous sources) through to the analysis steps. To clarify the features of our framework, we first describe the basic tools devoted to Big Data management and our effort to integrate them properly.
2 BIG DATA: DATA STORAGE AND ANALYSIS FEATURES
Due to advances in data gathering and storage and the availability of new data sources such as social networks, the volume of data collected by public organizations and private companies is growing rapidly. The number of users interested in accessing and analysing these data is growing too. As a matter of fact, data management systems are used intensively to answer an increasing number of queries arising from the complex analysis tasks needed to assist decision makers in crucial business processes. As a result, activities like data analysis and business reporting call for ever-increasing resources. Moreover, only a few years ago the largest Data Warehouse was about 100 Terabytes in size, while nowadays Data Warehouses of several Petabytes are frequently built. Therefore, there is a need for better, faster and more effective techniques for dealing with this huge amount of data.
Furthermore, the raw data collected as Big Data come in several formats. In many practical scenarios, these formats are very different from the simple numerical and textual information currently stored in Data Warehouses. Thus, Big Data cannot be analyzed with common SQL-based techniques.
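As a minimal illustration of this mismatch, the Python sketch below contrasts a flat Data Warehouse row with a nested social-network post. The schema, field names and values are hypothetical and chosen only for illustration; the helper `fits_flat_schema` (also hypothetical) simply tests whether a record maps one-to-one onto a fixed set of scalar columns, which is what a relational fact table expects.

```python
import json

# Hypothetical fixed schema of a Data Warehouse fact table (illustrative only).
DW_COLUMNS = ("booking_id", "city", "nights", "price_eur")

# A flat record maps directly onto the fixed schema.
dw_row = (1001, "Rome", 3, 350.0)

# A hypothetical social-network post: nested objects plus a repeated field.
raw_post = """
{"user": {"id": "u42", "home": "Milan"},
 "text": "Great stay in Rome!",
 "hashtags": ["travel", "rome"],
 "geo": {"lat": 41.9, "lon": 12.5}}
"""
post = json.loads(raw_post)


def fits_flat_schema(record: dict, columns: tuple) -> bool:
    """Return True only if the keys match the schema exactly and every value is a scalar."""
    scalar_types = (int, float, str, bool, type(None))
    return set(record) == set(columns) and all(
        isinstance(value, scalar_types) for value in record.values()
    )


print(fits_flat_schema(dict(zip(DW_COLUMNS, dw_row)), DW_COLUMNS))  # True
print(fits_flat_schema(post, DW_COLUMNS))  # False: different keys, nested values
```

The post can still be stored, but only after a design decision (flattening, splitting the hashtag list into a separate table, and so on), which is exactly the kind of pre-processing that heterogeneous sources demand.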
A crucial challenge posed by Big Data is a paradigm shift in how organizations behave with respect to their data assets. More specifically, data specialists must rethink:
• The data acquisition policies;
• The data analysis techniques best suited to the data being gathered;
• The impact of the analysis on business strategies.
To show the importance of the above-mentioned issues, we list here some key applications common to almost all real-life scenarios:
1. Efficient information search, ranking and ad tracking;
2. Geo-referenced analysis;
3. Causal factor discovery;
4. Social Customer Relationship Management.
All these requirements call on Business Intelligence (BI) systems to provide innovative solutions to complex analysis tasks. In this respect, decision makers need quick access to information that is as complete as possible in order to make accurate and fast decisions in a continuously changing environment. Thus, the challenge is to ensure satisfactory efficiency when querying huge Data Warehouses, which are (unfortunately) often built on top of relational structures that store data in a row-oriented manner. Indeed, the relational model is flexible and is designed to support both transactional and analytical processing. However, as the size and complexity of Data Warehouses increase, a more efficient alternative to the row-oriented approach is needed. To this end, a valid approach is to store data in a column-oriented way. The rest of this section describes the main components of a Big Data infrastructure.
2.1 Column Oriented DBMS
Using the row-oriented approach, the data storage layer contains records (i.e. rows), while in a column-oriented system it contains columns (i.e. all the values of a single attribute across the rows). The widespread use of the relational approach is mainly due to its flexibility in representing almost any kind of data. Indeed, users are able to access and manipulate data without being involved in any technical aspects concerning data storage and access.
This is a simple model, but it is not particularly suitable for data repositories used by analytical applications dealing with huge amounts of data. Indeed, row-oriented databases are not adequate for complex analysis of massive datasets because they are designed for transactional processing. This approach is therefore unsuitable for analytical systems performing large-scale processing, where many read operations are executed in order to access a small subset of attributes in a large volume of data. In fact, analytical queries are (typically) answered by scanning all the database records while processing only a few of their fields.
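The read-cost argument can be made concrete with an idealised page-count model. This is a simplifying sketch, not a measurement: it assumes fixed-width values, a hypothetical page size of 1,024 values, no compression or indexes, and a full scan in both layouts. The table size (one million records, 50 attributes) is likewise an illustrative assumption.

```python
import math

PAGE_SIZE = 1024  # hypothetical number of fixed-width values per disk page


def pages_read_row_store(n_rows: int, n_attrs: int) -> int:
    # Whole records are packed into pages, so reading even one attribute of
    # every record forces a scan of every page that holds the table.
    return math.ceil(n_rows * n_attrs / PAGE_SIZE)


def pages_read_column_store(n_rows: int, needed_attrs: int) -> int:
    # Each attribute lives in its own run of pages; only the columns the
    # query actually references are scanned.
    return needed_attrs * math.ceil(n_rows / PAGE_SIZE)


n_rows, n_attrs = 1_000_000, 50
print(pages_read_row_store(n_rows, n_attrs))  # 48829
print(pages_read_column_store(n_rows, 1))  # 977
```

Under these assumptions a query that aggregates a single attribute reads roughly 50 times fewer pages in the column layout, i.e. the saving grows with the ratio between the attributes stored and the attributes the query needs.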
On the contrary, in a column-oriented database all instances of a single data element, such as an account number, are stored together, so they can be accessed sequentially. Therefore, aggregate operations such as MIN, MAX, SUM, COUNT and AVG can be performed very quickly.
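A minimal sketch of why these aggregates are cheap in a column store: once all instances of a data element sit in one contiguous buffer, each aggregate is a single sequential pass over that buffer and no other attribute is touched. Here Python's standard `array` module stands in for a contiguous column of doubles; the "account balance" column and its values are hypothetical (a numeric attribute is used because summing account numbers would be meaningless).

```python
from array import array

# Hypothetical column of account balances, stored contiguously as C doubles,
# as a column store would keep all instances of one data element together.
balance = array("d", [120.0, 75.5, 310.25, 42.0, 98.25])

# Each aggregate is one sequential pass over a single contiguous buffer.
total = sum(balance)
aggregates = {
    "MIN": min(balance),
    "MAX": max(balance),
    "SUM": total,
    "COUNT": len(balance),
    "AVG": total / len(balance),
}
print(aggregates)
# {'MIN': 42.0, 'MAX': 310.25, 'SUM': 646.0, 'COUNT': 5, 'AVG': 129.2}
```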
Recently, NoSQL (Not Only SQL) approaches are being used to solve the efficiency problems discussed
DATA 2014 - 3rd International Conference on Data Management Technologies and Applications