rapidly; complex computation over large volumes of data is therefore necessary for analysis (Big Analytics) in order to turn data into treasure, otherwise we will be lost in the sea of data.
3.2 Data analysis
To understand how the data change and to keep optimizing and improving, we must not only treat the symptoms but also cure the root cause, so that similar problems no longer appear; through continuous monitoring and feedback we can find the optimal scheme that solves the problem fundamentally. This requires in-depth analysis of the data rather than merely generating complex reports. Analysis models of this kind are difficult to express in SQL and are collectively referred to as deep analysis (Xiong Pai Q, 2012).
We need the data not only to understand what is happening now, but also to predict what will happen, so that we can make active preparations in advance, as Figure 2 shows. For example, by predicting the sales of goods in advance, we can take action to adjust the stock of goods in a timely manner. Here, typical OLAP data analysis (data collection, aggregation, slicing, etc.) is not enough (Xiong Pai Q, 2012); we also need path analysis, time series analysis, graph analysis, what-if analysis, and statistical analysis models that existing hardware and software have never attempted. Time series analysis and graph analysis, discussed below, are typical examples.
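As a rough illustration of the predictive analysis described above, the following minimal sketch fits a simple linear trend to past weekly sales and extrapolates one week ahead; the sales figures, the linear model, and the use of numpy are illustrative assumptions rather than anything prescribed in the paper.

```python
# Minimal sketch of predictive analysis: fit a linear trend to past
# weekly sales and forecast the next week so stock can be adjusted
# in advance. The figures below are hypothetical.
import numpy as np

weekly_sales = np.array([120, 135, 128, 150, 162, 170, 181], dtype=float)
weeks = np.arange(len(weekly_sales))

slope, intercept = np.polyfit(weeks, weekly_sales, deg=1)  # linear trend
next_week = len(weekly_sales)
forecast = slope * next_week + intercept

print(f"forecast for week {next_week}: {forecast:.1f} units")
```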
Time series analysis: business organizations have accumulated a large amount of historical transaction information, and managers at all levels hope that analyzing these data will reveal business opportunities hidden in certain patterns, through trend analysis and even by spotting opportunities that are only just emerging. For example, in the financial services industry, analysts can use analysis software on time series data to look for profitable trading patterns. After further verification, operators can actually use such trading patterns to profit from trading.
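As a minimal sketch of the kind of trend analysis described above, the following looks for a simple moving-average crossover in a price series as one illustrative "trading pattern"; the price data, window sizes, and the crossover rule itself are hypothetical choices, not a method given in the paper.

```python
# Minimal sketch of time series trend analysis: flag points where a
# fast moving average crosses above a slow one. All data are hypothetical.
import numpy as np

def moving_average(x, w):
    """Trailing moving average with window w (shorter near the start)."""
    return np.array([x[max(0, i - w + 1):i + 1].mean() for i in range(len(x))])

prices = np.array([10.0, 10.2, 10.1, 10.4, 10.8, 10.6, 11.0, 11.3, 11.1, 11.6])
fast = moving_average(prices, 3)   # short-term trend
slow = moving_average(prices, 5)   # long-term trend

# candidate "buy" signals: the fast average crosses above the slow one
signals = [i for i in range(1, len(prices))
           if fast[i] > slow[i] and fast[i - 1] <= slow[i - 1]]
print("candidate trading points (indices):", signals)
```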
Large-scale graph and network analysis: a social network (Ziyu L, 2012) is essentially a description, in a virtual environment, of connectivity in the real world. In a social network, each independent entity is represented as a node in the graph, and the relation between entities is represented as an edge. By analyzing the social network, useful knowledge can be discovered; for example, certain types of entities can be identified (an entity that the others group around is known as a key entity in the network). This information can be used to guide the analysis of products, organizations, and individual behavior, as well as the analysis of potential security threats and of growth in this field. From a geometric perspective, the nodes and edges of a social network keep growing, and traditional methods are insufficient for processing such large-scale graph data, so effective means for this kind of data analysis are urgently needed.
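As a minimal sketch of finding "key entities" in a social graph, the example below ranks the nodes of a tiny toy network with PageRank; the toy graph, the networkx library, and the choice of PageRank as the importance measure are illustrative assumptions, and a real social network would be far too large for this single-machine approach.

```python
# Minimal sketch of key-entity detection in a small social graph.
# The graph and the use of PageRank as the importance score are
# illustrative; real social graphs require distributed processing.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("alice", "bob"), ("alice", "carol"), ("alice", "dave"),
    ("bob", "carol"), ("eve", "alice"), ("frank", "alice"),
])

scores = nx.pagerank(G)                            # node importance
key_entities = sorted(scores, key=scores.get, reverse=True)[:2]
print("key entities:", key_entities)               # nodes others group around
```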
4 THE RISE OF DATA MANAGEMENT TECHNOLOGY REPRESENTED BY MAPREDUCE
4.1 Source of MapReduce
In the big data era, in order to process large-scale data, whether in operational applications or analytical applications, parallel processing is the only option (Yijie W, 2012). Such parallel processing is not only across multiple cores but, more importantly, across nodes: a distributed system relies on a large number of nodes to improve its performance. With so many nodes, even if cost were no problem and high-end, reliable hardware were selected, the cluster may reach thousands of nodes, so node failures and network failures become commonplace and fault-tolerance guarantees become essential.
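As a rough single-machine illustration of the map/reduce programming model that this section turns to, the following word-count sketch separates the map, shuffle, and reduce phases; the helper names and input lines are illustrative, and a real MapReduce system distributes these phases across many nodes and re-executes the tasks of failed nodes to provide the fault tolerance discussed above.

```python
# Minimal single-machine sketch of the map/reduce model (word count).
from collections import defaultdict

def map_phase(lines):
    # map: each input line is turned into (word, 1) pairs
    for line in lines:
        for word in line.split():
            yield word, 1

def shuffle(pairs):
    # shuffle: group intermediate pairs by key (done by the framework)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # reduce: aggregate the values collected for each key
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data needs parallel processing",
         "parallel processing needs fault tolerance"]
print(reduce_phase(shuffle(map_phase(lines))))
```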