Linux system and developed with the SSH framework, which
combines Spring, Spring MVC and Hibernate. The
system data is collected, converted, cleaned and
counted on a Hadoop cluster of five servers.
The system provides users with a rigorous and efficient
decision-making platform, designed from the perspective
of the staff of local industrial and commercial
administration departments. A data warehouse and a
self-service business data analysis platform that
integrate the economic operation data of various
enterprises make data analysis more convenient for local
government and industrial and commercial managers,
reduce the workload of statistical staff and improve the
management efficiency of the local economy.
2 KEY TECHNOLOGIES
2.1 B/S Structure
The big data analysis system of enterprise economic
operation designed in this paper adopts the B/S structure.
B/S stands for browser/server, a structure that is
widely used in web application development. In the
B/S structure, the client runs in the browser, while
the server runs the core logic. B/S systems are mostly
deployed over wide area networks, and the client needs
only a browser and an operating system, so this structure
is well suited to applications with a wide range of
users (Li, 2019).
2.2 Hadoop Ecosystem
Hadoop is a distributed system infrastructure developed
by the Apache Software Foundation. It is designed mainly
to solve the problems of massive data storage, analysis
and computation in the era of big data. The Hadoop
ecosystem is mainly composed of the MapReduce computing
component, the YARN resource scheduling component, the
HDFS data storage component and other auxiliary tools. A
Hadoop cluster covers the layers of the big data
technology ecosystem, including the business model layer,
task scheduling layer, data computing layer, resource
management layer, data storage layer and data
transmission layer (Wang, 2015).
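The map, shuffle and reduce stages behind the MapReduce component can be illustrated with a minimal sketch in plain Python. It does not use the Hadoop API and is not the system's actual job; the record layout (enterprise id, industry, revenue) and all function names are assumptions made for illustration only.

```python
from collections import defaultdict

def map_record(line):
    """Map step: parse one raw line and emit (industry, 1).

    Assumed layout (hypothetical): "enterprise_id, industry, revenue".
    Lines that do not match the layout are dropped, which stands in for cleaning.
    """
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 3 or not fields[1]:
        return []
    return [(fields[1].lower(), 1)]

def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}

def count_by_industry(lines):
    """Run map, shuffle and reduce in sequence on a list of raw lines."""
    pairs = [pair for line in lines for pair in map_record(line)]
    return reduce_counts(shuffle(pairs))

# Hypothetical sample input, including one malformed line.
raw_lines = [
    "E001, Retail, 120.5",
    "E002, retail, 80.0",
    "E003, Manufacturing, 300.0",
    "broken line",
]
counts = count_by_industry(raw_lines)
```

On a real cluster the map and reduce functions would run in parallel on different nodes, with HDFS holding the input and output; the sketch only shows the data flow between the three stages.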
2.3 Classification and Prediction Algorithms for Data Mining
2.3.1 K-nearest Neighbor Algorithm
The K-nearest neighbor algorithm divides the data set
into several categories and computes the representative
points of each category. For a prediction point x, it
computes the distance from x to each representative
point and takes the minimum distance.
Assuming that the number of categories is n and
the number of representative points of each category
is m, the classification function is:
$$g_i(x) = \min_{k} \lVert x - x_i^k \rVert, \quad k = 1, 2, \ldots, m \qquad (2)$$
Here i in $x_i^k$ denotes the i-th of the n classes, and k
indexes the m representative points of that class. Among
the k representative points nearest to the predicted
point x, the category that occurs most often is taken as
the category of x; k = 1 gives the nearest neighbor
method.
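The classification function and the majority rule can be sketched in Python as follows. This is a minimal illustration, assuming Euclidean distance (the text does not name a metric); the function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def nearest_representative_distance(x, representatives):
    """g_i(x): minimum distance from x to the representative points of one class."""
    return min(math.dist(x, r) for r in representatives)

def knn_classify(x, class_representatives, k=1):
    """Assign x to the most frequent class among its k nearest representative points."""
    # Build (distance, label) pairs over every representative of every class.
    distances = [
        (math.dist(x, r), label)
        for label, reps in class_representatives.items()
        for r in reps
    ]
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: n = 2 classes with m = 3 representative points each.
class_representatives = {
    "small": [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0)],
    "large": [(6.0, 6.0), (7.0, 5.5), (6.5, 7.0)],
}
label_nearest = knn_classify((1.2, 1.1), class_representatives, k=1)
label_voted = knn_classify((6.2, 6.1), class_representatives, k=3)
```

With k = 1 the result is the class whose g_i(x) is smallest, which is the nearest neighbor method; larger k trades sensitivity to single points for a vote among several neighbors.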
2.3.2 Decision Tree Algorithm
The decision tree algorithm is an inductive learning
algorithm that derives classification rules, in the form
of a decision tree, from an unordered set of samples. It
is a top-down recursive algorithm that constructs the
relationship between categories and attributes in order
to predict unknown classes. The current mainstream
decision tree algorithms include ID3, C4.5 and CART.
This paper focuses on the C4.5 decision tree
algorithm, which is an improved algorithm based on
ID3. Constructing a C4.5 decision tree takes three
inputs: the sample set S, the classification attribute C
and the candidate attribute set V. The construction
proceeds as follows:
1. Create a node N.
2. If all samples in S belong to the same category, mark
N as a leaf labeled with that category; otherwise,
execute step 3.
3. If V is empty, mark N as a leaf labeled with the most
frequent category in S; otherwise, execute step 4.
4. Compute the information gain ratio of each attribute
in V, select the attribute v with the highest ratio and
label N with v.
5. Split S by the values of v. If a resulting subset is
empty, add a leaf node; otherwise, recurse on that subset
with v removed from V.
6. Use the results of the recursion to complete the
construction. (Mao, 2018)
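Step 4 relies on the information gain ratio, which in the standard C4.5 formulation is the information gain of an attribute divided by its split information. The sketch below computes it in Python; the record layout, attribute names and sample rows are hypothetical and are not taken from the system's data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain_ratio(rows, attribute, target):
    """Information gain of `attribute` with respect to `target`, divided by split information."""
    total = len(rows)
    base = entropy([row[target] for row in rows])
    # Partition the target labels by the value of the attribute.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    conditional = sum(len(p) / total * entropy(p) for p in partitions.values())
    split_info = entropy([row[attribute] for row in rows])
    if split_info == 0:
        return 0.0  # the attribute has a single value and cannot split the data
    return (base - conditional) / split_info

def best_attribute(rows, attributes, target):
    """Select the attribute with the highest gain ratio (step 4)."""
    return max(attributes, key=lambda a: gain_ratio(rows, a, target))

# Hypothetical enterprise records; "growth" plays the role of the classification attribute.
rows = [
    {"sector": "retail", "size": "small", "growth": "low"},
    {"sector": "retail", "size": "large", "growth": "high"},
    {"sector": "tech", "size": "small", "growth": "low"},
    {"sector": "tech", "size": "large", "growth": "high"},
]
chosen = best_attribute(rows, ["sector", "size"], "growth")
```

Dividing by the split information is what distinguishes C4.5 from ID3: it penalizes attributes with many distinct values, which plain information gain tends to favor.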
2.4 Development Environment
The development environment of the enterprise economic
operation big data analysis system is divided into two
parts: the Hadoop big data cluster and the Java Web
application environment. According to the required