class of programs according to Internet rules. Python is the most commonly used development language for this technology. Web crawler technology works by defining crawling rules and specifying the entry URLs of the target portal (Ji, 2017).
First, the developer selects a set of seed pages according to the requirements and saves the corresponding URLs. These URLs are placed in a to-crawl queue maintained by the algorithm. The program then downloads the content behind each URL, extracts the key information, and moves the processed URLs into a crawled queue. Meanwhile, the DNS resolution data and webpage download data generated during URL resolution are saved in the downloaded-webpage database.
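As a minimal sketch of this seed/queue workflow (a plain illustration, not the system's actual Scrapy crawler; the seed URL, page limit, and extraction rule are assumptions), a crawl loop in Python might look like this:

```python
# Minimal sketch of the crawl loop described above; the seed URL and
# extraction rules are illustrative assumptions, not the real system's.
from collections import deque
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup

seeds = ["https://example.com/products"]  # hypothetical portal URLs
to_crawl = deque(seeds)                   # URL queue to be grabbed
crawled = set()                           # set of processed URLs
MAX_PAGES = 50                            # safety limit for the sketch

while to_crawl and len(crawled) < MAX_PAGES:
    url = to_crawl.popleft()
    if url in crawled:
        continue
    resp = requests.get(url, timeout=10)  # download the page content
    crawled.add(url)                      # move URL to the crawled set
    page = BeautifulSoup(resp.text, "html.parser")
    # "Grab the key information": here we just take the page title.
    title = page.title.string if page.title else ""
    print(url, title)
    # Enqueue newly discovered links that match the crawling rule
    # (here: stay under the seed prefix).
    for a in page.find_all("a", href=True):
        link = urljoin(url, a["href"])
        if link.startswith(seeds[0]) and link not in crawled:
            to_crawl.append(link)
```

Scrapy packages this same loop (scheduler, downloader, and link extraction) behind its Spider API, which is the framework used in the system described later.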
2.2 Hadoop Processing Platform
As a development and application ecosystem, the Hadoop platform supports data-intensive applications, and its set of components keeps growing. The most important components are the distributed file system HDFS and the parallel programming model MapReduce: HDFS is responsible for the distributed storage of massive data, while MapReduce realizes parallel computation over the distributed data, and the two complement each other. Besides HDFS and MapReduce, the Hadoop ecosystem includes many subprojects such as Ambari, Hive, HBase, ZooKeeper, Flume, and Mahout. With multiple components cooperating under a clear division of labor, even inexperienced developers can exploit the advantages of a cluster to handle big data conveniently and quickly (Li, 2017).
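To make this division of labor concrete, here is a hedged word-count example in the Hadoop Streaming style: HDFS holds the input files, and MapReduce runs a script like this as the mapper and then as the reducer. The file name and invocation convention are illustrative assumptions, not part of the described system.

```python
#!/usr/bin/env python3
# Word-count sketch in the Hadoop Streaming style: HDFS stores the
# input, and MapReduce runs this script as mapper and reducer over it.
import sys

def mapper():
    # Map phase: emit (word, 1) for every word on every input line.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Reduce phase: input arrives sorted by key, so counts can be
    # accumulated per word and emitted when the key changes.
    current, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(value)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    # Run as "wordcount.py map" or "wordcount.py reduce" (illustrative).
    mapper() if sys.argv[1] == "map" else reducer()
```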
2.3 J2EE Framework
J2EE is a simplified Java web development platform designed and developed by Sun Microsystems, on which a range of enterprise application software can be built. To simplify application development for large enterprises, J2EE provides reusable component models that improve development efficiency. It also defines a layered structure whose plumbing is handled automatically, which lowers the skill requirements placed on developers of application software (Ma, 2022).
2.4 Development Environment
This section briefly introduces the technologies used to develop and run the platform. In the big data precision marketing system, Hadoop is used as a big data server cluster to process the data and store it in a MySQL database, and the corresponding application platform is developed with JavaWeb technology.
According to the system's data volume and overall operating requirements, a three-node Hadoop 3.3.1 cluster is built. The distributed coordination service ZooKeeper 3.4.1, the distributed file system HDFS 2.6.5, Flume 1.9.0, Hive 0.13.1, and HBase 2.6.5 are then installed and deployed on the three nodes, completing the initial construction of the Hadoop cluster. The cluster runs on Linux; the CentOS 6.5 Server release is selected. The web crawler is built on Scrapy 2.5, with Python 3.8 as the development language (Lin, 2016).
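As a quick way to verify such a deployment from the Python side, a smoke test like the following could write and read a file over WebHDFS. It assumes the third-party hdfs package and a hypothetical NameNode address (node1:9870, the default WebHDFS port in Hadoop 3); neither is specified by the paper.

```python
# Smoke test for the freshly built cluster, assuming the third-party
# "hdfs" package (pip install hdfs) and a hypothetical NameNode URL.
from hdfs import InsecureClient

client = InsecureClient("http://node1:9870", user="hadoop")  # assumed host/user

# Write a small test file into HDFS, then read it back.
client.write("/tmp/smoke_test.txt", data="hello hadoop\n", overwrite=True)
with client.read("/tmp/smoke_test.txt") as reader:
    print(reader.read().decode("utf-8"))

print(client.list("/tmp"))  # list directory contents to confirm the write
```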
In this system, the front end of the JavaWeb application is built with Bootstrap and jQuery, using JavaScript, HTML, and CSS as the development languages. The back-end Java development tool is IDEA 2021.1.3 (Ultimate Edition), the development environment is JDK 1.8, and the J2EE framework of Tomcat + Spring MVC + Spring + MyBatis is used in the implementation of this system. The development language is Java, and MySQL 8.0.28 is selected for data management.
3 OVERALL DESIGN
According to enterprise needs, the Hadoop-based big data precision marketing system establishes a top-down, one-stop pipeline for data collection, analysis, processing, and visualization. The main functions of data collection, data storage, data cleaning, data query, and data analysis are supported by the Hadoop ecosystem cluster, while visualization is realized with JavaWeb technology.
First, data are collected from three sources: local enterprise server data gathered by Flume, URL data collected from product detail pages by the Python web crawler, and shared data from Taobao, Weibo, and other platforms accessed through an external JDBC interface. These data are first cached in HDFS distributed storage, while the crawler's URL set is stored in Redis. The data computation module is implemented with MapReduce; it analyzes the preliminary data, manages the crawler results, and applies data mining techniques such as association rule mining to build consumer portraits, as sketched below. After processing, the data are saved in HDFS and Hive.