correlation relationship, that is, the impact of changes
in one attribute of data on other attributes. In this study, it is mainly used to analyze the correlation between college students' Internet behavior and academic performance.
2.3 Classification of Students' Network Behaviors
2.3.1 URL Based Keyword Acquisition
User behavior classification is the process of grouping users according to the categories of web pages they prefer to browse. The collected log data contain the URLs of the web pages visited by each user. Keywords are extracted from these URLs to build a topic list of the pages the user visited. From this topic list, a network behavior category tied to the website category can be built for the user.
Web pages are classified with a method based on URL keyword extraction. The most direct way to obtain the topic of a web page from its URL string is to match the URL against a manually labeled website classification directory, which collects websites and places them in directories according to their topics. Matching against such a directory is the most direct and accurate way to obtain a URL topic, but labeling requires a great deal of manpower and time, so the amount of labeled data is very limited. To solve this problem, a web page classifier is proposed that combines a classification algorithm based on the N-gram language model with URL classification directory matching to determine the URL topic, so that the URLs of all web pages are mapped one by one to the corresponding web page topics. This improves the efficiency and accuracy of topic determination and yields more comprehensive user behavior classification information.
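The text does not give the details of the classifier, so the following is only a minimal sketch of one way to combine the two ideas: a directory lookup is tried first, and a per-topic character N-gram language model is used as a fallback for hosts that are not in the directory. The class name NgramUrlClassifier, the trigram order, the add-one smoothing, and the padding symbols are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a padded, lowercased string."""
    padded = "^" * (n - 1) + text.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramUrlClassifier:
    """Hypothetical classifier: directory lookup first, n-gram model as fallback."""

    def __init__(self, n=3):
        self.n = n
        self.directory = {}                          # labeled host -> topic
        self.ngram_counts = defaultdict(Counter)     # topic -> n-gram counts
        self.context_counts = defaultdict(Counter)   # topic -> (n-1)-gram counts
        self.alphabet = set()                        # characters seen in training

    def fit(self, labeled_hosts):
        """Train on (host, topic) pairs from a manually labeled directory."""
        for host, topic in labeled_hosts:
            self.directory[host.lower()] = topic
            for gram in char_ngrams(host, self.n):
                self.ngram_counts[topic][gram] += 1
                self.context_counts[topic][gram[:-1]] += 1
                self.alphabet.add(gram[-1])

    def log_likelihood(self, host, topic):
        """Log-probability of the host under the topic's add-one smoothed model."""
        size = len(self.alphabet) + 1  # one extra slot for unseen characters
        ngrams = self.ngram_counts[topic]
        contexts = self.context_counts[topic]
        return sum(
            math.log((ngrams[g] + 1) / (contexts[g[:-1]] + size))
            for g in char_ngrams(host, self.n)
        )

    def predict(self, host):
        """Return the directory topic if the host is labeled, else the likeliest topic."""
        host = host.lower()
        if host in self.directory:
            return self.directory[host]
        return max(self.ngram_counts, key=lambda t: self.log_likelihood(host, t))
```

After fit is called with a handful of labeled hosts, predict returns the labeled topic for hosts found in the directory and the topic with the highest model likelihood for all others. A real system would need far more labeled hosts per topic than a toy example can show.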
2.3.2 User Behavior Classification Information Representation
After the topic of each web page visited by the user has been obtained, the topic list is transformed into user behavior classification information, which serves as input to the user behavior classification model. The number of occurrences of each topic in the topic list is counted, giving a pair (t_i, c_i) composed of topic t_i and frequency c_i. All resulting pairs are collected into a list {(t_1, c_1), (t_2, c_2), …, (t_n, c_n)}, which is the user interest information. It serves as input to the construction of the college student interest set. When a keyword corresponding to a topic appears in a URL, the URL is mapped to that topic. Part of the URL keyword statistics is shown in Table 1.

Table 1: URL keywords for web page categories.
Topic            Keywords
Game             gamersky, game, 4399, 7k7k, 17173, ali213, yy, douyu, egame
Social Network   extshort.weixin.qq, weibo, btrace.qq, weibo, tieba.baidu, jiayuan, tianya, zhihu
Contact          music.163, kugou, y.qq, fm.taihe, xiami, kuwo, yinyuetai, changba, music
Video            policy.video.iqiyi, video.ptqy, video, ixigua, v.qq, haokan.baidu, youku, v.baidu, mgtv, acfun
Study            wps, cnki, dict.youdao, wpscdn, flashapp, chinaz, processon, dxzy163, icourse163, mooc
Science          Ludashi, windowsupdate, apple, idianshijia, sandai, duba, ubuntu, zol, ithome
Load             Download, sz-download.weiyun, ardownload.adobe, download.hongbaoshu
Read             xxsy, zongheng, qidian, read, faloo, qidian, novel, jjwxc, lrts.me, zongheng, ximalaya
Search           Baidu.sohu, news.sina, candian, guancha, mil.ifeng, huanqiu, junshi.china, yahoo, sogou
Shopping         taobao, alibaba, alipay, dangdang, suning, mogu, 1688, mi
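The mapping from visited URLs to topic-frequency pairs can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the dictionary holds only a small excerpt of the Table 1 keywords, the function names url_topic and topic_frequency_pairs are hypothetical, and matching by plain substring with the first matching topic winning is an assumption.

```python
from collections import Counter

# Excerpt of the Table 1 keywords; the full lists are given in the table.
TOPIC_KEYWORDS = {
    "Game": ["gamersky", "4399", "17173", "douyu"],
    "Video": ["video", "youku", "v.qq", "mgtv"],
    "Study": ["cnki", "dict.youdao", "icourse163", "mooc"],
    "Shopping": ["taobao", "alipay", "dangdang", "suning"],
}


def url_topic(url):
    """Return the first topic with a keyword occurring in the URL, else None."""
    lowered = url.lower()
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return topic
    return None


def topic_frequency_pairs(urls):
    """Count topic occurrences and return a list of (topic, frequency) pairs."""
    topics = [topic for topic in map(url_topic, urls) if topic is not None]
    return Counter(topics).most_common()


visited = [
    "http://www.4399.com/flash/1.htm",
    "https://v.qq.com/x/cover/abc",
    "http://www.4399.com/",
    "https://www.cnki.net/",
    "http://example.org/",
]
pairs = topic_frequency_pairs(visited)
```

Here pairs holds one (t_i, c_i) entry per topic that occurred at least once. URLs that match no keyword, such as the last one, are skipped. Short keywords such as "mi" or "yy" would match many unrelated URLs under plain substring matching, so a practical version would likely match against host name components instead.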
2.3.3 The Classification and Construction Process of College Students' Network Behavior
First, the list of URLs accessed by users is obtained, and the topic of each URL is determined by extracting URL keywords as described above, which yields the user behavior information. Then, the classification model of college students' network behavior is constructed with the user behavior classification algorithm. The construction process is divided into four steps:
Step1: Extract the original information. Extract the URLs accessed by users, count the number of visits to each URL, and generate a pair of URL and visit count (URL, counts).
Step2: Obtain keyword information. Use the keyword acquisition method based on URL feature extraction to get the topic information of each URL, that