Language Processing and a sub-field of artificial
intelligence. The deep learning model can be used to
construct feature text vectors to accurately express
the word meaning and semantic information in
electronic texts, so as to effectively improve the
classification accuracy of electronic texts (Che,
2019). To meet practical application requirements,
this paper designs and implements an electronic text
classification system that integrates data
collection, preprocessing, classification and visual
display: a Hadoop cluster captures and distributes
electronic text data from the Internet, MapReduce
calls FastText, and ECharts presents the
classification results. Testing and simulation show
that the system improves the efficiency of
electronic text classification, performs well, is
convenient to operate, and is suitable for a range of
large-scale electronic text classification scenarios.
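The data flow described above (classify each document in a map step,
aggregate per category in a reduce step, then hand the totals to a
chart) can be sketched in a few lines. This is only an illustrative,
in-process sketch, not the system's implementation: it does not use
Hadoop, and `classify_stub` is a hypothetical keyword rule standing in
for the trained FastText model. All function names and the sample
documents are assumptions introduced for the example.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def classify_stub(text: str) -> str:
    """Hypothetical stand-in for the FastText model (keyword rule only)."""
    lowered = text.lower()
    if "match" in lowered or "team" in lowered:
        return "sports"
    if "stock" in lowered or "market" in lowered:
        return "finance"
    return "other"


def map_phase(
    docs: Iterable[str], classify: Callable[[str], str]
) -> Iterator[Tuple[str, int]]:
    """Map step: emit one (label, 1) pair per document."""
    for doc in docs:
        yield classify(doc), 1


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce step: sum the counts emitted for each label."""
    counts: Counter = Counter()
    for label, n in pairs:
        counts[label] += n
    return dict(counts)


def to_chart_series(counts: Dict[str, int]) -> List[Dict[str, object]]:
    """Shape the totals as name/value records, sorted by label."""
    return [{"name": label, "value": n} for label, n in sorted(counts.items())]


docs = [
    "The home team won the match",
    "Stock prices fell as the market closed",
    "A new bridge opened today",
    "The market rallied",
]
counts = reduce_phase(map_phase(docs, classify_stub))
series = to_chart_series(counts)
```

Because the classifier is passed in as a function, the keyword rule
could be replaced by a call into a trained model without changing the
map and reduce steps.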
2 OVERVIEW OF KEY TECHNOLOGIES
2.1 Natural Language Processing
Natural Language Processing (NLP) is an important
direction in the fields of computer science and
artificial intelligence. As an interdisciplinary
subject, it draws on linguistics, computer science,
mathematics, statistics and other fields, and aims to
realize human-computer interaction and communication
with natural language as the medium. In practice,
software applications process the form, sound and
meaning of natural language through input,
recognition, analysis, understanding, generation and
output, so that computers can "understand" human
language, which expands the application field of
computers and lets them take over some work from
humans (He, 2020).
With the rapid development of artificial
intelligence, the application scenarios of natural
language processing continue to expand. Common
fields include text information retrieval, machine
translation, text classification and mining,
information extraction and filtering, speech
recognition and generation, and automatic question
answering and dialogue. Among them, text
classification is a typical problem: most natural
language processing tasks can be regarded as
classification tasks, and text classification sits
upstream in natural language processing research. It
not only provides necessary preconditions for
research in other fields, but also directly affects
how well downstream natural language processing
applications work in practice.
In the initial stage of text classification, expert
rules were mostly used to complete the
classification, which required a great deal of manual
reasoning and judgment; because human factors are
hard to control, this approach had poor
extensibility. With the rise of machine learning,
text classification entered the statistical era,
relying on text feature analysis combined with
shallow machine learning. Although work efficiency,
cost control and extensibility improved
significantly, these methods still could not meet the
demand for fineness and accuracy. The emergence of
deep learning, coupled with substantial improvements
in computer hardware, has greatly promoted the
development of natural language processing and
further expanded the application scope of text
classification.
2.2 Deep Learning Model
Deep learning, based on neural network
architectures, is a branch of machine learning. Its
essence is to make computers perform specific tasks
by imitating the way humans acquire and apply
knowledge (Han, 2021). At present, deep learning
models have gradually become the mainstream
technology for text classification. Feature text
vectors constructed with deep learning models can
accurately express the word meaning and semantic
information in the text, and the network structure
learns feature representations automatically, which
avoids the tedious work of manually designing rules
and features and enables end-to-end problem solving.
In this paper, according to the characteristics of
electronic text, the FastText deep learning model is
selected to complete text
classification. The advantage of FastText is that it
is suitable for large volumes of data samples and
supports multilingual expression, and its overall
training speed is far faster than that of comparable
models. Its core principles include model
architecture, hierarchical SoftMax and N-gram
features. Among them, the FastText model
architecture can predict and classify the whole text