3 RESEARCH METHODOLOGY
Although classifying and systematizing texts and
identifying their topics appear to be simple tasks,
they are very difficult to implement. The problem
cannot be solved satisfactorily by relying only on
keywords or on the syntactic structure of simple
phrases, and general semantic analysis alone does not
fundamentally change the situation. Existing
systematic analyses provide approximately the
following classification accuracy: about 5% with
predefined predictive analysis, and up to 80% with
predefined analysis adjusted to the topic of the
texts.
Text synthesis. In a narrow sense, text synthesis refers
to the construction of natural language phrases and
sentences from formal language records. The
constructed phrases may or may not be required to be
stylistically correct, but in any case they should
not contain semantic or grammatical errors.
Checking the correctness of texts. This task requires
a full analysis of the sentences, which makes it
possible to check the grammatical correctness of the
texts.
In systematic analysis, automation extends to
checking the collected definitions for consistency
automatically. An alternative approach is to create
definitions of concepts from existing texts that
contain such descriptions and then to revise them as
necessary in communication with an expert.
Implementing this approach requires the ability to
analyse the semantics of texts in detail.
The essence of annotating a text is to form a brief
description of its main content. There are two
different annotation options. In the first, a small
number of sentences that fully reflect the main
themes of the text are selected from the text itself.
In the second, the main themes of the text are
identified as meanings, and these meanings are
expressed through new sentences and text. The second
option is preferable, but it is also more difficult.
All modern automatic annotation systems are based on
the first option.
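The first option, selecting existing sentences, can be
illustrated with a minimal sketch. It assumes a naive
sentence splitter and scores each sentence by the
average frequency of its words; the function name
extractive_summary and the scoring rule are
illustrative assumptions, not the method of any
particular system.

```python
import re
from collections import Counter


def extractive_summary(text, n_sentences=2):
    """Return the n highest-scoring sentences in their original order."""
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]

    # Word frequencies over the whole text.
    freq = Counter(w for words in tokenized for w in words)

    def score(words):
        # Average text-wide frequency of the words in one sentence.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(tokenized[i]),
                    reverse=True)
    chosen = sorted(ranked[:n_sentences])  # restore document order
    return [sentences[i] for i in chosen]
```

Averaging rather than summing the frequencies keeps
long sentences from winning only because of their
length; a real system would also remove stop words
before scoring.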
Together, these tasks are known as classification and
categorization of documents, identification of
document topics, and automatic abstracting and
annotation. This is a relatively young field, and
most of the important results have been obtained in
recent years. This is due primarily to the emergence
of very large volumes of publicly available textual
data and of computing power adequate to such volumes.
Text analysis systems operate on a set of documents
whose words are considered features. In addition, the
size of such documents can be very large, and the total
vocabulary for all documents can reach several
hundred thousand words.
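Treating words as features can be sketched as
follows. The example builds a shared vocabulary and
turns each document into a vector of word counts; the
helper names tokenize, build_vocabulary and vectorize
are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Map every distinct word in the collection to a vector index."""
    words = sorted({w for doc in documents for w in tokenize(doc)})
    return {word: index for index, word in enumerate(words)}


def vectorize(document, vocabulary):
    """Return the word-count vector of one document."""
    counts = Counter(tokenize(document))
    vector = [0] * len(vocabulary)
    for word, count in counts.items():
        if word in vocabulary:  # words outside the vocabulary are ignored
            vector[vocabulary[word]] = count
    return vector
```

With a vocabulary of several hundred thousand words
such dense lists become wasteful, so practical
systems usually store only the non-zero entries.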
4 RESEARCH FINDINGS
After testing such an analytical system, we
immediately see that the most frequent words are
compound adverbs that contribute almost nothing to
the analysis. Such words are called stop words and
are removed before the documents are converted into
a vector model. In addition to a general list of
stop words, it is useful to compile a separate list
for each specific task. Another preprocessing method,
besides removing words, is to extract the significant
part of each word, that is, its stem.
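Both preprocessing steps can be sketched in a few
lines. The stop-word list and the suffix list below
are small illustrative assumptions, and the suffix
stripping is deliberately naive; it is not a real
stemming algorithm.

```python
import re

# Illustrative stop-word list; a real task would use a much larger one.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}

# Suffixes tried in order; longer ones come first.
SUFFIXES = ("ing", "ed", "es", "s")


def strip_suffix(word):
    """Remove the first matching suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text, stop_words=STOP_WORDS):
    """Tokenize, drop stop words, and reduce each word to a crude stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [strip_suffix(t) for t in tokens if t not in stop_words]
```

The stems need not be dictionary words; it is enough
that different forms of one word map to the same
string.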
The following algorithm of systematic analysis is
widely used, including in everyday life. By creating
a rule for electronic text, we indirectly define the
decision rules on which many systematic analyses
rely. The non-leaf nodes contain "questions" put to
the document, while the leaves contain the answers in
the form of the resulting category. "Questions" can
be posed by the user, as in the example above, or
derived from a training sample, in which case they
usually take the following form: "Do such words
exist in the text?"
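A tree of such questions can be sketched as follows.
The class names Question and Leaf, the example words
and the categories are all hypothetical; the tree
here is written by hand, as a user might do, rather
than learned from a training sample.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Terminal node holding the resulting category."""
    category: str


@dataclass
class Question:
    """Inner node asking: does this word occur in the document?"""
    word: str
    if_present: "Node"
    if_absent: "Node"


Node = Union[Leaf, Question]


def classify(node, text):
    """Walk from the root to a leaf, answering each question for the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    while isinstance(node, Question):
        node = node.if_present if node.word in words else node.if_absent
    return node.category


# Hand-written example tree (hypothetical words and categories).
TREE = Question(
    "goal",
    Leaf("sport"),
    Question("election", Leaf("politics"), Leaf("other")),
)
```

For instance, classify(TREE, "The election results
are in") follows the "absent" branch of the first
question and the "present" branch of the second.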
The simplicity of the analysis is offset by the
complexity of building such a tree of questions from
a set of systems. In addition to classification, such
trees can be used to analyse the structure of
documents and categories, where the resulting rules
can be valuable. In practice, they are mainly used
for this purpose, because in terms of classification
quality they are much inferior to systematic models,
which will be discussed later.
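One common way to derive a question from a training
sample is to choose the word whose presence separates
the categories most cleanly. The sketch below uses
weighted Gini impurity as the criterion; this
criterion and the names gini and best_question are
assumptions made for illustration, not something the
text above prescribes.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of category labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_question(samples):
    """Pick the word whose presence test splits the sample best.

    samples is a list of (set_of_words, category) pairs. Returns the
    word and the weighted impurity of the two resulting groups.
    """
    vocabulary = sorted(set().union(*(words for words, _ in samples)))
    best_word, best_score = None, float("inf")
    for word in vocabulary:
        present = [cat for words, cat in samples if word in words]
        absent = [cat for words, cat in samples if word not in words]
        score = (len(present) * gini(present)
                 + len(absent) * gini(absent)) / len(samples)
        if score < best_score:  # ties keep the alphabetically first word
            best_word, best_score = word, score
    return best_word, best_score
```

Building a full tree repeats this choice recursively
on the "present" and "absent" groups, and every step
scans the whole vocabulary, which is where the
complexity mentioned above comes from.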
5 CONCLUSION
The article discussed the problems and methods of
semantic text analysis, but how can the correctness
of the result of a particular method be evaluated? A
collection of texts with known categories is divided
into two parts: one is used for training and the
other for testing. It is assumed that the documents
to be classified later will be similar to the
documents in the test sample. Of course, this
may not be the case at all, but, unfortunately, there is
no other way to evaluate the quality of the
classification. The generally accepted characteristics
of classification quality are accuracy and
completeness. Accuracy is calculated as the ratio of