Authors: Robertas Damasevicius (1); Jurgita Kapociute-Dzikiene (2) and Marcin Wozniak (3)
Affiliations: (1) Kaunas University of Technology, Lithuania; (2) Vytautas Magnus University, Lithuania; (3) Silesian University of Technology, Poland
Keyword(s): Text Mining, Text Phonology, Text Modes, Rhythm, Empirical Mode Decomposition.
Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Mining Text and Semi-Structured Data; Symbolic Systems
Abstract:
The rhythmicity of written text is still an under-researched topic compared with similar research in the speech analysis domain. The paper presents a method for deconstructing text into text modes using Empirical Mode Decomposition (EMD). First, the text is encoded into a numerical sequence using a mapping table. Next, the resulting numerical sequence is decomposed into Intrinsic Mode Functions (IMFs) using EMD. The resulting text modes provide a basis for further analysis of the text as well as of specific characteristics of the language of the text. The text modes are then used to derive measures of text complexity (cardinality) and rhythmicity (frequency), as well as visual representations (scalograms, convograms), which can provide important insights into the structure of the text. Applying EMD to text analysis makes it possible to decompose a text into basic harmonics, which can be attributed to structural units of the text such as syllables, words, verses and stanzas. Higher-order harmonics, however, can be observed only in rhymed types of text such as poetry.
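The two steps named in the abstract (encoding via a mapping table, then EMD) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions: the mapping table (letters a to z mapped to 1 to 26, everything else to 0) is hypothetical, the envelopes are piecewise-linear rather than the cubic splines typically used in EMD, and sifting runs for a fixed number of passes instead of using a stopping criterion. The names `encode`, `sift` and `emd` are illustrative only.

```python
import string

# Hypothetical mapping table (an assumption, not the paper's table):
# letters a..z -> 1..26, any other character -> 0.
MAPPING = {ch: i + 1 for i, ch in enumerate(string.ascii_lowercase)}


def encode(text):
    """Encode a text as a numerical sequence using the mapping table."""
    return [float(MAPPING.get(ch, 0)) for ch in text.lower()]


def local_extrema(x):
    """Return indices of interior local maxima and minima of x."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima


def envelope(idx, x):
    """Piecewise-linear envelope through (i, x[i]) for i in idx.

    Simplification: standard EMD usually interpolates with cubic splines.
    The envelope is held constant before the first and after the last knot.
    """
    env = [0.0] * len(x)
    for i in range(0, idx[0] + 1):
        env[i] = x[idx[0]]
    for a, b in zip(idx, idx[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            env[i] = x[a] + t * (x[b] - x[a])
    for i in range(idx[-1], len(x)):
        env[i] = x[idx[-1]]
    return env


def sift(x, n_sift=10):
    """Extract one candidate mode by repeatedly removing the envelope mean.

    Assumption: a fixed number of sifting passes replaces a stopping criterion.
    """
    h = list(x)
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if not maxima or not minima:
            break
        upper = envelope(maxima, h)
        lower = envelope(minima, h)
        h = [v - (u + l) / 2.0 for v, u, l in zip(h, upper, lower)]
    return h


def emd(x, max_imfs=8):
    """Decompose x into a list of modes plus a residue."""
    imfs = []
    residue = list(x)
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if not maxima or not minima:
            break  # residue has too few extrema to yield another mode
        imf = sift(residue)
        imfs.append(imf)
        residue = [r - c for r, c in zip(residue, imf)]
    return imfs, residue


if __name__ == "__main__":
    signal = encode("the quick brown fox jumps over the lazy dog " * 3)
    modes, rest = emd(signal)
    print(len(modes), "modes extracted from", len(signal), "characters")
```

By construction, summing all extracted modes and the residue reproduces the encoded sequence, which is a convenient sanity check for any variant of this sketch. The number of modes obtained depends on the mapping table and the envelope method, so it should not be read as the paper's cardinality measure.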