Authors:
Benjamin Aziz (1) and Aysha Bukhelli (2)
Affiliations:
(1) School of Creative and Digital Industries, Buckinghamshire New University, High Wycombe HP11 2JZ, U.K.; (2) Office of the Prime Minister, Bahrain
Keyword(s):
Information Hiding, Lexical Steganography, Machine Learning, Text Steganography.
Abstract:
In this paper, we evaluate the security of a method recently proposed in the literature for embedding hidden content in textual documents through paragraph size manipulation. Our steganalysis is based on machine learning: the classification method we use to analyse a document relies on text attributes such as words per paragraph, paragraph proportion based on sentences, and other features of English documents. The embedding model proved resilient against the analysis techniques: the highest accuracy recorded was 0.601, which is considered poor. The analysis methods detected only around half of the embedded corpus, which is equivalent to random guessing. We conclude that it is difficult to detect an embedding model that manipulates the paragraphs of novel texts, as the structure of these texts depends entirely on the writer's style. Thus, shifting sentences up and down between paragraphs, without changing the order of the sentences or affecting the context of the text, yields a reasonably secure method of embedding.
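As a rough illustration of the kind of text attributes mentioned above, the following minimal sketch computes words per paragraph and each paragraph's share of the document's sentences. It is not the authors' feature extractor: the function names, the blank-line paragraph convention, and the naive punctuation-based sentence splitter are all assumptions made for this example.

```python
import re
from statistics import mean


def split_paragraphs(text):
    # Assumption: paragraphs are separated by one or more blank lines.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def count_sentences(paragraph):
    # Naive splitter (assumption): a sentence ends at '.', '!' or '?'.
    return len([s for s in re.split(r"[.!?]+", paragraph) if s.strip()])


def paragraph_features(text):
    # Hypothetical feature set, loosely following the attributes
    # named in the abstract; the paper's exact features may differ.
    paragraphs = split_paragraphs(text)
    words = [len(p.split()) for p in paragraphs]
    sentences = [count_sentences(p) for p in paragraphs]
    total = sum(sentences)
    return {
        "paragraph_count": len(paragraphs),
        "mean_words_per_paragraph": mean(words) if words else 0.0,
        # Share of the document's sentences held by each paragraph.
        "sentence_proportions": [s / total for s in sentences] if total else [],
    }
```

Feature vectors of this kind could be passed to any standard classifier. Moving a sentence from one paragraph to the next changes the proportions but not the sentence order, which is consistent with the abstract's observation that such changes are hard to separate from a writer's natural style.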