An Extended Method for Transmitting Secret Messages in Textual Documents Based on Paragraph Resizing

Benjamin Aziz$^{1,a}$, Estabraq Makiyah$^{1,b}$ and Aysha Bukhelli$^{2,c}$

$^{1}$ School of Creative and Digital Technologies, Buckinghamshire New University, High Wycombe, U.K.

$^{2}$ Office of the Prime Minister, Bahrain

Keywords:

Formal Methods, Information Hiding, Lexical Steganography, Text Steganography, Linguistic Steganography.

Abstract:

This short paper presents an extended method for embedding secret messages in text documents based on the readjustment of paragraph sizes in a document. The new method improves on an existing method previously proposed by the authors by introducing the idea of choice functions, which allow any two paragraphs in a document to be compared. This provides greater flexibility when performing text steganography. The paper also defines a modified algorithm, based on the Diffie-Hellman protocol, for establishing an agreement between two communicating parties on the choice of paragraphs to compare prior to the commencement of the communication session. Finally, the paper demonstrates the applicability of the extended method by means of a few examples.

1 INTRODUCTION

Steganography, the art of embedding secret content in cover media, has emerged in recent decades as an exciting and important field of research within the wider field of data security and protection, as a result of the plethora of digital content formats and content generation technologies. Although the majority of the research in steganography is concerned with rich media content, such as images (Cheddad et al., 2010), audio (Jayaram et al., 2011), video (Liu et al., 2019) and even virtual reality (Wilson, 2019), text steganography remains an active area of research encompassing several methods for the embedding of secret content.

a https://orcid.org/0000-0001-5089-2025

b https://orcid.org/0000-0002-6432-8596

c https://orcid.org/0000-0001-7578-977X

We can classify all steganographic methods into one of three categories:

• Alteration of Cover Media. In this method, the cover medium is altered to embed the secret content. In image-based steganography, the resulting stego-object is expected to be visually indistinguishable from the original cover object: the resulting image will still look the same as the original cover to the normal human eye. Moreover, in the absence of a mechanism for determining the originality of the cover image (e.g. by digitally signing it), it is also impossible to determine which set of pixels represents the original cover and which other set represents the stego-object. Despite the fact that the majority of the literature so far on textual steganography uses this method, alteration of text remains an insecure method, because such modifications can easily be detected by a sufficiently precise comparison of the document with the original. This is evident in the lack of discussion of the security of this method in the literature on textual steganography.

• Generation of New Media. In this method, we generate a new cover altogether with characteristics that match the embedding of our secret message. For example, this would be similar to capturing an image with a digital camera, where that image already has characteristics matching our secret message. Another example would be to generate (e.g. using generative AI methods) a new text excerpt that encodes the secret message within.

• Search for Suitable Media. In this method, we search among the currently available media for a cover medium that matches the embedding of our secret message. In this case, we neither alter an existing cover nor generate a new one, but simply match with an existing one, drawn from a database of such media, that exhibits the right characteristics for our secret message. This matching cover itself becomes the stego-object for the communication.

Aziz, B., Makiyah, E. and Bukhelli, A.

An Extended Method for Transmitting Secret Messages in Textual Documents Based on Paragraph Resizing.

DOI: 10.5220/0012716600003767

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 21st International Conference on Security and Cryptography (SECRYPT 2024), pages 389-396

ISBN: 978-989-758-709-2; ISSN: 2184-7711

In what follows, we assume that the attacker (the war-

den) has no access to the original text document, as

our method is not immune to visual analysis.

In Section 2, we discuss a few works in literature

related to our paper. In Section 3, we review the the-

oretical background underlying the method proposed

by (Aziz et al., 2022), which forms the basis for our

extended method. In Section 4, we introduce our new

extended method for embedding secret content into

textual documents. We also deﬁne in this section the

extraction process reversing our embedding method.

In Section 5, we demonstrate our extended method

through a simple example. In Section 6, we give a

sketch of how the new method can be implemented

based on a previous mechanism for sharing secrets

that was deﬁned in (Aziz, 2021). Finally, in Section 7,

we conclude the paper with future research directions.

2 RELATED WORK

Recently, there have been several attempts to de-

sign format-based textual steganographic methods for

different languages like English, Persian and Ara-

bic. Format-based steganography is when the phys-

ical features of text symbols are used to conceal a

message. The features are altered in such a manner

that the human eye cannot detect them (Baawi et al.,

2018). For example, lines within the text are moved

up and down to conceal bits of secret data. Simi-

larly, words are moved left or right or up and down.

In some cases, white spaces in-between words or in-

between paragraphs or lines, are used to hide such

data. In feature-based encoding, the physical features

of the words themselves are altered in order to con-

ceal information. This is reliant on actual symbols

in the language being used. Naharuddin et al. (Na-

haruddin et al., 2018) suggested a method that maps

the secret message’s bits onto a cover text using the

American Standard Code for Information Interchange

(ASCII) characters, comprising punctuation, spaces

and symbols. The secret text is initially embedded

using a one-time pad and transformed into a stego-

object. Then, each character is transformed into 7-bit

binary numbers. The embedding procedure is carried

out by mapping one bit of the secret text onto the ﬁrst

bit of the stego-object character comprising the same

quantity of bits. Each bit position for the bit of the se-

cret text is documented as a stego key, which is placed

on the bit of the stego-object. The stego key func-

tions as a key to extract the secret text embedded in

the stego-object.

Maher (Maher, 1995), on the other hand, de-

signed a text data-hiding programme called TEXTO.

TEXTO was designed to transform PGP ASCII-

armored data into English sentences. This method

is convenient for exchanging binary data, especially

embedded data. The secret data here are replaced

by English words, meaning TEXTO works like a

simple substitution cipher, which results in reducing

suspicion over the produced encoded text. Chap-

man and Davida (Chapman and Davida, 1997) in-

troduced a steganographic scheme that consists of

two functions: NICETEXT and SCRAMBLE. NICE-

TEXT transforms a secret message text into a text that

looks like natural language, by parsing the cover text

and extracting syntactical patterns, i.e. Part-of-Speech

(PoS) tags. SCRAMBLE does the opposite; it parses

sole words from the generated text and recreates the

encoded text by using codes from the dictionary table. Later, Chapman et al. (Chapman et al., 2001) expanded

this approach by using sentence models and large dic-

tionaries of words classiﬁed by PoS tags. By using

the “extensible contextual template” approach, com-

bined with a synonym-based replacement strategy, the

generated text seems more realistic than it is with the

original NICETEXT function.

Atallah et al. (Atallah et al., 2001) proposed a watermarking scheme for natural language text by embedding small portions of the watermark bit strings in

the syntactic structure of a number of selected sen-

tences in the text. This scheme is suitable for ex-

tended meaning texts, such as reports, manuals and

so on, which need integrity protection rather than

secrecy protection as is the case with critical texts

that governments and industries produce in abundance. Later, Atallah et al. (Atallah et al., 2002) also

presented a semantically-based scheme, which im-

proves the information-hiding capacity of any texts

through two techniques: ﬁrst by modifying the gran-

ularity of meaning of individual sentences, and sec-

ond, by dividing the number of sentences affected

by the watermark, which makes it possible to water-

mark short texts too. On the other hand, Moerland

proposed in (Moerland, 2003) a text steganography

technique, which is based on using selected charac-

ters from words. For instance, the ﬁrst letter of all

the paragraphs can be used to conceal the secret mes-

sage. By placing all selected characters together, the

complete secret message can be extracted.
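The first-letter scheme described above can be illustrated with a minimal Python sketch. The function name `extract_acrostic` and the sample cover paragraphs are illustrative assumptions and are not taken from (Moerland, 2003); the sketch simply reads the first alphabetic character of every paragraph.

```python
def extract_acrostic(paragraphs):
    """Join the first alphabetic character of each paragraph (upper-cased)."""
    letters = []
    for paragraph in paragraphs:
        for ch in paragraph:
            if ch.isalpha():
                letters.append(ch.upper())
                break  # only the first letter of each paragraph is used
    return "".join(letters)


# Hypothetical cover text whose paragraphs start with H, I and D.
cover = [
    "Heavy rain fell all night.",
    "Inside, the fire burned low.",
    "Dawn came slowly.",
]
print(extract_acrostic(cover))  # HID
```

Paragraphs that open with a quotation mark or a digit are handled by skipping to the first letter, which is a design choice of this sketch rather than a property of the cited technique.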

In (Bergmair, 2004), Bergmair presented the lin-

guistic problem of word-sense ambiguity and demon-

strated its relevance to current computer security ap-

plications in the context of Human Interactive Proofs


(HIPs). HIPs enable a machine to automatically deter-

mine whether the machine is interacting with another

machine or with a human. In doing so, Bergmair used

the linguistic anomaly, which states that a word can

have different meanings dependent on the context it

is used in. In the same line of semantic interpretation

of text, Chand and Orgun (Chand and Orgun, 2006)

developed a linguistically robust embedding applica-

tion called LUNABEL, which converts a message into

semantically mundane text. LUNABEL uses word

replacement with substitution classes based on tra-

ditional word replacement features, as well as fea-

tures like semantic criteria and frequency statistics.

LUNABEL creates text, which preserves the syntac-

tic structure and semantic context of the original cover

text. In the same manner, Liang and Iranmanesh (Liang and Iranmanesh, 2016) proposed a method that adds five

white-space characters to random positions in a line

using a key to correlate to the characters required for

embedding secret information. This method is advan-

tageous as randomly spread white-spaces can encode

a message differently using different keys. The white-

spaces contained in the secret text regulate the embed-

ding process.

When it comes to non-English text steganography, we also find plenty of literature, especially in relation to languages such as Persian and Arabic. For example, Shirali-Shahreza and Shirali-Shahreza (Shirali-Shahreza and Shirali-Shahreza, 2006) created an application based on hiding binary values in Arabic or Persian scripts using a feature-coding method. This method depends on the points inherent in the letters of the Arabic or Persian alphabet. The location of the points within the pointed letters hides information as follows: First, the hidden information is translated into a binary value. Then, the cover text is scanned, and whenever a pointed letter is detected, the location of its point is set according to the next hidden bit: the point is shifted slightly up if the hidden bit value is one; otherwise, its location remains unchanged. In (Shirali-Shahreza, 2008),

Shirali-Shahreza also proposed a text steganogra-

phy technique based on different spellings of words

in British and American English. Some words in

both dialects have different spellings for the same

word; such as ‘colour’ and ‘color’. Furthermore,

in (Shirali-Shahreza and Shirali-Shahreza, 2008), Shirali-Shahreza and Shirali-Shahreza extended the previous technique to cater for the different terms used for the same concept in British and American English, substituting one term for the other to hide secret data. For example, the term ‘elevator’ in American English is referred to as ‘lift’ in British English, and substituting one for the other facilitates the hiding of one bit.
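The one-bit-per-word idea behind the spelling and term substitution techniques can be sketched as follows. This is only an illustration under our own assumptions: the `PAIRS` table, the function names and the convention that the American variant encodes 0 and the British variant encodes 1 are hypothetical and are not taken from the cited works.

```python
# Hypothetical substitution table: (American variant, British variant).
PAIRS = [("color", "colour"), ("elevator", "lift")]
# Map every variant to its pair so that either form can be looked up.
VARIANTS = {word: pair for pair in PAIRS for word in pair}


def embed(words, bits):
    """Rewrite substitutable words so that each one carries the next bit."""
    remaining = list(bits)
    stego = []
    for word in words:
        pair = VARIANTS.get(word)
        if pair is not None and remaining:
            stego.append(pair[remaining.pop(0)])  # index 0 = American, 1 = British
        else:
            stego.append(word)
    if remaining:
        raise ValueError("cover text has too few substitutable words")
    return stego


def extract(words):
    """Read one bit from every substitutable word, in order of appearance."""
    return [VARIANTS[word].index(word) for word in words if word in VARIANTS]


cover = "the color of the elevator".split()
stego = embed(cover, [1, 0])
print(" ".join(stego))  # the colour of the elevator
print(extract(stego))   # [1, 0]
```

In this sketch the receiver reads a bit from every substitutable word, so both parties would also need to agree on the message length whenever the cover contains more such words than there are bits.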

Baawi et al. (Baawi et al., 2020) suggested a tech-

nique to enhance the embedding capacity for format-

based text steganography using the font as well as

other text characteristics for encoding secret informa-

tion. This technique uses similar symbols for sev-

eral codes, known as Set of High-Frequency Letters

(SHFLs). The embedding process is based on re-

placing English letters with codes that share similar

shapes. One pass encodes two bits, where 00, 01, 10

and 11 conceal glyph1, glyph2, glyph3 and glyph4,

respectively (see Table 1 for an example of four let-

ters, ‘e’, ‘t’, ‘a’ and ‘o’). The steganographic capac-

ity of the table can be enhanced, and the technique

is based on lower-case SHFL. This two-bit technique

outperforms the standard text steganography because

it enhances the embedding capacity of the stego-text.

Table 1: Selected letters in SHFL for the hiding process (Baawi et al., 2020).

Letter   ASCII code (S = 00)   Unicode (S = 01)   Unicode (S = 10)   Unicode (S = 11)
e        0065                  0023               0026               002A
t        0074                  003C               003D               003E
a        0061                  005B               005D               005E
o        006F                  007B               007C               007D
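The two-bit lookup of Table 1 can be illustrated with a short Python sketch. The code points are those listed in Table 1; everything else (the names `SHFL`, `embed` and `extract`, and the assumption that the receiver knows the number of embedded bits) is hypothetical. In the cited technique a custom font is what makes the substituted code points look like the original letters; without such a font, the stego-text below shows up as ordinary punctuation.

```python
# Code points from Table 1, indexed by the two-bit value S = 00, 01, 10, 11.
SHFL = {
    "e": (0x0065, 0x0023, 0x0026, 0x002A),
    "t": (0x0074, 0x003C, 0x003D, 0x003E),
    "a": (0x0061, 0x005B, 0x005D, 0x005E),
    "o": (0x006F, 0x007B, 0x007C, 0x007D),
}
# Reverse lookup: character -> the two bits it carries.
DECODE = {
    chr(code): format(index, "02b")
    for codes in SHFL.values()
    for index, code in enumerate(codes)
}


def embed(cover, bits):
    """Replace SHFL letters in the cover so that each carries two bits."""
    if len(bits) % 2:
        raise ValueError("the bit string must have even length")
    chunks = [bits[i:i + 2] for i in range(0, len(bits), 2)]
    out = []
    for ch in cover:
        if ch in SHFL and chunks:
            out.append(chr(SHFL[ch][int(chunks.pop(0), 2)]))
        else:
            out.append(ch)
    if chunks:
        raise ValueError("cover text has too few SHFL letters")
    return "".join(out)


def extract(stego, n_bits):
    """Recover the first n_bits carried by SHFL code points."""
    bits = "".join(DECODE[ch] for ch in stego if ch in DECODE)
    return bits[:n_bits]


stego = embed("tea", "011011")
print(stego)              # <&^
print(extract(stego, 6))  # 011011
```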

Most of the new space insertion techniques are inspired by the Kashida technique developed by Taha et al. (Taha et al., 2020), who suggested a format-based text steganographic technique intended for the Arabic language and based on the Kashida and Unicode text (including the zero-width non-joiner (discrete), zero-width joiner, little space and zero-width space along with traditional spaces). The cover text can be used to conceal one bit of the secret text in each letter by transforming the letters using their position (i.e. end, middle or beginning of a word, or an isolated word). The shapes of the letters are corrected using software that changes typographic sequences depending on letter positioning (i.e. isolated, end, middle or beginning). Kashida, typed as ‘ـ’, is a character in the Arabic language that extends a letter but does not change the meaning of the word.

3 THEORETICAL BACKGROUND

We give a recap here, for completeness, of the theory of the textual embedding method that was presented in (Aziz et al., 2022). There, the authors presented a model of a textual document $S \in \mathcal{S}$, where $\mathcal{S}$ is the set of all possible such documents, which assumed that the document $S$ consisted of a number of paragraphs such that $S = (P_1, \ldots, P_n)$. Here $P, P', \ldots \in \mathcal{P}$, where $\mathcal{P}$ is the set of every possible sound paragraph written in English; hence $\mathcal{S} \subseteq \mathcal{P}^{*}$, or in other words, a document is a finite sequence of such paragraphs. In addition to this, a paragraph $P$ was assumed to consist of a finite number of (possibly repeating) characters, defined by the function:

$$\mathit{ch\_of} : \mathcal{P} \to \wp(\mathcal{C})$$

where $\mathcal{C}$ is the multi-set of all possible characters in English and $\wp(\mathcal{C})$ is the power-set generated from $\mathcal{C}$ and preserving the multiplicity of each character in the multi-set, as defined by Axiom V of (Blizard, 1988). Hence:

$$\mathit{ch\_of}(P) = \{|\, c : c \in P \,|\}$$

where we assume that a paragraph has at least one sentence terminated by a punctuation mark, that is, a ‘!’, a ‘?’ or a ‘.’. Additionally, the model of (Aziz et al., 2022) assumes that no paragraph has an empty multi-set of characters, i.e.:

$$\forall P \in \mathcal{P} : |\mathit{ch\_of}(P)| > 0$$

A similar condition applies to documents: no document is assumed to be empty, or in other words, $S = (\,)$ does not exist. Moreover, a stronger condition is required to hold if a document is to be used as a cover document, that is

$$\forall S \in \mathcal{S} : \mathit{is\_cover}(S) \Leftrightarrow |S| > 1$$

where $\mathit{is\_cover} : \mathcal{S} \to \mathbb{B}$ is a predicate that asserts whether or not a document can be used as a cover for embedding secret messages. This last condition states that, unless a document contains two or more paragraphs, it is suitable neither for the method of (Aziz et al., 2022) nor for our extended method proposed later.

The main mechanism used in (Aziz et al., 2022) for embedding secret messages in text documents was the function $R : \mathcal{P} \times \mathcal{P} \to \{0,1\}$, which essentially compares two paragraphs and returns a 0 or a 1 depending on the result of the comparison. It is possible to define $R$ in any way one prefers; however, it was defined in (Aziz et al., 2022) as a size comparison function on the number of characters in a paragraph:

$$R(P_l, P_r) = \begin{cases} 0 & \text{if } |\mathit{ch\_of}(P_l)| \le |\mathit{ch\_of}(P_r)| \\ 1 & \text{otherwise} \end{cases}$$

which is the definition we will use throughout the rest of this paper. Note that we refer to the parameters of $R$ as the left paragraph ($P_l$) and the right paragraph ($P_r$); the choice of how to instantiate these depends on the specific case that uses $R$ (discussed in the next section).
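To make the definitions of $\mathit{ch\_of}$ and $R$ concrete, the following minimal Python sketch models a paragraph as a string and its character multi-set as a `collections.Counter`. This is an illustration under our own assumptions: in particular, we assume that every character of the string, including spaces and punctuation, counts towards the paragraph size, and the helper name `size` is ours.

```python
from collections import Counter


def ch_of(paragraph: str) -> Counter:
    """Multi-set of the characters of a paragraph (multiplicities preserved)."""
    return Counter(paragraph)


def size(paragraph: str) -> int:
    """|ch_of(P)|: the number of characters, counted with multiplicity."""
    return sum(ch_of(paragraph).values())


def R(left: str, right: str) -> int:
    """0 if the left paragraph is no longer than the right one, 1 otherwise."""
    return 0 if size(left) <= size(right) else 1


print(R("A short one.", "A somewhat longer paragraph."))  # 0
print(R("A somewhat longer paragraph.", "A short one."))  # 1
```

Note that two paragraphs of equal size yield 0, since the comparison in $R$ is not strict.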

4 THE EXTENDED EMBEDDING METHOD

Our extended embedding process builds on the same approach used in (Aziz et al., 2022) for embedding secret messages $M$ in text documents. However, we introduce a different method by which $M$ is built:

$$M = [R(c_{i_1}(S), c_{j_1}(S)), \ldots, R(c_{i_{k-1}}(S), c_{j_{k-1}}(S))] \quad \text{where } i_1, j_1, \ldots, i_{k-1}, j_{k-1} \in \mathbb{N}$$

which gives us a $(k-1)$-long secret message. This definition introduces an $n$-wise choice function, $c_i$:

$$c_i((p_1, \ldots, p_n)) = p_i$$

A choice function chooses the $i$-th element in a sequence. Every $c_i$ is a partial function: it is undefined whenever $i < 1$ or $i > n$, for an $n$-long sequence.

In our embedding method, we shall call the sequence of pairs of such choice functions $\rho$, such that:

$$\rho = ((c_{i_1}, c_{j_1}), \ldots, (c_{i_{k-1}}, c_{j_{k-1}}))$$

where there is a requirement that:

$$1 \le i_1, j_1, \ldots, i_{k-1}, j_{k-1} \le |S|$$

for the document $S$ on which the choice functions are applied.
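The role of the choice functions can be sketched in Python as follows. The names `choice` and `message` are our own, paragraphs are modelled as plain strings, and $R$ is taken to compare string lengths; none of this is prescribed by the formal model, which leaves the representation open.

```python
from typing import Callable, List, Sequence, Tuple

Choice = Callable[[Sequence[str]], str]


def choice(i: int) -> Choice:
    """Return a choice function that picks the i-th (1-based) element."""
    def c_i(seq: Sequence[str]) -> str:
        # Partiality: the function is undefined when i < 1 or i > n.
        if not 1 <= i <= len(seq):
            raise IndexError(f"choice {i} is undefined on a sequence of length {len(seq)}")
        return seq[i - 1]
    return c_i


def R(left: str, right: str) -> int:
    """Size comparison: 0 if left is no longer than right, 1 otherwise."""
    return 0 if len(left) <= len(right) else 1


def message(doc: Sequence[str], rho: List[Tuple[Choice, Choice]]) -> List[int]:
    """Bits encoded by a document under a sequence of pairs of choice functions."""
    return [R(left(doc), right(doc)) for left, right in rho]


# Hypothetical three-paragraph document and a non-sequential selection.
doc = ("aaaa", "bb", "cccccc")
rho = [(choice(1), choice(2)), (choice(2), choice(3)), (choice(3), choice(1))]
print(message(doc, rho))  # [1, 0, 1]
```

The last pair compares the third paragraph with the first one, which a purely sequential comparison of neighbouring paragraphs would not allow.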

In order to extract a message, the receiver will need to have agreed beforehand with the sender on the definition of $R$ above. In practice, the definition of $R$ could vary, giving rise to different variations of this method, each with its own definition of $R$. Here, however, we stick with one definition of $R$. With this in mind, the extraction logic can be defined as follows:

$$Y\,\omega\,(P_1, \ldots, P_n)\,\rho\,[\,] = \omega\,(Y\,\omega)\,(P_1, \ldots, P_n)\,\rho\,[\,]$$

where $Y$ is Curry’s fixed-point combinator as defined in (Curry and Feys, 1958, p.178), $(P_1, \ldots, P_n)$ is the text document received by the receiver, $\rho$ is a sequence of pairs of choice functions as defined above, $[\,]$ is an empty list, which will be filled with the bits of the secret message during the extraction, and finally, $\omega$ is defined as follows:

$$\omega = \lambda f.\lambda s.\lambda r.\lambda \ell.\ \text{if } r = [\,] \text{ then } \ell \text{ else } f\ s\ (r \setminus \mathit{fst}(r))\ (\ell : R(\mathit{fst}(\mathit{fst}(r))(s), \mathit{snd}(\mathit{fst}(r))(s)))$$

which is a $\lambda$ expression (Church, 1932) that embeds the definition of $R$. Here, $\mathit{fst} : \mathcal{S} \rightharpoonup \mathcal{P}$ is a partial function that returns the first paragraph element in a sequence, $\mathit{snd} : \mathcal{S} \rightharpoonup \mathcal{P}$ is a partial function that returns the second paragraph element in a sequence and $\setminus : \mathcal{S} \times \mathcal{P} \rightharpoonup \mathcal{S}$ is a partial function that takes a sequence and a paragraph, and returns the sequence resulting from the removal of that paragraph from the input sequence. Finally, $(\ell : n) = \ell'$ is an operation that joins an element $n$ to the tail of an existing list $\ell$ such that $n$ becomes the last element of the new list $\ell'$. In our case, $n = R(\mathit{fst}(\mathit{fst}(r))(s), \mathit{snd}(\mathit{fst}(r))(s))$, which, informally, is the bit resulting from the application of the $R$ function to two paragraphs, $\mathit{fst}(\mathit{fst}(r))(s)$ and $\mathit{snd}(\mathit{fst}(r))(s)$. The former is chosen based on the first choice function included in the first element of the received $\rho$ (or $r$), and the latter is chosen based on the second choice function included in that element of $\rho$ (or $r$). Both $\mathit{fst}$ and $\mathit{snd}$ are partial functions since they are not defined over anything other than pairs (basically, our pairs of choice functions in a $\rho$ element). $\setminus$, on the other hand, is partial since the element being removed from a sequence may not be a member of that sequence.
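The fixed-point formulation of the extraction logic can be mirrored quite closely in Python. The sketch below is an illustration under our own assumptions: because Python evaluates eagerly, it uses the strict variant of the fixed-point combinator (often called Z) in place of $Y$; `omega` takes the arguments $s$, $r$ and $\ell$ together rather than one at a time; the removal $r \setminus \mathit{fst}(r)$ is realised as the slice `r[1:]`; and `pick` is a hypothetical helper that builds choice functions.

```python
def R(left, right):
    """Size comparison: 0 if left is no longer than right, 1 otherwise."""
    return 0 if len(left) <= len(right) else 1


# Strict fixed-point combinator; it plays the role of Y in the text.
Z = lambda f: (lambda x: f(lambda *a: x(x)(*a)))(lambda x: f(lambda *a: x(x)(*a)))

# omega receives the recursive continuation f, then the document s, the
# remaining pairs r and the list l of bits extracted so far.
omega = lambda f: lambda s, r, l: (
    l if not r
    else f(s, r[1:], l + [R(r[0][0](s), r[0][1](s))])
)

extract = Z(omega)


def pick(i):
    """Hypothetical helper: the choice function selecting the i-th paragraph."""
    return lambda s: s[i - 1]


doc = ("aaaa", "bb", "cccccc")
rho = [(pick(1), pick(2)), (pick(2), pick(3))]
print(extract(doc, rho, []))  # [1, 0]
```

Each recursive step consumes the head of the remaining pairs and appends one bit to the end of the list, so the bits are produced in the same order as the pairs of choice functions.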

5 EXAMPLE

As a simple example, let us consider the 5-paragraph excerpt in Figure 1 taken from Jules Verne’s “Journey to the Centre of the Earth”. In its current form, this excerpt naturally encodes the message $M = [1,0,1,1]$, given a sequential comparison of subsequent paragraphs, i.e. $\rho = ((c_1, c_2), (c_2, c_3), (c_3, c_4), (c_4, c_5))$.

5.1 Modes of Selection

In the most general case, there are no conditions on how $\rho$ is selected, nor on the size of the transmitted message $M$. However, we only consider below the interesting choices of $\rho$, where the size of the transmitted message is at least as large as the number of paragraphs in the cover text minus one. We do not consider cases of smaller messages, since these can be transmitted in text documents with fewer paragraphs and therefore do not represent any new cases.

5.1.1 |M| = |S| − 1: Base Case

This case corresponds to the original embedding algorithm proposed in (Aziz et al., 2022). Here, the definition of $\rho$ is restricted to the following format:

$$\rho = ((c_1, c_2), \ldots, (c_{n-1}, c_n))$$

for a text document of $n$ paragraphs, which represents the comparison of the sizes of sequential paragraphs. In our example of Figure 1, if we wanted to embed the secret message $M = [0, 0, 1, 0]$, we would have to modify the excerpt such that $R(P_1, P_2) = 0$, $R(P_2, P_3) = 0$, $R(P_3, P_4) = 1$ and $R(P_4, P_5) = 0$. An example of a modified excerpt embedding this message is that of Figure 2, which we refer to as $S_{\mathit{Fig2}}$.

When extracting this message, we simply apply the following fixed-point calculation:

$$Y\,\omega\,S_{\mathit{Fig2}}\,((c_1, c_2), (c_2, c_3), (c_3, c_4), (c_4, c_5))\,[\,] = [0,0,1,0]$$
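The base case can be sketched in a few lines of Python by representing each choice function simply by its index. The paragraph sizes below are made-up values chosen to reproduce the bit pattern of this example; they are not character counts measured on Figure 2, and the function names are our own.

```python
def sequential_rho(n):
    """Index pairs (1,2), (2,3), ..., (n-1,n) for an n-paragraph document."""
    return [(i, i + 1) for i in range(1, n)]


def extract(sizes, rho):
    """One bit per pair: 0 if the left size <= the right size, 1 otherwise."""
    return [0 if sizes[i - 1] <= sizes[j - 1] else 1 for i, j in rho]


sizes = [20, 25, 60, 10, 30]  # hypothetical paragraph sizes
rho = sequential_rho(len(sizes))
print(rho)                  # [(1, 2), (2, 3), (3, 4), (4, 5)]
print(extract(sizes, rho))  # [0, 0, 1, 0]
```

With $n$ paragraphs the sequential selection always yields exactly $n - 1$ bits, which is why this case coincides with the original method.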

5.1.2 |M| = |S| −1: Random Selection Case

In this more generic case, the choice of ρ is only

bounded by one condition: that the size of the embed-

ded message must be one fewer than the number of

paragraphs in the cover text document (i.e. similar to

the size of the message in the base case). Otherwise,

the deﬁnition of ρ may allow for any two paragraphs

to be compared depending on the pre-communication

agreement made by the two communicating entities.

We will discuss in slightly more detail in Section 6

how such pre-communication agreement can be es-

tablished securely. For now, we assume that both enti-

ties know which two paragraphs need to be compared,

for each of the bits in the secret message.

As an example, let us assume that ρ =

((c

),(c

)) and using the mod-

iﬁed excerpt of Figure 2, this will allow us to embed

the following message:

M = [1, 0, 0, 0]

In this case, in order to extract the secret message, we

apply the following ﬁxed-point calculation:

$$Y\,\omega\,S_{\mathit{Fig2}}\,\rho\,[\,] = [1,0,0,0]$$

5.1.3 |M| > |S| − 1: Finite but Unbounded Case

In this last case, the number of message bits is ﬁnite

but unbounded, meaning it can be any number of bits,

as long as the two communicating entities have a pre-

agreed length of the secret message. For example,

if we assume that |M| = 8, then we may agree that

the following deﬁnition of ρ consisting of eight pairs

of choice functions, would be used as our embedding

method:

ρ = ((c

),(c

))

Consequently, to embed the following message:

M = [1, 0, 0, 1, 0, 1, 1, 1]

we will need to modify the original excerpt of Fig-

ure 1 to a new excerpt that matches this message and


P1: Really, what was the good of making such a fuss about an old quarto volume,

the back and sides of which seemed bound in coarse calf|a yellowish old book,

with a faded tassel dangling from it?

P2: However, the professor’s vocabulary of adjectives was not yet exhausted.

P3: "Look!" he said, asking himself questions, and answering them in the same breath;

"is it handsome enough? Yes; it is first-rate. And what binding! Does it open easily?

Yes, it lies open at any page, no matter where. And does it close well? Yes;

for binding and leaves seem in one completely. Not a single breakage in this back after 700 years

of existence! Ah! this is binding that Bozerian, Closs, and Purgold might have been proud of!"

P4: All the while he was speaking, my uncle kept opening and shutting the old book.

I could not do less than ask him about the contents, though I did not feel the least interest

in the subject.

P5: "And what is the title of this wonderful volume?" I asked.

Figure 1: A five-paragraph excerpt from Jules Verne’s “Journey to the Centre of the Earth”.

P1: Really, what was the good of making such a fuss about an old quarto volume,

the back and sides of which seemed bound in coarse calf|a yellowish old book,

with a faded tassel dangling from it?

P2: However, the professor’s vocabulary of adjectives was not yet exhausted.

"Look!" he said, asking himself questions, and answering them in the same breath;

"is it handsome enough? Yes; it is first-rate.

P3: And what binding! Does it open easily?

Yes, it lies open at any page, no matter where. And does it close well? Yes; for binding and

leaves seem in one completely. Not a single breakage in this back after 700 years of existence!

Ah! this is binding that Bozerian, Closs, and Purgold might have been proud of!"

P4: All the while he was speaking, my uncle kept opening and shutting the old book.

P5: I could not do less than ask him about the contents, though I did not feel the least interest

in the subject. "And what is the title of this wonderful volume?" I asked.

Figure 2: The modified excerpt, $S_{\mathit{Fig2}}$.

P1: Really, what was the good of making such a fuss about an old quarto volume,

the back and sides of which seemed bound in coarse calf|a yellowish old book,

with a faded tassel dangling from it? However, the professor’s vocabulary of adjectives

was not yet exhausted.

P2: "Look!" he said, asking himself questions, and answering them in the same breath;

"is it handsome enough? Yes; it is first-rate. And what binding! Does it open easily?

Yes, it lies open at any page, no matter where. And does it close well? Yes;

for binding and leaves seem in one completely. Not a single breakage in this back after 700

years of existence!

P3: Ah! this is binding that Bozerian, Closs, and Purgold might have been proud of!"

P4: All the while he was speaking, my uncle kept opening and shutting the old book.

P5: I could not do less than ask him about the contents, though I did not feel the least interest

in the subject. "And what is the title of this wonderful volume?" I asked.

Figure 3: The modified excerpt, $S_{\mathit{Fig3}}$.


the choice of $\rho$ above. An example of such a (suitably) modified excerpt could be that shown in Figure 3. We call this second modified excerpt $S_{\mathit{Fig3}}$.

Now, in order to extract the secret message for this case, we again apply our fixed-point calculation:

$$Y\,\omega\,S_{\mathit{Fig3}}\,\rho\,[\,] = [1, 0, 0, 1, 0, 1, 1, 1]$$
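To illustrate how a selection with more pairs than $|S| - 1$ behaves, the following Python sketch applies eight index pairs to a five-paragraph document. Both the pairs and the paragraph sizes are hypothetical values of our own; they are not the $\rho$ and the excerpt used in this example. The sketch also checks the requirement that every index lies between 1 and $|S|$.

```python
def validate_rho(rho, n_paragraphs):
    """Check that every index in every pair lies in 1..|S|."""
    for i, j in rho:
        if not (1 <= i <= n_paragraphs and 1 <= j <= n_paragraphs):
            raise ValueError(f"pair ({i}, {j}) is out of range for {n_paragraphs} paragraphs")


def extract(paragraphs, rho):
    """One bit per pair, comparing paragraph lengths as in R."""
    validate_rho(rho, len(paragraphs))
    return [
        0 if len(paragraphs[i - 1]) <= len(paragraphs[j - 1]) else 1
        for i, j in rho
    ]


# Five hypothetical paragraphs of sizes 10, 20, 30, 40 and 50 characters.
paragraphs = ["x" * n for n in (10, 20, 30, 40, 50)]
# Eight hypothetical index pairs; paragraphs are reused across pairs.
rho = [(1, 2), (2, 1), (3, 2), (5, 4), (1, 5), (4, 4), (5, 1), (2, 3)]
print(extract(paragraphs, rho))  # [0, 1, 1, 1, 0, 0, 1, 0]
```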

6 IMPLEMENTATION USING SHARED SECRET KEYS

A new and important usage of Difﬁe-Hellman shared

secret keys (Difﬁe and Hellman, 1976) was demon-

strated in (Aziz, 2021) as a means of agreement be-

tween two entities on the semantic interpretation of

the exchanged messages. This approach opens the

possibility, in our case, to achieve a sort of separation

of concerns, between deﬁning ρ and sharing a secret

key that would allow both communicating entities to

agree on ρ, before the commencement of the commu-

nication session.

Assuming $\iota$ is an index function defined as:

$$\iota : \mathbb{N} \to ((\Xi \times \Xi) \times \ldots \times (\Xi \times \Xi))$$

where $\Xi = \{c, c', \ldots\}$ is the set of all possible choice functions, then we can simply retrieve a specific definition of $\rho$ by applying:

$$\iota(K_{AB}) = \rho$$

where $K_{AB}$ is the Diffie-Hellman key pre-agreed between $A$ and $B$. A generic definition of $\iota$ was given in (Aziz, 2021) as an indexing function that can be used to retrieve a semantic domain ($\rho$ in our case).

We can also adapt the algorithm given in (Aziz, 2021) for agreeing on a semantic domain, yielding the slightly modified algorithm shown in Algorithm 1 below. The net outcome of calling the ALICE procedure will be that $\rho_A = \rho_B$. As a result, both ALICE and BOB will end up agreeing on the same $\rho$ value to use in the modification of textual documents in the subsequent communication session(s).
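The agreement on $\rho$ can be sketched in Python with toy parameters. Everything concrete here is an assumption made for illustration: the tiny public values $p = 23$ and $g = 5$ (far too small for real use), the table `RHO_TABLE` standing in for the index function $\iota$, the representation of choice functions by their indices, and the brute-force search by which BOB finds a suitable exponent $b$.

```python
P_MOD, G = 23, 5  # toy public parameters p and g (illustrative only)

# Hypothetical table standing in for iota: each entry is one candidate rho,
# written as a tuple of index pairs.
RHO_TABLE = [
    ((1, 2), (2, 3)),
    ((2, 1), (3, 1)),
    ((1, 3), (3, 2)),
]


def iota(key):
    """Hypothetical index function mapping a shared key to a rho."""
    return RHO_TABLE[key % len(RHO_TABLE)]


def bob(c_a, rho):
    """Search for an exponent b such that iota(c_a^b mod p) equals rho."""
    for b in range(1, P_MOD):
        if iota(pow(c_a, b, P_MOD)) == rho:
            return pow(G, b, P_MOD), rho  # (c_b, Bob's local rho)
    raise ValueError("no exponent b maps the shared key to the chosen rho")


def alice(a, rho_chosen_by_bob):
    """Run the exchange; returns (Alice's rho, Bob's rho) for comparison."""
    c_a = pow(G, a, P_MOD)
    c_b, rho_b = bob(c_a, rho_chosen_by_bob)
    key = pow(c_b, a, P_MOD)  # equals c_a^b mod p, the shared key
    return iota(key), rho_b


rho_a, rho_b = alice(6, RHO_TABLE[1])
print(rho_a == rho_b)  # True
```

In this sketch BOB's choice of $\rho$ is passed in through `alice` only so that the example is reproducible; in the procedure described in the text, BOB makes that choice on his own. The agreement holds because both parties derive the same key $g^{ab} \bmod p$ and apply the same index function to it.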

procedure ALICE
    Choose some $a \in \mathbb{N}$
    Compute $K = \mathrm{BOB}(c_a)^a \bmod p$    ▷ where $c_a = g^a \bmod p$
    Set in internal state $\rho_A := \iota(K)$
    return
end procedure

procedure BOB($c_a$)
    Choose some $\rho \in (\Xi \times \Xi) \times \ldots \times (\Xi \times \Xi)$
    Choose a specific $b \in \mathbb{N}$ such that $\iota(c_a^b \bmod p) = \rho$
    Set in internal state that $\rho_B = \rho$
    return $c_b$    ▷ where $c_b = g^b \bmod p$
end procedure

Algorithm 1: A modified DH-based $\rho$ Agreement Algorithm [$p$, $g$, $\iota$ are global parameters, $\rho_A$, $\rho_B$ are local state variables].

7 CONCLUSION

This paper has presented an overview of recent procedures for hiding secret messages in text documents through steganography. Both format-based techniques modifying text attributes and linguistic methods producing credible language covers were reviewed. However, limitations were recognised with the capacity and security of the existing approaches.

To address these concerns, a novel extended embedding method is proposed, based on encoding messages by adjusting the sizes of the paragraphs being compared. The choice of the paragraphs being compared is allowed to be flexible, unlike the original sequential comparison method proposed in (Aziz et al., 2022). The theory underlying our approach is formally expressed using the λ-calculus, and a method for agreeing on the choice functions that select the paragraphs is also proposed, by means of an algorithm based on the Diffie-Hellman protocol (Diffie and Hellman, 1976). We hypothesise that this new technique will provide increased hiding capacity and resilience against statistical attacks by exploiting structural properties of the textual document.

Future work will be focused on exploring a num-

ber of research directions: First, we intend to ex-

plore substitute deﬁnitions of the paragraph compari-

son function R to deﬁne optimal conﬁgurations of the

cover text. Second, it would be interesting to deﬁne

a search algorithm that searches for text documents

with the most suitable features, such that both the hid-

ing capacity as well as the security of the stego-object

are maximised. Extending the approach to other me-

dia formats, such as images, video and audio would

also be possible, by adopting a block comparison ap-

proach, where parts of the image or video ﬁle are

compared against other parts as a means of hiding a

bit of information. Finally, we propose that recent

generative artiﬁcial intelligence methods may be use-

ful in generating natural language cover documents

with suitable characteristics, or even generating the

actual stego-object documents.


REFERENCES

Atallah, M. J., Raskin, V., Crogan, M., Hempelmann, C.,

Kerschbaum, F., Mohamed, D., and Naik, S. (2001).

Natural language watermarking: Design, analysis,

and a proof-of-concept implementation. In Proceed-

ings of the 4th International Workshop on Information

Hiding, IHW ’01, page 185–199, Berlin, Heidelberg.

Springer-Verlag.

Atallah, M. J., Raskin, V., Hempelmann, C. F., Karahan, M., Sion, R., Topkara, U., and Triezenberg, K. E. (2002). Natural language watermarking and tamperproofing. In International Workshop on Information Hiding, pages 196–212. Springer.

Aziz, B. (2021). A note on the problem of semantic in-

terpretation agreement in steganographic communica-

tions. Journal of Internet Services and Information

Security, 11(3):47–57.

Aziz, B., Bukhelli, A., Khusainov, R., and Mohasseb, A. (2022). A novel method for embedding and extracting secret messages in textual documents based on paragraph resizing. In Proceedings of the 19th International Conference on Security and Cryptography - Volume 1: SECRYPT, pages 714–719. INSTICC, SciTePress.

Baawi, S. S., Mokhtar, M. R., and Sulaiman, R.

(2018). A comparative study on the advancement

of text steganography techniques in digital media.

ARPN Journal of Engineering and Applied Sciences,

13(5):1854–1863.

Baawi, S. S., Nasrawi, D. A., and Abdulameer, L. T. (2020). Improvement of “text steganography based on Unicode of characters in multilingual” by custom font with special properties. In IOP Conference Series: Materials Science and Engineering, volume 870.

Bergmair, R. (2004). Towards linguistic steganography: A

systematic investigation of approaches, systems, and

issues. Final year thesis, B. Sc.(Hons.) in Computer

Studies, The University of Derby.

Blizard, W. D. (1988). Multiset theory. Notre Dame Journal of Formal Logic, 30(1):36–66.

Chand, V. and Orgun, C. (2006). Exploiting linguistic fea-

tures in lexical steganography: Design and proof-of-

concept implementation. In Proceedings of the 39th

Annual Hawaii International Conference on System

Sciences (HICSS’06), volume 6, pages 126b–126b.

Chapman, M. and Davida, G. (1997). Hiding the hidden: A

software system for concealing ciphertext as innocu-

ous text. In International Conference on Informa-

tion and Communications Security, pages 335–345.

Springer.

Chapman, M., Davida, G. I., and Rennhard, M. (2001).

A practical and effective approach to large-scale au-

tomated linguistic steganography. In International

Conference on Information Security, pages 156–165.

Springer.

Cheddad, A., Condell, J., Curran, K., and Mc Kevitt,

P. (2010). Digital image steganography: Survey

and analysis of current methods. Signal processing,

90(3):727–752.

Church, A. (1932). A set of postulates for the foundation of

logic. Annals of Mathematics, 33(2):346–366.

Curry, H. B. and Feys, R. (1958). Combinatory Logic.

Number v. 1 in Combinatory Logic. North-Holland

Publishing Company.

Diffie, W. and Hellman, M. E. (1976). New directions in cryptography. IEEE Transactions on Information Theory, IT-22(6):644–654.

Jayaram, P., Ranganatha, H., and Anupama, H. (2011). Information hiding using audio steganography: A survey. The International Journal of Multimedia & Its Applications (IJMA), 3:86–96.

Liang, O. W. and Iranmanesh, V. (2016). Information hiding using whitespace technique in Microsoft Word. In Proceedings of the 2016 International Conference on Virtual Systems and Multimedia, VSMM 2016. Institute of Electrical and Electronics Engineers Inc.

Liu, Y., Liu, S., Wang, Y., Zhao, H., and Liu, S. (2019).

Video steganography: A review. Neurocomputing,

335:238–250.

Maher, K. (1995). Texto. URL: ftp://ftp.funet.fi/pub/crypt/steganography/texto.tar.gz.

Moerland, T. (2003). Steganography and steganalysis. Lei-

den Institute of Advanced Computing Science.

Naharuddin, A., Wibawa, A. D., and Sumpeno, S. (2018). A high capacity and imperceptible text steganography using binary digit mapping on ASCII characters. In Proceeding - 2018 International Seminar on Intelligent Technology and Its Application, ISITIA 2018, pages 287–292. Institute of Electrical and Electronics Engineers Inc.

Shirali-Shahreza, M. (2008). Text steganography by chang-

ing words spelling. In 2008 10th International Confer-

ence on Advanced Communication Technology, vol-

ume 3, pages 1912–1913. IEEE.

Shirali-Shahreza, M. H. and Shirali-Shahreza, M. (2006). A new approach to Persian/Arabic text steganography. In 5th IEEE/ACIS International Conference on Computer and Information Science and 1st IEEE/ACIS International Workshop on Component-Based Software Engineering, Software Architecture and Reuse (ICIS-COMSAR’06), volume 2006, pages 310–315.

Shirali-Shahreza, M. H. and Shirali-Shahreza, M. (2008).

A new synonym text steganography. In 2008 Interna-

tional Conference on Intelligent Information Hiding

and Multimedia Signal Processing, pages 1524–1526.

IEEE.

Taha, A., Hammad, A. S., and Selim, M. M. (2020). A high capacity algorithm for information hiding in Arabic text. J. King Saud Univ. Comput. Inf. Sci., 32:658–665.

Wilson, S. (2019). Unreal steganography: Using a VR application as a steganography carrier. https://www.forensicfocus.com/stable/wp-content/uploads/2019/07/dissertation.pdf.

SECRYPT 2024 - 21st International Conference on Security and Cryptography
