word might be “habitación” (room) (bit 0), “había” (had) (bits 10) or “limpió” (cleaned) (bits 11), depending on the bits to be concealed. The receiver would build the same frequency table as the transmitter and would match the words of the received stegotext against that table. The receiver would therefore know that the pairs “La habitación”, “La había” or “La limpió” hide one bit (0) or two bits (10 or 11), respectively.
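The example above amounts to a prefix code over the candidate words of one level. The following Python sketch reproduces it for the level after “La”. The codes are the ones given in the text. The names `LEVEL_AFTER_LA`, `choose_word` and `read_word` are hypothetical, and the sketch does not model how Stelin derives the codes from the frequency table.

```python
# Hypothetical sketch: one level of the linked list after the word "La",
# using the prefix codes of the example. How the codes are assigned from
# the frequency table is not modelled here (assumption).
LEVEL_AFTER_LA: dict[str, str] = {"habitación": "0", "había": "10", "limpió": "11"}


def choose_word(level: dict[str, str], bits: str) -> tuple[str, str]:
    """Transmitter side: pick the word whose code is a prefix of the pending
    bits and return it together with the bits still left to conceal."""
    for word, code in level.items():
        if bits.startswith(code):
            return word, bits[len(code):]
    # Happens e.g. when only "1" is pending: both "10" and "11" are longer.
    raise ValueError("no code in this level matches the pending bits")


def read_word(level: dict[str, str], word: str) -> str:
    """Receiver side: map a recognised word back to the bits it hides."""
    return level[word]


word, rest = choose_word(LEVEL_AFTER_LA, "101")
print("La", word, "| hidden:", read_word(LEVEL_AFTER_LA, word), "| pending:", rest)
```

Because no code is a prefix of another, at most one word can match the pending bits, which is what lets the receiver read the bits back unambiguously.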
The receiver can simply ignore the words it reads until it finds one of the possible words at the current level of the linked list. This small, previously unpublished detail allows the transmitter to improve the generated stegotexts manually, a posteriori, without involving the receiver. The only condition is that any word the transmitter adds must not be one of the words of the following level, so that the receiver does not lose synchrony. That is to say, in the previous example, the transmitter could insert between the words that form the pairs “La habitación”, “La había” and “La limpió” any word other than “habitación”, “había” or “limpió”. In this way, the transmitter can correct grammatical mistakes and improve the coherence of the text without the receiver needing to know about these changes. For example, if we conceal a short message of 126 bits, using order 9 and Antonio Machado's complete poetry as the training text, we would obtain, among the possible stegotexts, one like the following:
[…] suena el rebato de la tarde en la arboleda!
Mientras el corazón pesado. El agua en sombra
pasaba tan melancólicamente, bajo los arcos del
puente al ímpetu del río sus pétreos tajamares; […]
Figure 2: Stegotext that conceals 126 bits. Order 9. Key: Alfonso. Training source: Antonio Machado's complete poetry.
As can be observed, this stegotext contains a series of small grammatical mistakes (leaving aside problems of global coherence). For example, the fragment “la tarde en la arboleda! Mientras” (the evening in the grove! While […]) contains a word with a capital letter that is not preceded by a full stop, and an exclamation mark that is closed but was never opened (Spanish requires the opening “¡”). To solve these problems, the Stelin tool generates a template listing the possible words at every level, so that the transmitter can choose which words to insert among the words of the stegotext; the receiver will discard these inserted words. For example, we select the fragment “de la tarde en la arboleda! Mientras el corazón” (of the evening in the grove! While the heart). An example of the generated template follows; each line starts with the word used in the stegotext ([WORD:…]), followed by the candidate words of that level:
[WORD:en][muerta][flota][.][bella][roja][en][,][sobre][arrebolada][y]
[WORD:la][sus][la]
[WORD:arboleda][arboleda]
[WORD:!][!]
[WORD:Mientras][Mientras]
With this template in mind, we edit the fragment. Among the many possible options, we choose the following one:
“tarde en la dulce arboleda, ¡qué sensación!. Mientras el corazón” (evening in the sweet grove, what a sensation!. While the heart)
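The resynchronisation rule can be checked on the fragment just edited. The Python sketch below is a simplified receiver. It assumes that the candidate sets of each level are exactly those of the template above; in the tool they would be recomputed from the n-gram table as words are recognised. It also assumes that punctuation marks are separate tokens. `TEMPLATE_LEVELS`, `tokenize` and `recognise` are hypothetical names.

```python
import re

# Candidate words per level, copied from the template excerpt. Fixing them
# in advance is a simplification: the real levels depend on the words
# already recognised (assumption).
TEMPLATE_LEVELS: list[set[str]] = [
    {"muerta", "flota", ".", "bella", "roja", "en", ",", "sobre", "arrebolada", "y"},
    {"sus", "la"},
    {"arboleda"},
    {"!"},
    {"Mientras"},
]


def tokenize(text: str) -> list[str]:
    # Words (Unicode-aware \w) and single punctuation marks such as "¡" or "!".
    return re.findall(r"\w+|[^\w\s]", text)


def recognise(text: str, levels: list[set[str]]) -> list[str]:
    """Skip every token that is not a candidate of the current level; each
    match is taken as a word chosen by the transmitter and advances one level."""
    chosen: list[str] = []
    for token in tokenize(text):
        if len(chosen) == len(levels):
            break  # every level of this excerpt has been read
        if token in levels[len(chosen)]:
            chosen.append(token)
    return chosen


generated = "tarde en la arboleda! Mientras el corazón"
edited = "tarde en la dulce arboleda, ¡qué sensación!. Mientras el corazón"
print(recognise(generated, TEMPLATE_LEVELS))
print(recognise(edited, TEMPLATE_LEVELS))
```

Both versions yield the same chosen words, so the receiver recovers the same bits. Inserting “sus” between “en” and “la” would not be safe: it belongs to the second level, so `recognise("tarde en sus la arboleda! Mientras", TEMPLATE_LEVELS)` takes it as the transmitter's choice and synchrony is lost.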
In this simple way, the quality of the generated stegotext can be improved substantially, for example by solving problems caused by punctuation marks, among others. Modifying stegotexts with this procedure requires some practice. Fortunately, in natural-language texts in Spanish (and in other languages) words tend to occupy certain positions: some words are more likely than others to appear after a given word (Zipf's law), and some words are followed by a much wider variety of words than others. In general, there are few words after which it is difficult to insert a new word, because the corresponding level of the table contains many options, and many words after which few options exist, so that it is easier to insert new words there. For example, in Spanish texts, words such as de, la, que, el, en, y, a and los are very frequent, so many different words can follow them. If we concentrate on the previous example “de la tarde en la arboleda! Mientras el corazón” (of the evening in the grove! While the heart), it would be trivial to add words before the article “el” (the), but more difficult to find words to insert after it and before the word “corazón” (heart), since any inserted word must avoid every one of the many candidates that can follow “el”. Bearing these principles in mind helps reduce editing time and identify the positions where it is best to work when correcting grammatical mistakes. It is therefore possible, at least in Spanish, to generate stegotexts automatically and to correct their mistakes manually without involving the receiver, obtaining stegotexts of above-average quality.
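The heuristic above can be made concrete with the template shown earlier. A word inserted before a chosen word is safe only if it is not a candidate of that word's level, so gaps whose level has fewer candidates are easier to edit. The following minimal Python sketch reuses the template's candidate sets; the helper names `is_safe_filler` and `gaps_by_ease` are hypothetical.

```python
# (chosen word, candidate words of its level), copied from the template excerpt.
TEMPLATE: list[tuple[str, set[str]]] = [
    ("en", {"muerta", "flota", ".", "bella", "roja", "en", ",", "sobre", "arrebolada", "y"}),
    ("la", {"sus", "la"}),
    ("arboleda", {"arboleda"}),
    ("!", {"!"}),
    ("Mientras", {"Mientras"}),
]


def is_safe_filler(word: str, next_level: set[str]) -> bool:
    # An inserted word must not be mistaken for the next chosen word.
    return word not in next_level


def gaps_by_ease(template: list[tuple[str, set[str]]]) -> list[tuple[str, int]]:
    """Return (chosen word, number of forbidden fillers before it), easiest gap
    first; sorted() is stable, so ties keep their order in the text."""
    return sorted(((word, len(level)) for word, level in template), key=lambda g: g[1])


for word, forbidden in gaps_by_ease(TEMPLATE):
    print(f"before {word!r}: {forbidden} forbidden filler(s)")
```

The gaps before “arboleda”, “!” and “Mientras” come first, which is consistent with the edit chosen above (“dulce”, “, ¡qué sensación” and “.” were inserted there). The gap before “en”, whose level has ten candidates, comes last.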
Currently, concealing information with the Stelin tool achieves a capacity of approximately 1 word per 2 bits (depending on the training text and on the information to be concealed), and this capacity decreases as more words are added by manual editing.
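The loss of capacity caused by manual editing is simple arithmetic: the number of hidden bits is fixed, while every inserted word lengthens the text. In the Python sketch below, the 126 bits come from Figure 2, but the word counts are hypothetical and not figures from the paper. The function name `bits_per_word` is likewise only illustrative.

```python
def bits_per_word(hidden_bits: int, generated_words: int, inserted_words: int = 0) -> float:
    """Effective concealment rate: the hidden bits stay fixed, while words
    added by manual editing carry no information."""
    return hidden_bits / (generated_words + inserted_words)


# 126 bits as in Figure 2; 90 generated and 18 inserted words are hypothetical.
before = bits_per_word(126, 90)
after = bits_per_word(126, 90, inserted_words=18)
print(f"{before:.2f} -> {after:.2f} bits per word")
```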
To conceal more than a hundred bits, the generated stegotexts will be large (depending on the size and the “quality” of the training texts), and therefore manual editing will take a
IMPROVING N-GRAM LINGUISTIC STEGANOGRAPHY BASED ON TEMPLATES