techniques. The simplest techniques hide information in the least significant bits (LSB) of each pixel. Another widely used cover is digital audio. Audio steganography also includes LSB techniques (similar to image LSB steganography), and it can also be performed on compressed audio files such as MP3s. Some tools, like MP3Stego (Petitcolas, 2006), can hide information during the inner loop step, by modifying the DCT values.
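As an illustration of the LSB idea described above, the following minimal Python sketch hides a short message in the lowest bit of each byte of a cover buffer and reads it back. The function names (embed_lsb, extract_lsb) and the 32-byte cover are hypothetical; the buffer merely stands in for raw 8-bit pixel values, and a real image format would have to be decoded first.

```python
def embed_lsb(pixels: bytes, message: bytes) -> bytes:
    """Hide `message` in the least significant bit of each byte of `pixels`."""
    # Unpack the message into bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover is too small for the message")
    stego = bytearray(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # overwrite only the lowest bit
    return bytes(stego)


def extract_lsb(pixels: bytes, length: int) -> bytes:
    """Read `length` bytes back from the least significant bits of `pixels`."""
    out = bytearray()
    for i in range(length):
        value = 0
        for bit_index in range(8):
            value = (value << 1) | (pixels[i * 8 + bit_index] & 1)
        out.append(value)
    return bytes(out)


cover = bytes(range(200, 232))      # 32 hypothetical 8-bit pixel values
stego = embed_lsb(cover, b"key!")   # 4 message bytes need 32 cover bytes
recovered = extract_lsb(stego, 4)
```

Each cover value changes by at most one unit, which is why the modification is hard to notice in an image. The receiver in this sketch must know the message length in advance.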
More steganographic techniques can be found in
the literature, including subliminal channels (Sim-
mons, 1996), SMS (Mohammad Shirali-Shahreza,
2007), TCP/IP packets (Murdoch and Lewis, 2005),
executable files (El-Khalil, 2003) and games (Castro
et al., 2006).
3 HIDING SOURCE CODE
We have developed a stego-system (Csteg) based on context-free grammars. Our design transforms a source code file into a plaintext file (the stego-text). A grammar describing the structure of the source code is used to produce the stego-text. The generated stego-text has no export restrictions². The source code can be recovered at its destination by applying the reverse grammar to the stego-text. The recovered source code keeps all the functionality of the original source code.
Other text steganography systems based on context-free grammars have been developed in the past. Spammimic³ can convert a short text message into a spam email message. Another tool is C to English to C (Schwarz, 2001), which translates C source code into an English explanation of it. In 2001, Marttila (Marttila, 2001) conceived a system to hide source code inside a text.
Marttila designed a tool called c2txt2c that, by using context-free grammars, was able to produce an English text from the source code of the Blowfish cipher. Marttila’s c2txt2c is very limited, as it was not able to hide any cryptographic algorithm other than Blowfish. We have continued Marttila’s research and designed and implemented a tool that consistently hides any source code in plain text. However, using context-free grammars to produce stego-text has its drawbacks. Producing meaningful texts is difficult and, additionally, the grammar used should be able to parse any source code provided. Finally, using the same grammar to hide all the input files may generate similar stego-texts, which might ease an attack. To improve the security of the system, it could be advantageous to be able to generate very different stego-texts, even from the same source file.
² At least in the reviewed legislations.
³ http://www.spammimic.com
Our system uses a plain text file to produce context-free grammars. The aim is to be able to produce different grammars just by changing the input file, which also helps to build meaningful stego-texts. These plaintext files should have the following characteristics:
• The content of the text file should be coherent and meaningful.
• The text file should be as long as possible, since our system extracts portions of the file to generate the grammar.
• Files used by the system should not be restricted by any copyright.
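The length requirement above can be made concrete with a small Python sketch that extracts distinct fragments from a cover text and fails when the text is too short. Treating each sentence as a fragment is an assumption of this sketch, not the fragmentation actually used by Csteg, and the name extract_fragments and the sample text are hypothetical.

```python
import re


def extract_fragments(cover_text: str, needed: int) -> list[str]:
    """Return `needed` distinct fragments (here: sentences) from the cover text."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", cover_text.strip())
    unique = []
    seen = set()
    for sentence in sentences:
        normalized = " ".join(sentence.split())  # collapse internal whitespace
        if normalized and normalized not in seen:
            seen.add(normalized)
            unique.append(normalized)
    if len(unique) < needed:
        raise ValueError(f"cover text yields {len(unique)} fragments, {needed} needed")
    return unique[:needed]


sample = "The sea was calm. Nobody spoke. The sea was calm. A gull cried!"
fragments = extract_fragments(sample, 3)
```

Repeated sentences are used only once, so a longer and more varied text yields more usable fragments.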
To perform the recovery process, our stego-system
will need the stego-text and the same input text file
that was used to hide the source code (Figure 1). In
this case a reverse grammar will be generated, pro-
ducing the original source code as output. The text file
used to hide and restore the source code can be con-
sidered as the key of the stego-system, as it is needed
to restore the source code and a different file will pro-
duce different stego-texts.
Figure 1: Stego-system scheme.
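The role of the text file as a key can be sketched in a few lines of Python. This is a deliberately simplified, token-level substitution, not the context-free grammar machinery of Csteg. The fixed VOCABULARY, the sentence-sized fragments, the helper names (fragments_of, build_tables, hide, recover) and the two key texts are all assumptions made for illustration.

```python
import re

# Hypothetical token vocabulary shared by both sides; the real system derives
# its symbols from a grammar of the programming language instead.
VOCABULARY = ["int", "x", "=", "1", "+", ";", "return"]


def fragments_of(cover_text: str) -> list[str]:
    """Split the cover text into distinct sentences, in order of appearance."""
    parts = re.split(r"(?<=[.!?])\s+", cover_text.strip())
    cleaned = (" ".join(p.split()) for p in parts)
    return list(dict.fromkeys(c for c in cleaned if c))


def build_tables(cover_text: str) -> tuple[dict[str, str], dict[str, str]]:
    """Derive the hiding table and its inverse from the same cover text."""
    fragments = fragments_of(cover_text)
    if len(fragments) < len(VOCABULARY):
        raise ValueError("cover text is too short for the vocabulary")
    hiding = dict(zip(VOCABULARY, fragments))
    recovery = {fragment: token for token, fragment in hiding.items()}
    return hiding, recovery


def hide(source: str, cover_text: str) -> str:
    """Replace every whitespace-separated token by its fragment, one per line."""
    hiding, _ = build_tables(cover_text)
    return "\n".join(hiding[token] for token in source.split())


def recover(stego_text: str, cover_text: str) -> str:
    """Map every line of the stego-text back to its token."""
    _, recovery = build_tables(cover_text)
    return " ".join(recovery[line] for line in stego_text.splitlines())


KEY_A = ("Rain fell all night. The road was empty. A dog barked twice. "
         "Lights went out. Nobody answered. The clock stopped. Morning came.")
KEY_B = ("Prices rose again. The market closed early. Traders left. "
         "Banks stayed open. Rumours spread. Calm returned. Trading resumed.")

source = "int x = 1 + 1 ; return x ;"
stego_a = hide(source, KEY_A)
stego_b = hide(source, KEY_B)
restored = recover(stego_a, KEY_A)
```

In this sketch the same source yields different stego-texts under KEY_A and KEY_B, and recovery with the wrong key fails with a KeyError because none of the fragments are known to the recovery table.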
3.1 Grammar Generation
Grammar generation can be divided into two steps. The first step builds the grammar used to hide the source code. The second step builds the grammar used to restore the source code (the recovery grammar). Both grammars must be generated with the same cover-text as input.
3.1.1 Creating the Hiding Grammar
The first step in creating the hiding grammar is to read the source code files. The resulting grammar is closely tied to the programming language of the code to be hidden. The cover-text file is used to generate the output of the parsing process: the output for each rule (P) of the grammar (G) is extracted from the cover-text file (Figure 2). To avoid conflicts in the recovery process, the fragments (f_i) used to produce output cannot
be repeated. Each of the fragments can be attached
SECRYPT 2008 - International Conference on Security and Cryptography