Watermarking PDF Documents using Various Representations of

Self-inverting Permutations

Maria Chroni and Stavros D. Nikolopoulos

Department of Computer Science & Engineering, University of Ioannina, GR-45110 Ioannina, Greece

Keywords:

Watermarking Techniques, Text Watermarking Algorithms, PDF Documents, Self-inverting Permutations,

Representations of Permutations, Embedding Algorithms, Extracting Algorithms.

Abstract:

Portable Document Format (PDF) documents are extensively used over the internet for information exchange

and due to the ease of copying and distributing they are susceptible to threats like illegal copying, redistri-

bution, and plagiarism. This work provides to web users copyright protection of their PDF documents by

proposing efﬁcient and easily implementable techniques for PDF watermarking; our techniques are based

on the ideas of our recently proposed watermarking techniques for software, image, and audio, expanding

thus the digital objects that can be efﬁciently watermarked through the use of self-inverting permutations. In

particular, we present various representations of a self-inverting permutation π

∗

namely 1D-representation,

2D-representation, and RPG-representation, and show that theses representations can be efﬁciently applied

to PDF watermarking. Indeed, we ﬁrst present an audio-based technique for marking a PDF document T by

exploiting the 1D-representation of a permutation π

∗

, and then, since pages of a PDF document T are 2D

objects, we present an image-based algorithm for encoding π

∗

into T by ﬁrst mapping the elements of π

∗

into

a matrix A

∗

and then using the information stored in A

∗

to mark invisibly speciﬁc areas of PDF document

T. Finally, we describe a graph-based watermarking algorithm for embedding a self-inverting permutation π

∗

into the document structure of a PDF ﬁle T by exploiting the RPG-representation of π

∗

and the structure of a

PDF document. We have evaluated the embedding and extracting algorithms by testing them on various and

different in characteristics PDF documents.

1 INTRODUCTION

Information age has altered the way people communi-

cate by breaking the barriers imposed on communica-

tions by time, distance, and location and has undoubt-

edly impact not only human activities but also global

industry and economy.

An electronic document is an extensively used

medium traveling overthe internet for information ex-

change and due to the ease of copying and distribut-

ing they are susceptible to threats like illegal copying,

redistribution of copyrighted documents, and plagia-

rism. Subsequently, it has become more important to

protect the electronic documents from any malicious

user while existing in the digital world. Copyright

protection of digital contents is such a need of time

which cannot be overlooked. In past, various methods

like encryption, steganography and watermarking has

been used to solve these problems. However, digital

watermarking is the better solution for copyright pro-

tection than encryption and steganography. It is well

known that digital watermarking methods are efﬁcient

enough to identify the original copyright owner of the

contents. Note that there are many reasons why you

would want to use watermarks in digital documents:

as a copying deterrent, as a means of identifying the

source of a printed document, as a means of determin-

ing whether a document has been altered, etc.

Any action that a user can perform on a text that

can affect the watermark, or its usefulness, is called

attack. In (Zhou et al., 2009) existing attacks on text

watermarking can be classiﬁed into three main cat-

egories: watermark attacks, geometric attacks, and

system attacks (Collberg and Nagra, 2010).

Text watermarking is the area of research that has

emerged after the development of internet and com-

munication technologies; we mention that the ﬁrst

reported effort on marking documents dates back to

1993. Previous work on digital text watermarking

is based on several techniques among which image-

based approach (Brassil et al., 1995; Huang and Yan,

2001; Low et al., 1998; Low and Maxemchuk, 2000;

Maxemchuk and Low, 1997; Maxemchuk and Low,

1998), syntactic approach (Atallah et al., 2003; Meral

Chroni M. and Nikolopoulos S..

Watermarking PDF Documents using Various Representations of Self-inverting Permutations.

DOI: 10.5220/0005445800730080

In Proceedings of the 11th International Conference on Web Information Systems and Technologies (WEBIST-2015), pages 73-80

ISBN: 978-989-758-106-9

 2015 SCITEPRESS (Science and Technology Publications, Lda.)

et al., 2009), and semantic approach (Lu et al., 2008;

Vybornovaand Macq, 2007; Topkara et al., 2007; Sun

and Asiimwe, 2005). Recently, a signiﬁcant num-

ber of techniques have been proposed in the literature

which use Portable Document Format (PDF) ﬁles as

cover media in order to hide data (Bindra, 2011; Liu

et al., 2012; Liu et al., 2008; Liu et al., 2006; Lee and

Tsai, 2010; Zhong et al., 2007).

In this paper, in order to provide to web users

present techniques for watermarking PDF docu-

ments by exploiting several representations of a self-

inverting permutation π

∗

, i.e. the 1D-representation,

the 2D-representation, and the RPG-representation.

Our main contribution is a graph-based watermarking

algorithm for embedding a self-inverting permutation

∗

into the document structure of a PDF ﬁle T using

the RPG-representation of π

∗

and the structure of a

PDF document.

2 BACKGROUND RESULTS

In this section we give some deﬁnitions and the theo-

retical background we use towards the watermarking

of Portable Document Format (PDF) documents.

Self-inverting Permutations. Let π be a permutation

over the set N

= {1, 2, ...,n}. We think of permu-

tation π as a sequence (π

,π

,...,π

), so, for exam-

ple, the permutation π = (1, 4, 2, 7, 5, 3, 6) has π

= 1,

= 4, etc (Golumbic, 1980).

The inverse of π is the permutation τ =

(τ

,τ

,...,τ

) with τ

= π

= i. Clearly, every per-

mutation has a unique inverse, and the inverse of the

inverse is the original permutation.

A self-inverting permutation (or, for short, SiP) is

a permutation π = (π

,π

,...,π

) that is its own in-

verse, i.e., π

= i, for i = 1,2,.. .,n.

The deﬁnition of the inverse of a permutation im-

plies that a permutation is a self-invertingpermutation

iff all its cycles are of length 1 or 2.

1D-representation of SiP. In our 1D-representation

(Chroni et al., 2014), the elements of the permutation

π are mapped in speciﬁc cells of an array B of size n

as follows:

• number π

−→ entry B((π

−1

− 1)n+ π

)

or, equivalently, the cell at the position (i− 1)n+ π

labeled by the number π

, for each i = 1, 2, ..., n.

In our 1DM representation, a permutation π over

the set N

is represented by an n

array B

∗

where the

cells at positions (i− 1)n+π

are marked by a speciﬁc

symbol, say, the asterisk character “*”.

2D-representation of SiP. In (Chroni et al., 2013),

we have deﬁned the 2D-representation of a SiP as the

representation where the elements of the permutation

π = (π

,π

,. .. , π

) are mapped in speciﬁc cells of an

n× n matrix A as follows:

• number π

−→ entry A(π

−1

,π

)

or, equivalently, the cell at row i and column π

is la-

beled by the number π

, for each i = 1, 2, ..., n.

In 2DM-representation the cell at row i and col-

umn π

of matrix A is marked by a speciﬁc symbol,

for each i = 1, 2, . ..,n.

RPG-representation of SiP. We have also pre-

sented an efﬁcient and easily implemented algo-

rithm for encoding numbers as reducible permuta-

tion graphs (or, for short, RPG) through the use of

self-inverting permutations (Chroni and Nikolopou-

los, 2012). In particular, we have proposed the algo-

rithm Encode

SiP.to.RPG which applies to any per-

mutation π and relies on domination relations on the

elements of π.

Figure 1 summarizes by an example the represen-

tations of the permutation π

∗

= (4,7,6,1,5,3,2).

2.1 Structure of a PDF Documents

The Portable Document Format (PDF) (Adobe, 2006)

is an open standard (deﬁned in ISO 32000) which fa-

cilitates device and platform independent capture and

representation of rich information such as text, multi-

media and graphics, into a single medium. Thus the

PDF format enables viewing and printing of a rich

document, independent of either application software

or hardware. In this section we present a structural

analysis of a PDF ﬁle and give its basic components.

Object. An object is the basic element in PDF ﬁles,

in which eight kinds of objects, namely Boolean, Nu-

meric, String, Name, Array, Null, Dictionary and

Stream are sustained. Objects may be labeled so that

they can be referred to by other objects. A labeled

object is called an indirect object.

File Structure. The PDF ﬁle structure determines

how objects are stored in a PDF ﬁle, how they are ac-

cessed, and how they are updated. The ﬁle structure

(see, Figure 2) includes the following:

• an one-line header identifying the version of the

PDF speciﬁcation to which the ﬁle conforms,

• a body containing the objects that make up the

document contained in the ﬁle,

WEBIST2015-11thInternationalConferenceonWebInformationSystemsandTechnologies

∗

= (4, 7, 6, 1, 5, 3, 2)

4 3

The watermark number w = 4

3 4

14 15

. . .

36 37

38 39

22 23

24 25

. . .

1D-representation of π

∗

2D-representation of π

∗

Reducible Permutation Graph F [π

∗

]

↓

Figure 1: Three different representations of permutation π

∗

= (4, 7,6,1, 5,3,2).

• a cross-reference table containing information

about the indirect objects in the ﬁle, and

• a trailer giving the location of the cross-reference

table and of certain special objects within the

body of the ﬁle.

Figure 2 shows an example of a PDF ﬁle and its inter-

nal ﬁle structure.

Document Structure. The PDF document structure

speciﬁes how the basic object types are used to rep-

resent components of a PDF document: pages, fonts,

annotations, and so forth. The document structure of

a PDF ﬁle is organized in the shape of an object tree

topped by Catalog, Page tree, Outline hierarchy and

Article thread included. The Outline hierarchy is the

bookmarker of PDF, and Page tree includes page and

Pages which in turn includes the total page number

and each page marker. Page, the main body of PDF

ﬁle, is the most important object which involves the

typeface applied, the text, pictures, page size, etc.

3 WATERMARKING PDF

DOCUMENTS

In this section we describe embedding algorithms for

encoding a SiP π

∗

into a digital document T. More

speciﬁcally, we embed the permutation π

∗

into a PDF

document by exploiting the 1D, the 2D, and the RPG-

representation of the permutation π

∗

3.1 Embed Watermark into PDF - I

Our embedding algorithm watermarks a PDF docu-

ment by exploiting the 1D-representation of π

∗

; the

marking is performed by increasing the space be-

tween two consecutive words in a paragraph of T.

Let B

∗

be the 1D array of size n = n

∗

× n

∗

which

represents the permutation π

∗

of length n

∗

, and let

, s

), (w

, s

), .. ., (w

, s

) be n pairs of type

“word-space” of a paragraph par of the input PDF

document; recall that the entry B

∗

((i−1)n

∗

+π

∗

) con-

tains the symbol “*”, 1 ≤ i ≤ n

∗

. The algorithm in-

creases by a small value “c” the i-th space of the pair

, s

) if B

∗

((i− 1)n

∗

+ π

∗

) = “ ∗ ”; our embedding

algorithm works as follows:

Algorithm Embed

SiP.to.PDF-I

1. Compute the 1DM representation of the permu-

tation π

∗

, i.e., construct the array B

∗

of size n =

∗

× n

∗

where the (i− 1)n

∗

+ π

∗

entry of B

∗

con-

tains the symbol “*”, 1 ≤ i ≤ n

∗

;

2. Select an appropriate paragraph par on a page P

of PDF document T to embed the self-inverting

WatermarkingPDFDocumentsusingVariousRepresentationsofSelf-invertingPermutations

%PDF-1.1

1 0 obj

<< /Type /Catalog /Outlines 2 0 R /Pages 3 0 R >> endobj

2 0 obj

<< /Type /Outlines /Count 0 >> endobj

3 0 obj

<< /Type /Pages /Kids [4 0 R] /Count 1 >> endobj

4 0 obj

<< /Type /Page /Parent 3 0 R /MediaBox [0 0 612 792] /Contents 5

0 R /Resources << /ProcSet 6 0 R /Font << /F1 7 0 R>> >> >>

endobj

5 0 obj

<< /Length 48 >>

stream

/F1 24 Tf

100 700 Td

(Hello World)Tj

endstream

endobj

6 0 obj

[/PDF /Text] endobj

7 0 obj

<< /Type /Font /Subtype /Type1 /Name /F1 /BaseFont /Helvetica

/Encoding /MacRomanEncoding >> endobj

xref

0 8

0000000000 65535 f

0000000012 00000 n

0000000089 00000 n

0000000145 00000 n

0000000214 00000 n

0000000381 00000 n

0000000485 00000 n

0000000518 00000 n

trailer

/Size 8

/Root 1 0 R

startxref

642

Header

Body

Cross-Ref

Table

Trailer

Figure 2: The structure of a PDF ﬁle along with its code

containing, in object 5 0 obj, the text “Hello World”.

permutation π

∗

;

3. Partition the paragraph par into n pairs

),(w

),. .. ,(w

), where w

and

are the i-th word and space, respectively, in

selected paragraph par, 1 ≤ i ≤ n;

4. For each pair (w

) s.t. B

∗

((i−1)n

∗

+π

∗

) = “∗”,

increases the space s

or, equivalently, distance

d(w

i+1

) between words w

and w

i+1

, by a rel-

ative small value c, 1 ≤ i ≤ n;

5. Return the watermarked PDF document T

Extraction. The extraction algorithm, which we call

Extract

PDF.from.SiP-I, operates as follow: it

takes as input the watermarked PDF document T

, lo-

cates the paragraph par, and computes the permuta-

tion π

∗

by ﬁnding the positions of the words w

such

that:

◦ d(w

i+1

) > d(w

i−1

), or

◦ d(w

i+1

) > d(w

i+1

i+2

)

where, d(w

) is the distance between words w

and

in a paragraph par of T

, 1 ≤ i ≤ n; note that,

an appropriate paragraph par contains more that n

words.

3.2 Embed Watermark into PDF - II

In this section we describe an algorithm of embed-

ding a self-inverting permutation π

∗

into a digital doc-

ument T by exploiting the two-dimensional represen-

tation of permutation π

∗

The main idea behind the embedding algorithm,

which we call Embed SiP.to.PDF-II, is similar

of that of algorithm Embed SiP.to.Image-F; see,

(Chroni et al., 2013). The most important of this idea

is the fact that it suggests a way in which the permuta-

tion π

∗

can be represented by a 2D-matrix and, since

pages of a PDF document T are two dimensional ob-

jects, such a representation can be efﬁciently used for

embedding π

∗

into T resulting thus the watermarked

PDF document T

; in a similar way as in our image

watermarking approach, such a 2D-representation can

be efﬁciently extracted for a watermarked PDF docu-

ment T

and converted back to the self-inverting per-

mutation π

∗

Let A

∗

be the 2D-matrix of size n

∗

×n

∗

which rep-

resents the permutation π

∗

of length n

∗

. The mark-

ing of the input PDF document T is performed by

selecting an appropriate page P of T and setting n

∗

objects (e.g., characters, symbols, images) in a spe-

ciﬁc positions on page P, 1 ≤ i ≤ n

∗

. In fact, we set

an object O

in position with (x

′

) coordinates on

page P if A

∗

) = “∗ ”, where 1 ≤ x

≤ n

∗

and

0 ≤ x

′

≤ size(P); note that, (0,0) is the lower-left

point (or, equivalently, the bottom-left corner) of the

page P.

The algorithm takes as input a SiP π

∗

and a PDF

document T, and returns the watermarked PDF docu-

ment T

; it consists of the following steps.

Algorithm Embed

SiP.to.PDF-II

1. Compute the 2DM representation of the self-

inverting permutation π

∗

, i.e., construct an array

∗

of size n

∗

× n

∗

s.t. the entry A

∗

(i,π

∗

) contains

the symbol “*”, 1 ≤ i ≤ n

∗

;

2. Select an appropriate page P to embed the permu-

tation π

∗

and compute the size size(P) of the page

P, say, N × M;

3. Segment the PDF page P into n

∗

× n

∗

grid-cells

of size



∗





∗



, 1 ≤ i, j ≤ n

∗

;

WEBIST2015-11thInternationalConferenceonWebInformationSystemsandTechnologies

4. For each grid-cellC

s.t. A

∗

(i, j) = “∗ ”, mark the

cell C

by setting a symbol, with an appropriate

color, in any position insideC

of P, 1 ≤ i, j ≤ n

∗

resulting thus the marked document T

;

5. Return the watermarked PDF document T

Extraction. The algorithm which extracts the per-

mutation π

∗

from the watermarked PDF T

operates

in a similar way as the corresponding extraction al-

gorithm for images: it takes the input watermarked

image I

, locate the marked page P, computes its

N × M size, and segments P into n

∗

× n

∗

grid-cells

of size



∗





∗



; then, it computes the per-

mutation π

∗

by ﬁnding the coordinates (x

) of the

∗

symbols in the page P, 1 ≤ i ≤ n

∗

; we call it

Extract

PDF.from.SiP-II.

3.3 Embed an RPG into a PDF

We next describe a watermarking algorithm for em-

bedding a self-inverting permutation π

∗

into a PDF

document T, by exploiting the RPG-representation of

∗

and the structure of a PDF document T.

Indeed, we have recently proposed an algorithm,

namely Encode

SiP.to.RPG (Chroni and Nikolopou-

los, 2012), for encoding SiPs π

∗

as reducible permu-

tation graphs F[π

∗

]. Moreover, in this paper we have

described the document structure DS(T) of a PDF

document T (see, Subsection 2.1); note that, the doc-

ument structure of a PDF ﬁle always contains a node,

namely Document-catalog, and a page tree PT(T)

rooted at node Page-tree, denoted by root(pt); see,

Figure 3.

In light of algorithm Encode

SiP.to.RPG, we

next present an algorithm for embedding the water-

mark graph F[π

∗

] into a PDF document T. The main

idea behind the proposed embedding algorithm is a

systematic addition of appropriate object-references

in selected nodes of the page-tree PT(T) of the doc-

ument structure DS(T), through the use of entries of

type /Kye(·), so that the graph F[π

∗

] can be easily

constructed from the page-tree PT(T

) of the result-

ing watermarked document T

Let F[π

∗

] be a reducible permutation graph pro-

duced by one of our two encoding algorithms and

let u

n+1

,. .. , u

be the nodes of the graph

F[π

∗

]; note that, F[π

∗

] does not contain the back-edge

n+1

). In order to simplify the extraction process,

the graph F[π

∗

] which is embedded into a PDF doc-

ument T contains one extra back-edge, i.e., the edge

n+1

The algorithm for embedding a reducible permu-

tation graph F[π

∗

] into a PDF document T is called

Encode

RPG.to.PDF and is described below.

Algorithm Encode RPG.to.PDF

1. Compute the document structure DS(T) of the

input PDF document T and locate its page-tree

PT(T); let node(dc) be the document catalog

node of structure DS(T) and root(pt) be the root

node of the page tree PT(T);

2. Compute a path O(T) = (v

n+1

,. .. , v

) on

n+ 2 nodes (i.e., objects) of the page-tree PT(T)

s.t. v

n+1

= root(pt), and set s = v

n+1

and t = v

;

3. Assign an exact pairing (i.e., 1-1 correspondence)

of the n + 2 nodes of path O(T) to the nodes

n+1

,. .. , u

of the watermark graph F[π

∗

];

4. For each back-edge (u

) of the graph F[π

∗

]

(i.e., u

> u

), add the forward-edge (v

) in

page-tree PT(T) by adding in object [v

0 obj]

an entry of type /Key(v

0 R); add in object

n+1

0 obj] an entry of type /Key(v

0 R);

5. Return the modiﬁed PDF documentT which is the

watermarked document T

;

Let us brieﬂy discuss the way we add forward-edge

in the page-tree PT(T); recall that, in Step 4 of the

previous algorithm Encode

RPG.to.PDF we add the

forward-edge (v

) in page-tree PT(T) by adding in

object [v

0 obj] an entry of type /Key(v

0 R). The

entry /Key(v

0 R) may be of various types; note that,

/Key(·) is used as parameter in our algorithm’s de-

scription.

In our implementation, for the forward-edge

) such that the object [v

0 obj] is not the rood-

node root(pt) of the page-tree PT(T), we always

chose the entry /Key(v

0 R) which we add in object

0 obj] to be of the same type of object [v

0 obj].

In the case where v

= root(pt), we chose the entry

/Key(v

0 R) to be of type /Kids(·).

For example, in Figure 3 we have added forward-

edges from object [29 0 obj] to object [3 0 obj],

from object [29 0 obj] to object [24 0 obj], from

object [3 0 obj] to object [13 0 obj], etc. Thus,

in our implementation we have added in the root-

node object [29 0 obj] the entries /Kids(3 0 R)

and /Kids(24 0 R), in object [3 0 obj] the entry

/XObject(13 0 R), while in object [13 0 obj] the en-

tries /ColorSpace(6 0 R) and /R9(5 0 R).

Remark 3.1. Let T be a PDF ﬁle and let PT(T) be a

page-tree of the document structure DS(T). A node

of the page-tree PT(T) may contain several entries

/Key(·) of various types. We mention that some types

are required for entries in speciﬁc nodes of PT(T);

for example, the required entries in the root-node

root(pt) of the page-tree PT(T) are the /Type(·),

WatermarkingPDFDocumentsusingVariousRepresentationsofSelf-invertingPermutations

1 0 obj

catalog

Page

. . .

29 0 obj

Page

3 0 obj

Page

25 0 obj

Contents

23 0 obj

Resources

24 0 obj

XObject

13 0 obj

Resources

22 0 obj

. . .

ColorSpace

6 0 obj

ExtGState

8 0 obj

XObject

10 0 obj

Font

12 0 obj

5 0 obj

7 0 obj

R10

9 0 obj

R10

11 0 obj

. . .

Figure 3: The watermarked DS(T

) which encodes the RPG of π

∗

= (4, 5,3,1, 2).

/Parent(·), /Kids(·), and /Count(·).

Extraction. We next describe the corresponding ex-

traction algorithm which extracts the graph F[π

∗

]

from the PDF document T

watermarked by the algo-

rithm Encode RPG.to.PDF; the algorithm, which we

call Extract

RPG.from.PDF, works as follows:

• Take ﬁrst as input the PDF document T

, compute

its document structure DS(T

), and locate its page

tree PT(T

); then, ﬁnd in object root(pt), where

root(pt) is the root of the tree PT(T

), the entry

/Kids(v

0 R) s.t. v

is not a child of root(pt),

and set v

n+1

= root(pt) and v

= v

;

• Compute the path O(T) = (v

n+1

,. .. , v

)

of PT(T

), from node root(pt) to v

, and as-

sign an exact pairing (i.e., 1-1 correspondence)

of the n + 2 nodes of path O(T) to the nodes

n+1

,. .. , u

of a graph F[π

∗

]; initially,

E(F[π

∗

]) =

• Add edges (u

i+1

) in F[π

∗

] for i = n, n−1,. . ., 0,

and the edge (u

) iff (v

) is a forward edge

in the page tree PT(T

);

• Delete the edge (u

n+1

) from the graph F[π

∗

];

• Return the graph F[π

∗

];

It is easy to see that, by construction the returned

graph F[π

∗

] is a reducible permutation graph pro-

duced by the algorithm Encode

SiP.to.RPG (Chroni

and Nikolopoulos, 2012). Thus, F[π

∗

] has the follow-

ing property: the structure which results after deleting

(i) all the forward edges (u

i+1

) of F[π

∗

], 0 ≤ i ≤ n,

and

(ii) the node u

is either the tree T

[π

∗

] or the tree T

[π

∗

] pro-

duced during the execution of the decoding algorithm

Decode

RPG.to.SiP; see, (Chroni and Nikolopou-

los, 2012). Thus, we can efﬁciently extract the self-

inverting permutation π

∗

embedded into a PDF docu-

ment T by algorithm Encode

RPG.to.PDF.

4 DISCUSSION

In this section we discuss the performance of the

proposed watermarking algorithms after applying

them on various PDF documents. We imple-

mented the algorithms Encode

SiP.to.PDF-I, -II,

and Encode RPG.to.PDF and tested them on docu-

ments that have the same basic ﬁle structure (see, Sub-

section 2.1).

There are three main characteristics which we

usually take into account in order to describe and eval-

uate a digital watermarking system: ﬁdelity, robust-

ness, and capacity (Cox et al., 2008).

Fidelity refers to the perceptual similarity between

watermarked and original document. Concerning our

watermarking systems, it seems to be of high ﬁ-

delity as both algorithms Encode

SiP.to.PDF-II and

Encode RPG.to.PDF do not alter the PDF document

display, whereas the algorithm Encode

SiP.to.PDF-

I, although it modiﬁes directly the text of the PDF

WEBIST2015-11thInternationalConferenceonWebInformationSystemsandTechnologies

Table 1: Performance results of our graph-based algorithm Encode RPG.to.PDF with respect to other similar methods.

Algorithm Our RPG Algorithm of Other similar Algorithm based on

Performance Algorithm (Simin et al., 2011) algorithms PDF structure

Fidelity high high high high

Embedding based on inﬁnite 67% of text inﬁnite

Capacity ﬁle structure in theory size in theory

Robustness best better worse fragile

document, ensures that its modiﬁcation is not per-

ceived by the human visual system.

We mainly focused on our graph-based water-

marking algorithm Encode

RPG.to.PDF and evalu-

ated its robustness under general manipulations of

Adobe Acrobat Professional, such as addition of text,

comments, stamp, signature, as well as optimization.

The aforementioned operations although they added

new objects in the PDF document, they did not al-

tered the content of the original objects. The experi-

mental results show that proposed algorithm is robust

against editing and optimization attacks, since the en-

tries inserted in PDF’s objects does not affect their

functionality, and the watermark graph F[π

∗

] can be

successfully extracted from the PDF document.

The embedding capacity essentially depends on

the size of the watermark w or, in our case, of the size

of the embedding watermark RPG graph F[π

∗

]; note

that, the size of the watermark graph is the number

of vertices that it contains. In order to measure the

embedding capacity, we calculate the ratio |w|/|T|,

where |w| is the size of the watermark RPG graph

F[π

∗

] and |T| is the size of the original PDF docu-

ment T which is measured by counting the number

of objects it contains; we use the number of objects

since in our algorithm we assign an exact pairing of

the nodes of F[π

∗

] to the objects of T. We claim

that our algorithms have high embedding capacity for

large PDF documents since in such document we are

able to encode a watermark graph less than or equal

to document’s size. Recall that, our recently proposed

algorithms for encoding a number w as reducible per-

mutation graph F[π

∗

] encode a relatively large graph

into a large number of different integers (authors’ pa-

pers). Additionally, the embedding of the watermark

w didn’t increased the size of the PDF ﬁle.

We next select our graph-based encoding al-

gorithm Encode

RPG.to.PDF, compare it with

similar methods and evaluate its performance.

More precisely, we compare the algorithm

Encode

RPG.to.PDF with the algorithm recently

proposed by the authors of (Simin et al., 2011), since

both algorithms use similar watermarking technique,

i.e., both algorithms modify content on speciﬁc

objects of a PDF document.

In Table 1, we present the performance compar-

ison of our algorithm Encode RPG.to.PDF with re-

spect to the algorithm proposed by authors of (Simin

et al., 2011). In particular, in (Simin et al., 2011) au-

thors presented their results on the robustness of their

algorithm under speciﬁc attacks, such as addition of

text content, postil, stamp, signature, background and

delete text content of the PDF document. We ex-

tended these attacks by applying optimization to the

watermarked PDF document T

produced by our al-

gorithm and we show that the watermark w can be

successfully extracted by the document T

For the sake of completeness, in Table 1 we also

show performance results of the algorithm presented

in (Simin et al., 2011) with respect to other simi-

lar methods (Gu and Yang, 2009; Liu et al., 2006;

Kankanhalli and Hau, 2002; Wang and Liu, 2009), as

well as with respect to a method based on the structure

of PDF document (Zhong et al., 2007).

5 CONCLUDING REMARKS

In this paper we presented embedded algorithms,

along with their corresponding extraction algorithms,

for watermarking PDF documents using three differ-

ent representations of a self-inverting permutation π

∗

namely 1D-, 2D-, and RPG-representations.

In light of our graph-based embedding algorithm

Encode RPG.to.PDF it would be very interesting to

investigate the possibility of altering other compo-

nents of the document structure of a PDF ﬁle in order

to embed the graph F[π

∗

]; we leave it as a direction

for future work.

Moreover, an interesting open question is whether

the embedding approaches and techniques used in this

paper can help develop encoding algorithms having

“better” properties with respect to text attacks.

REFERENCES

Adobe (2006). Adobe systems incorporated, adobe

portable document format version 1.7. In Website

http://www.adobe.com.

Atallah, M., Raskin, V., Hempelmann, C., Karahan, M.,

Sion, R., Topkara, U., and Triezenberg, K. (2003).

WatermarkingPDFDocumentsusingVariousRepresentationsofSelf-invertingPermutations

Natural language watermarking and tamperprooﬁng.

In LNCS 2578, Springer, volume 5, pages 196–212.

Bindra, G. (2011). Invisible communication through

portable document ﬁle (pdf) format. In Proc. 7th Int’l

Conference on Intelligent Information Hiding and

Multimedia Signal Processing (IIH-MSP’11), pages

173–176.

Brassil, J., Low, S., Maxemchuk, N., and Gorman, L.

(1995). Hiding information in document images. In

Proc. of the 29th Annual Conference on Information

Sciences and Systems, pages 482–489.

Chroni, M., Fylakis, A., and Nikolopoulos, S. (2013). Wa-

termarking images in the frequency domain by ex-

ploiting self-inverting permutations. In Proc. 9th Int’l

Conference on Web Information Systems and Tech-

nologies (WEBIST’13), pages 45–54.

Chroni, M., Fylakis, A., and Nikolopoulos, S. (2014). From

image to audio watermarking using self-inverting per-

mutations. In Proc. 10th Int’l Conference on Web

Information Systems and Technologies (WEBIST’14),

pages 177–184.

Chroni, M. and Nikolopoulos, S. (2012). An efﬁcient graph

codec system for software watermarking. In Proc.

36th Int’l Conference on Computers, Software, and

Applications (COMPSAC’12), pages 595–600.

Collberg, C. and Nagra, J. (2010). Surreptitious Software.

Addison-Wesley.

Cox, I., Miller, M., Bloom, J., Fridrich, J., and Kalker,

T. (2008). Digital Watermarking and Steganography.

Morgan Kaufmann, 2nd edition.

Golumbic, M. (1980). Algorithmic Graph Theory and Per-

fect Graphs. Academic Press, Inc., New York.

Gu, Y. and Yang, Y. (2009). A text digital watermarking al-

gorithm for pdf document based on scrambling tech-

nique. In Journal of Foshan University (Natural Sci-

ence Edition), volume 2, pages 43–46.

Huang, D. and Yan, H. (2001). Interword distance changes

represented by sine waves for watermarking text im-

ages. In IEEE Trans. Circuits and Systems for Video

Technology, volume 11(12), pages 1237–1245.

Kankanhalli, M. and Hau, K. (2002). Watermarking of

electronic text documents. Electronic Commerce Re-

search, 2(1-2):169–187.

Lee, I. and Tsai, W. (2010). A new approach to covert com-

munication via pdf ﬁles. In Signal Processing, volume

90(2), pages 557–565.

Liu, H., Li, L., Li, J., and Huang, J. (2012). Three novel

algorithms for hiding data in pdf ﬁles based on incre-

mental updates. In Digital Forensics and Watermark-

ing, Springer Berlin Heidelberg, pages 167–180.

Liu, X., Zhang, Q., Tang, C., Zhao, J., and Liu, J. (2008). A

steganographic algorithm for hiding data in pdf ﬁles

based on equivalent transformation. In Int’l Sym-

posiums on Information Processing (ISIP’08), pages

417–421.

Liu, Y., Sun, X., and Luo, G. (2006). A novel information

hiding algorithm based on structure of pdf document.

In Computer Engineering, volume 32(17), pages 230–

232.

Low, S. and Maxemchuk, N. (2000). Capacity of text mark-

ing channel. In IEEE Signal Processing Letters, vol-

ume 7(12), pages 345–347.

Low, S., Maxemchuk, N., and Lapone, A. (1998). Docu-

ment identiﬁcation for copyright protection using cen-

troid detection. In IEEE Transactions on Communica-

tions, volume 46(3), pages 372–381.

Lu, P., Lu, Z., Zhou, Z., and Gu, J. (2008). An optimized

natural language watermarking algorithm based on

tmr. In Proc. 9th International Conference for Young

Computer Scientists, pages 1459–1463.

Maxemchuk, N. and Low, S. (1997). Marking text docu-

ments. In Proc. of the IEEE Int’l Conference on Image

Processing, pages 13–16.

Maxemchuk, N. and Low, S. (1998). Performance compar-

ison of two text marking methods. In IEEE Journal

of Selected Areas in Communications, volume 16(4),

pages 561–572.

Meral, H., Sankur, B.,

Ozsoy, A., G¨ung¨or, T., and Sevinc¸,

E. (2009). Natural language watermarking via mor-

phosyntactic alterations. In Computer Speech and

Language, volume 23(1), pages 107–125.

Simin, H., Xingming, S., and .Zhangjie, F. (2011). A novel

information hiding algorithm based on page object of

pdf document. In 10th IEEE Int’l Symposium on Dis-

tributed Computing and Applications to Business, En-

gineering and Science (DCABES’11), pages 266–270.

Sun, X. and Asiimwe, A. (2005). Noun-verb based tech-

nique of text watermarking using recursive decent se-

mantic net parsers. In LNCS 3612, volume Part III,

pages 968–971.

Topkara, M., Topraka, U., and Atallah, M. (2007). Infor-

mation hiding through errors: a confusing approach.

In Proc. of SPIE, Security, Steganography, and Wa-

termarking of Multimedia Contents IX, volume 6505,

pages 1–12.

Vybornova, O. and Macq, B. (2007). A method of text wa-

termarking using presuppositions. In Proc. of SPIE,

Security, Steganography, and Watermarking of Multi-

media Contents IX, volume 6505, pages 1–10.

Wang, Q. and Liu, X. (2009). A new watermarking algo-

rithm of pdf document based on correct coding. In

Computing Technology and Automation, volume 28,

pages 137–141.

Zhong, S., Cheng, X., and Chen, T. (2007). Data hiding in

a kind of pdf texts for secret communication. In In-

ternational Journal of Network Security, volume 4(1),

pages 17–26.

Zhou, X., Zhao, W., Wang, Z., and Pan, L. (2009). Security

theory and attack anlysis for text watermarking. In

Int’l Conference on E-Business and Information Sys-

tem Security (EBISS’09), pages 1–6.

WEBIST2015-11thInternationalConferenceonWebInformationSystemsandTechnologies