A DISTORTION FREE WATERMARK FRAMEWORK
FOR RELATIONAL DATABASES
Sukriti Bhattacharya and Agostino Cortesi
Dipartimento di Informatica, Universita Ca’ Foscari di Venezia, Via Torino 155, 30170 Venezia, Italy
Keywords:
Database watermarking, HMAC, Abstract interpretation.
Abstract:
In this paper we introduce a distortion free invisible watermarking technique for relational databases. The
main idea is to build the watermark after partitioning tuples with actual attribute values. Then, we build hash
functions on top of this grouping and get a watermark as a permutation of tuples in the original table. As the
ordering of tuples does not affect the original database, this technique is distortion free. Our contribution can
be seen as an application to relational databases of software watermarking ideas developed within the Abstract
Interpretation framework.
1 INTRODUCTION
A watermark can be considered to be some kind of
information that is embedded into underlying data
for tamper detection, localization, ownership proof,
and/or traitor tracing purposes (Agrawal et al., 2003).
Watermarking techniques apply to various types of
host content. Here, we concentrate on relational
databases. Rights protection for such data is crucial in
scenarios where data are sensitive and valuable and
nevertheless need to be outsourced. A good example
is a data mining application, where data are sold in
pieces to parties specialized in mining them, e.g. sales
pattern databases, oil drilling data, financial data.
Other scenarios involve, for example, online B2B
interactions, e.g., airline reservation and scheduling
portals, in which data are made available for direct,
interactive use. Given the nature of most of the data,
it is hard to associate rights of the originator over it
(Lafaye, 2007). Watermarking can be used to solve
these issues. Unlike encryption and hashing,
typical watermarking techniques modify the original data
as a modulation of the watermark information and
inevitably cause permanent distortion to the original
data; therefore they cannot meet the integrity
requirement that some applications impose on the data.
The first well-known watermarking
scheme for relational databases was proposed by
Agrawal and Kiernan (Agrawal et al., 2003) for
watermarking numerical values. The fundamental
assumption is that the watermarked database can tolerate a
small amount of errors. Since any bit change to a
categorical value may render the value meaningless,
Agrawal and Kiernan's scheme cannot be directly
applied to watermarking categorical data. To solve this
problem, Sion (Sion, 2004) proposed to watermark a
categorical attribute by changing some of its values to
other values of the attribute (e.g., 'red' is changed to
'green') if such a change is tolerable in certain
applications.
All of the works cited so far assume that minor
distortions caused to some attribute data can be tolerated
to some specified precision grade. However, some
applications in which relational data are involved
cannot tolerate any permanent distortion, and the integrity
of the data needs to be authenticated. To meet this
requirement, we further strengthen this approach and
propose a distortion free watermarking algorithm for
relational databases based on reordering tuples. The
robustness of the proposed watermarking
depends on the size of the individual groups, so it is
specifically designed for large databases. The resulting
watermark is robust against various forms of
malicious attacks and updates to the data in the table.
Database watermarking consists of two basic pro-
cesses: watermark insertion and watermark detection
(Agrawal et al., 2003), as illustrated in Figure 1. For
watermark insertion, a key is used to embed water-
mark information into an original database so as to
produce the watermarked database for publication or
distribution. Given the appropriate key and watermark
information, a watermark detection process can be
applied to any suspicious database so as to
determine whether or not a legitimate watermark can be
detected. A suspicious database can be any
watermarked database or innocent database, or a mixture
of them under various database attacks.
Bhattacharya S. and Cortesi A. (2009).
A DISTORTION FREE WATERMARK FRAMEWORK FOR RELATIONAL DATABASES.
In Proceedings of the 4th International Conference on Software and Data Technologies, pages 229-234
DOI: 10.5220/0002256402290234
Copyright © SciTePress
Figure 1: Basic watermarking process.
While the basic processes in database watermark-
ing are quite similar to those in watermarking multi-
media data, the approaches developed for multimedia
watermarking cannot be directly applied to databases
because of the difference in data properties. These
differences include (Agrawal et al., 2003):
• A multimedia object consists of a large number
of bits with considerable redundancy. Thus, the
watermark has a large cover in which to hide. A
database relation consists of tuples, each of which
represents a separate object. The watermark needs
to be spread over these separate objects;
• The relative spatial/temporal positioning of
various pieces of a multimedia object typically does
not change. Tuples of a relation, on the other
hand, constitute a set, and there is no implied
ordering between them;
• Multimedia objects typically remain intact;
portions of an object cannot be dropped or replaced
arbitrarily without causing perceptual changes
in the object. On the other hand, tuple
insertions, deletions, and updates are the norm in the
database setting.
Once outsourced, i.e., out of the control of the
watermarker, data might be subjected to a set of attacks
or transformations; these may be malicious (e.g., with
the explicit intent of removing the watermark) or
simply the result of normal use of the data. An effective
watermarking technique must be able to survive such
use. The main idea of this paper is to build the
watermark after partitioning the tuples according to the
actual attribute values and then to build hash
functions on top of these groupings. Our contribution can
be seen as an application to relational databases of
software watermarking ideas developed within the
abstract interpretation framework (Cousot and Cousot,
2004) and (Cousot and Cousot, 2007).
The paper is organized as follows. In Section
2 we formalize the definition of tables in a relational
database and give a formal definition of the
watermarking process of a table in a relational database.
Section 3 illustrates how a table in a relational database
can be abstracted. Then a distortion free algorithm
for watermark embedding and verification, on top of
this abstraction, is provided in Sections 4 and 5,
respectively. Section 6 illustrates the algorithms by a
suitable example. The correctness proof of this
watermarking process is given in Section 7. A
robustness analysis of this watermarking scheme is given in
Section 8. Finally we draw our conclusions in Section
9.
2 PRELIMINARIES
This section contains formal definitions (Collberg
and Thomborson, 2002) and (Haan and Koppelaars,
2007) of tables in relational database and database
watermarking.
Definition 2.1: Function.
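Let $\Pi_i$ be the projection function which selects the $i$-th coordinate of a pair. $F$ is a function over the set $A$ into the set $B$ $\Leftrightarrow$ $F \subseteq (A \times B)$ $\wedge$ $(\forall p_1, p_2 \in F : p_1 \neq p_2 \Rightarrow \Pi_1(p_1) \neq \Pi_1(p_2))$ $\wedge$ $\{\Pi_1(p) \mid p \in F\} = A$.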
Definition 2.2: Set Function.
A set function is a function in which every range element is a set. Formally, $F$ is a set function $\Leftrightarrow$ $F$ is a function $\wedge$ $(\forall c \in \mathrm{dom}(F) : F(c) \text{ is a set})$.
For instance, we can express information about companies and their locations by means of a set function over the domain {Company, Location}, namely
{(Company, {'Natural Join', 'Central Boekhuis', 'Oracle', 'Remmen & De Brock'}), (Location, {'New York', 'Venice', 'Paris'})}.
Definition 2.3: Table.
Given two sets $H$ and $K$, a table over $H$ and $K$ is a set of functions $T$ over the same set $H$ and into the same set $K$, i.e. $\forall t \in T$: $t$ is a function from $H$ to $K$.
For instance, consider a table containing data on employees (Table 1). The table is represented by the set of functions $\{t_1, t_2, t_3\}$, where $\mathrm{dom}(t_i) = \{\text{emp\_no}, \text{emp\_name}, \text{emp\_rank}\}$.
ICSOFT 2009 - 4th International Conference on Software and Data Technologies
Table 1: EMPLOYEE.
emp_no   emp_name   emp_rank
100      John       Manager
101      David      programmer
103      Albert     HR
For instance, $t_1(\text{emp\_name}) = \text{John}$.
There is a correspondence between tuples and functions. For instance, $t_1$ corresponds to the following tuple: {(emp_no, 100), (emp_name, John), (emp_rank, Manager)}. The first coordinates of the ordered pairs in a tuple are referred to as the attributes of that tuple.
Definition 2.4: Watermarking.
A watermark $W$ for a table $T$ over $H$ into $K$ is a predicate such that $W(T)$ is true and the probability of $W(T')$ being true with $T' \in \wp(H \times K) \setminus T$ is negligible.
3 DATABASE ABSTRACTION
Our watermarking technique can be seen as an extension
to relational databases of the general ideas of software
watermarking introduced by P. Cousot and
R. Cousot (Cousot and Cousot, 2004).
Given a relational database (i.e. a set of tables), we assume that the semantics of a query is precisely defined, and its answer is unique. Let $A$ be the set of attributes in $T$. The domain $D$ of a categorical attribute $x \in A$ in table $T$ is the set of all possible values of $x$, and the (finite) value set $V$ ($\subseteq D$) is the set of values of $x$ that are actually present in $T$. We can partition the tuples in $T$ by grouping the values of attribute $x$ by $P = \{[v_i] : 1 \leq i \leq N\}$, where $\forall t \in T : t.x = v_i \Leftrightarrow t \in [v_i]$. The frequency $q_i$ of $v_i$ is the number of tuples in $[v_i]$. The data distribution of $x$ (in $T$) is the set of pairs $\tau = \{(v_i, q_i) \mid 1 \leq i \leq N\}$. So the entire database can be partitioned into $N$ fixed, mutually exclusive areas based on each categorical value $v_i$. The above concept leads to an abstraction as depicted in Figure 2.
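The partition P and the frequencies q_i described above can be illustrated with a dictionary keyed by the attribute value. This is only a minimal Python sketch: the function name `partition_by` and the representation of tuples as dictionaries are assumptions made for the illustration, and the rows are a few taken from the PARTS table of Section 6.

```python
from collections import defaultdict


def partition_by(table, x):
    """Group the tuples of `table` by the value of categorical attribute `x`.

    Returns a dict mapping each value v_i present in the table to the list
    [v_i] of tuples t with t[x] == v_i. The grouping is virtual: the input
    list itself is not reordered.
    """
    groups = defaultdict(list)
    for t in table:
        groups[t[x]].append(t)
    return dict(groups)


# A few rows of the PARTS table (Table 2).
parts = [
    {"prts_no": "1007P", "color": "BLUE"},
    {"prts_no": "1017P", "color": "RED"},
    {"prts_no": "1027P", "color": "RED"},
    {"prts_no": "1030P", "color": "BLACK"},
    {"prts_no": "1044P", "color": "BLUE"},
]

P = partition_by(parts, "color")

# Data distribution: the set of pairs (v_i, q_i).
tau = {(v, len(group)) for v, group in P.items()}
print(sorted(tau))  # [('BLACK', 1), ('BLUE', 2), ('RED', 2)]
```

Every tuple lands in exactly one group, which is what makes the areas mutually exclusive.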
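Given a table $T$ over $A$ into $D$, a categorical attribute $x \in A$ and $P = \{[v_i] : 1 \leq i \leq N\}$, for each set $S \subseteq T$ we can define a concretization map $\gamma_x$ as follows:
$$
\begin{cases}
\gamma_x(v_i, h) = \{ S \subseteq T \mid \forall t \in S : t \in [v_i] \wedge \text{size of } S \text{ is } h \} \\
\gamma_x(\top) = T \\
\gamma_x(\bot) = \emptyset
\end{cases}
$$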
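The best representation of a set of tuples with attribute $x$ is captured by the corresponding abstraction function $\alpha_x$:
$$
\alpha_x(S) =
\begin{cases}
(v_i, h) & \text{if } \forall t \in S : t.x = v_i \wedge \text{size of } S \text{ is } h \\
\top & \text{if } \exists t_1, t_2 \in S : t_1.x \neq t_2.x \\
\bot & \text{if } S = \emptyset
\end{cases}
$$
Figure 2: Table Abstraction (Galois Connection).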
We may prove that $(\alpha_x, \gamma_x)$ form a Galois insertion (Myrvold and Ruskey, 2001) with $\alpha_x$ monotone and $\gamma_x$ weakly monotone, i.e. $(v, u) \leq (v, m) \Rightarrow \gamma_x(v, u) \subseteq \gamma_x(v, m)$.
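The abstraction function can be read operationally: a set of tuples is summarised by its common value of x and its size, by the top element when the values differ, and by the bottom element when the set is empty. The following Python sketch follows that reading; the names `alpha`, `TOP` and `BOTTOM` and the dictionary representation of tuples are assumptions made for the illustration.

```python
TOP = "TOP"        # assumed marker: tuples with different values of x
BOTTOM = "BOTTOM"  # assumed marker: the empty set of tuples


def alpha(S, x):
    """Abstract a collection of tuples S with respect to attribute x."""
    if not S:
        return BOTTOM
    values = {t[x] for t in S}
    if len(values) > 1:
        return TOP
    (v,) = values          # the single value shared by all tuples
    return (v, len(S))     # the pair (v_i, h)


reds = [{"prts_no": "1017P", "color": "RED"},
        {"prts_no": "1027P", "color": "RED"}]
mixed = reds + [{"prts_no": "1007P", "color": "BLUE"}]

print(alpha(reds, "color"))   # ('RED', 2)
print(alpha(mixed, "color"))  # TOP
print(alpha([], "color"))     # BOTTOM
```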
4 WATERMARK EMBEDDING
The watermark embedding process is based
on a partition $P = \{[v_i] : 1 \leq i \leq N\}$ corresponding
to a given categorical attribute $x$. This partitioning can
be seen as a virtual grouping which does not change
the physical position of the tuples. After grouping,
all tuples in each group are sorted according to their
primary key. Like grouping, the sorting operation
does not change the physical position of the tuples
either. Each group is then considered (hence,
watermarked) independently.
Recall that we denote by $q_k$ the frequency of $[v_k]$, i.e. the number of tuples in $[v_k]$. A (keyed) group hash value $H_{q_k}$ is computed using an HMAC (HMAC, 2002) function based on the tuple hash values in sorted order. Then, a watermark $W = \mathrm{ExtractBits}(H_{q_k}, \ln(q_k))$ of length $\ln(q_k)$ is derived from $H_{q_k}$. The watermark $W$ is embedded into this group by permuting the order of the tuples. The new order $\pi$ can be easily calculated from $W$ using Myrvold and Ruskey's linear permutation unranking algorithm (Myrvold and Ruskey, 2001):
Algorithm 1: Watermark Embedding in [v_k].
Begin
  for i = 1 to q_k do                  // q_k = number of tuples in group G_k
    h_i = HMAC(key, r_i)               // tuple hash of tuple r_i, under the secret key
  end for
  H_1 = HMAC(key, h_1)
  for i = 2 to q_k do
    H_i = HMAC(key, H_(i-1) * p_i^(h_i))
    // where p_i is the i-th prime number, e.g. p_2 = 2, p_5 = 7 and so on
  end for
  W = ExtractBits(H_(q_k), ln(q_k))
  π_k = (0, ..., q_k - 1)              // initialised once, before unranking
  Unrank(q_k, W, π_k)
End
Procedure ExtractBits(H_(q_k), l)
  w = concatenation of the first l selected bits from H_(q_k)
  // assuming H_(q_k) is longer than l
  return w
End
Procedure Unrank(n, W, π_k)
  if (n > 0) then
    swap(π_k[n - 1], π_k[W mod n])
    Unrank(n - 1, ⌊W / n⌋, π_k)
  end if
End
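The embedding step for a single group can be sketched in Python with the standard `hmac` and `hashlib` modules. Several choices below are assumptions made for the illustration and are not taken from the paper: SHA-256 as the underlying hash, the way a row is serialised, and the function names. Two deviations are deliberate. First, the chained group hash concatenates the previous hash, the position i and h_i instead of computing H_(i-1) * p_i^(h_i), because raising a prime to a digest-sized exponent is not practical. Second, the watermark length is taken as floor(log2 q_k), which agrees with the lengths 2, 3 and 1 used for groups of 4, 8 and 2 tuples in the example of Section 6, whereas the text writes ln(q_k).

```python
import hashlib
import hmac


def tuple_hash(key, row):
    """h_i = HMAC(key, r_i); the row is serialised by joining its fields."""
    msg = "|".join(str(v) for v in row).encode()
    return hmac.new(key, msg, hashlib.sha256).digest()


def group_hash(key, hashes):
    """Chained keyed hash over the tuple hashes, taken in sorted order.

    Stand-in for H_i = HMAC(key, H_(i-1) * p_i^(h_i)): the previous group
    hash is concatenated with the position i and with h_i.
    """
    H = hmac.new(key, hashes[0], hashlib.sha256).digest()
    for i, h in enumerate(hashes[1:], start=2):
        H = hmac.new(key, H + i.to_bytes(4, "big") + h, hashlib.sha256).digest()
    return H


def extract_bits(H, l):
    """Integer value of the first l bits of the digest H."""
    return int.from_bytes(H, "big") >> (len(H) * 8 - l)


def unrank(n, r, pi):
    """Myrvold-Ruskey unranking, written iteratively; permutes pi in place."""
    while n > 0:
        pi[n - 1], pi[r % n] = pi[r % n], pi[n - 1]
        r //= n
        n -= 1


def embed_group(key, rows, pk=0):
    """Return the group's rows in watermarked order, plus the watermark W."""
    rows = sorted(rows, key=lambda r: r[pk])       # sort by primary key
    q = len(rows)
    H = group_hash(key, [tuple_hash(key, r) for r in rows])
    W = extract_bits(H, q.bit_length() - 1)        # floor(log2 q) bits (assumed)
    pi = list(range(q))
    unrank(q, W, pi)
    return [rows[j] for j in pi], W


blue = [("1007P", "BLUE"), ("1044P", "BLUE"), ("1051P", "BLUE"), ("1077P", "BLUE")]
marked, W = embed_group(b"secret key", blue)
print(W, [r[0] for r in marked])
```

Because the rows are sorted by primary key before hashing, the result does not depend on the order in which the group is stored, and only the order of the rows changes, never their content.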
5 WATERMARK VERIFICATION
A very important problem in a watermarking scheme is synchronization, that is, we must ensure that the watermark extracted is in the same order as that embedded. If synchronization is lost, even if no modifications have been made, the embedded watermark cannot be correctly verified. In watermark detection, a group of $q_k$ tuples is selected and its ordering $\pi_k$ is identified, where $\pi_k$ is a permutation of $(0, \ldots, q_k - 1)$. A watermark $W'$ can be derived from $\pi_k$ using Myrvold and Ruskey's linear permutation ranking algorithm (Myrvold and Ruskey, 2001). Based on the tuple hashes of the sorted tuples, a group hash value $H_{q_k}$ is computed using an HMAC function with the same key used during embedding. Then, a watermark $W$ is extracted from the group hash. If $W$ matches $W'$, the tuples in this group are authentic; otherwise, the data in this group have been modified or tampered with.
Algorithm 2: Watermark verification in [v_k].
Begin
  for i = 1 to q_k do                  // q_k = number of tuples in group G_k
    h_i = HMAC(key, r_i)               // tuple hash of tuple r_i, under the secret key
  end for
  H_1 = HMAC(key, h_1)
  for i = 2 to q_k do
    H_i = HMAC(key, H_(i-1) * p_i^(h_i))
    // where p_i is the i-th prime number, e.g. p_2 = 2, p_5 = 7 and so on
  end for
  W = ExtractBits(H_(q_k), ln(q_k))
  W' = rank(q_k, π_k, π_k^(-1))
  if (W = W') then
    return (Authentication Preserved)
  else
    return (Authentication Violated)
  end if
End
Procedure rank(n, π_k, π_k^(-1))
  if (n = 1) then return 0
  s = π_k[n - 1]
  swap(π_k[n - 1], π_k[π_k^(-1)[n - 1]])
  swap(π_k^(-1)[s], π_k^(-1)[n - 1])
  return (s + n * rank(n - 1, π_k, π_k^(-1)))
End
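The ranking side can be sketched in Python as follows. The sketch covers only the recovery of W' from the stored order of a group; recomputing W from the group hash proceeds as in embedding. The function names and the iterative formulation are assumptions made for the illustration, and the blue group of Table 4 is used as input.

```python
def rank(pi):
    """Myrvold-Ruskey rank of a permutation pi of 0..n-1 (iterative form).

    Works on copies, so the caller's list is left untouched.
    """
    pi = list(pi)
    inv = [0] * len(pi)
    for pos, val in enumerate(pi):
        inv[val] = pos                  # inverse permutation
    r, weight = 0, 1
    for n in range(len(pi), 1, -1):
        s = pi[n - 1]
        j = inv[n - 1]                  # current position of value n-1
        pi[n - 1], pi[j] = pi[j], pi[n - 1]
        inv[s], inv[n - 1] = inv[n - 1], inv[s]
        r += s * weight                 # unrolls s + n * rank(n - 1, ...)
        weight *= n
    return r


def observed_permutation(stored_keys):
    """pi[j] = index, in primary-key order, of the tuple stored at position j."""
    position = {k: i for i, k in enumerate(sorted(stored_keys))}
    return [position[k] for k in stored_keys]


# Blue group as it appears in the watermarked table (Table 4).
stored_blue = ["1044P", "1077P", "1007P", "1051P"]
pi_blue = observed_permutation(stored_blue)
print(pi_blue, rank(pi_blue))  # [1, 3, 0, 2] 2
```

The recovered value 2 is the watermark (10)_2 assumed for the blue group in the example of Section 6; a group passes verification when this value equals the W extracted from its recomputed group hash.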
6 EXAMPLE
This section illustrates the watermarking technique on the table PARTS(prts_no, color) (Table 2), which has only 14 tuples with primary key prts_no. Table 2 is the original table before the partitioning operation.
Table 3 is the partitioned table based on the color attribute, with the hash value $h_i$ associated with each tuple. The table contains three partitions $[v_{red}]$, $[v_{blue}]$ and $[v_{black}]$. The frequencies of the groups are $q_{red} = 8$, $q_{blue} = 4$ and $q_{black} = 2$, respectively.
For simplicity, we only show hypothetical hash values, not the real ones, for each tuple. The group hash values $H_{blue}$, $H_{red}$ and $H_{black}$ are calculated using the tuple hash values $h_i$, $1 \leq i \leq 14$, associated with the corresponding tuples and a secret key. Let us suppose that $W_{blue} = \mathrm{ExtractBits}(H_{blue}, \ln(q_{blue})) = (10)_2$, $W_{red} = \mathrm{ExtractBits}(H_{red}, \ln(q_{red})) = (011)_2$ and $W_{black} = \mathrm{ExtractBits}(H_{black}, \ln(q_{black})) = (0)_2$, respectively. Table 4 shows the watermark embedding operation on Table 2 using the unrank function as illustrated in Algorithm 1 for each partition, respectively. The final watermark value associated with the initial PARTS table (Table 2) is equal to $W = W_{blue} \,\Vert\, W_{red} \,\Vert\, W_{black} = (10)_2 \,\Vert\, (011)_2 \,\Vert\, (0)_2 = (100110)_2$.
Table 2: PARTS (before partitioning).
prts_no   color
1007P     BLUE
1017P     RED
1027P     RED
1030P     BLACK
1032P     RED
1040P     RED
1044P     BLUE
1049P     RED
1051P     BLUE
1055P     RED
1057P     BLACK
1067P     RED
1077P     BLUE
1087P     RED
Table 3: PARTS (after partitioning).
prts_no   color   h_i
1007P     BLUE    1900
1044P     BLUE    1790
1051P     BLUE    1990
1077P     BLUE    1654
1017P     RED     2000
1027P     RED     1799
1032P     RED     1929
1040P     RED     1897
1049P     RED     1700
1055P     RED     1690
1067P     RED     1770
1087P     RED     2790
1030P     BLACK   1999
1057P     BLACK   1970
Observe that any change to the initial table can
clash against the watermark W, which is embedded
into the final table. Applying Algorithm 2 to Table
4 we may easily check that the authentication is pre-
served.
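The permutations of Table 4 can be reproduced directly from the three watermark values assumed in this example. The Python sketch below applies the unranking procedure of Algorithm 1 to each group, with the tuples listed in primary-key order; the dictionary layout and variable names are assumptions made for the illustration.

```python
def unrank(n, r, pi):
    """Myrvold-Ruskey unranking: permute pi in place according to rank r."""
    if n > 0:
        pi[n - 1], pi[r % n] = pi[r % n], pi[n - 1]
        unrank(n - 1, r // n, pi)


# Primary keys of each group in sorted order, with the watermark from the text.
groups = {
    "BLUE": (["1007P", "1044P", "1051P", "1077P"], 0b10),
    "RED": (["1017P", "1027P", "1032P", "1040P",
             "1049P", "1055P", "1067P", "1087P"], 0b011),
    "BLACK": (["1030P", "1057P"], 0b0),
}

watermarked = {}
for color, (keys, W) in groups.items():
    pi = list(range(len(keys)))
    unrank(len(keys), W, pi)
    watermarked[color] = [keys[j] for j in pi]   # position j holds sorted tuple pi[j]
    print(color, watermarked[color])

# BLUE ['1044P', '1077P', '1007P', '1051P']
# RED ['1027P', '1032P', '1087P', '1049P', '1055P', '1067P', '1017P', '1040P']
# BLACK ['1057P', '1030P']
```

The printed orders coincide with the rows of Table 4, group by group.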
7 CORRECTNESS
In this section we show the correctness proof of the
algorithm discussed so far.
Theorem. Let $T$ be a table over $H$ into $K$, and let $\{[v_i] : 1 \leq i \leq N\}$ be a partition of $T$. Let $\pi_k$ be the permutation computed by Algorithm 1 and let $\pi = \pi_1 \circ \pi_2 \circ \ldots \circ \pi_N$ be the composition of those permutations. Then the ordered set $\langle W_1, W_2, \ldots, W_N \rangle$ is a watermark for $T$ that can be embedded in the table itself as $\pi(T)$.
Table 4: PARTS (Watermarked).
prts_no   color
1044P     BLUE
1077P     BLUE
1007P     BLUE
1051P     BLUE
1027P     RED
1032P     RED
1087P     RED
1049P     RED
1055P     RED
1067P     RED
1017P     RED
1040P     RED
1057P     BLACK
1030P     BLACK
Proof. Let $T$ be a table containing $\eta$ tuples and let $P = \{[v_i] : 1 \leq i \leq N\}$ be a partition with frequencies $\{q_i \mid 1 \leq i \leq N\}$, where each $q_i \leq \eta$. The group hash value $H_{q_k}$ is computed using the tuple hash values $h_i$, for $1 \leq i \leq q_k$, in the set $[v_k]$. All the hash values are computed using an HMAC (HMAC, 2002) function with a secret key known by the database owner. Using $H_{q_k}$, the watermark $W_k$ of length $\ln(q_k)$ is calculated and embedded in $[v_k]$ by Algorithm 1. So, the value of $W_k$ depends only on the secret key and on the tuples in $[v_k]$.
Suppose the tuple $t_m \in [v_k]$ is tampered with somehow. Then the hash value $h_m$ associated with $t_m$ changes, and so it corresponds to a different group hash value $H_{q_k}$ by ideal hash function properties. Since an ideal HMAC is injective, when using the secret key and two different messages $M_1$ and $M_2$, we would get $\mathrm{HMAC}(key, M_1) \neq \mathrm{HMAC}(key, M_2)$. Therefore, corresponding to the tampered $t_m$ we would get a different watermark $W_k$. When the frequencies $q_k$ are big enough to guarantee that the probability of guessing the permutation $\pi_k$ is negligible, we get that $\langle W_1, W_2, \ldots, W_N \rangle$ is a watermark according to Definition 2.4.
8 ROBUSTNESS
We now analyze the probability that all alterations
are correctly localized in the corresponding groups of
a watermarked table. We consider three kinds of
alterations:
• Modify an attribute value;
• Insert a tuple;
• Delete a tuple.
We assume that partition $[v_k]$ consists of $q_k$ tuples; thus, the length of the embedded watermark $W_k$ for $[v_k]$ is $\ln(q_k)$.
Suppose a single attribute value is modified in the watermarked relation. Without loss of generality, assume $r_i.A_j$ is modified (i.e., the $j$-th attribute of the $i$-th tuple) in $[v_k]$. The modification will affect the tuple hash $h_i$, the group hash $H_{q_k}$ and thus the embedded watermark $W_k$. After the modification, each bit of $W_k$ has equal probability of matching the corresponding bit of $W'_k$, which is the watermark extracted from the group. Therefore, the probability that the modification can be correctly detected (i.e., $W_k \neq W'_k$) is
$$\mathrm{Prob}_{alt} = 1 - \frac{1}{2^{\ln(q_k)}} \qquad (1)$$
Let a single tuple be inserted into the watermarked relation. The inserted tuple will be allocated to one of the $N$ partitions $[v_k]$ in the watermark detection. Since the added tuple will affect the group hash as well as the embedded watermark in a random way, the probability that the insertion can be detected (i.e., the watermark verification result is false for the affected group) is
$$\mathrm{Prob}_{ins} = 1 - \frac{1}{2^{\ln(q_k + 1)}} \qquad (2)$$
Let a single tuple be deleted from the watermarked relation. Exactly one partition $[v_k]$ will lose the deleted tuple in watermark detection. The absence of the deleted tuple in the group will affect the group hash as well as the embedded watermark in a random way. The probability that the deletion can be detected (i.e., the watermark verification result is false for the affected group) is
$$\mathrm{Prob}_{del} = 1 - \frac{1}{2^{\ln(q_k - 1)}} \qquad (3)$$
So in general we conclude that the probability that any alteration on the table is eventually located by the algorithm is $O\left(1 - \frac{1}{2^{\ln(N)}}\right)$, where $N$ is the size (number of tuples) of the table. This ensures that the watermarking technique satisfies the requirement of Definition 2.4.
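As a rough numerical illustration of the form of equation (1), take the watermark lengths used in the example of Section 6: 2, 3 and 1 bits for the blue, red and black groups. Writing $\ell_k$ for the length of $W_k$ (a symbol introduced here only for this illustration), the probability $1 - 2^{-\ell_k}$ that a modification changes the extracted watermark evaluates to:

```latex
\begin{align*}
\ell_{blue}  = 2 &: \quad 1 - \frac{1}{2^{2}} = 0.75  \\
\ell_{red}   = 3 &: \quad 1 - \frac{1}{2^{3}} = 0.875 \\
\ell_{black} = 1 &: \quad 1 - \frac{1}{2^{1}} = 0.5
\end{align*}
```

These small values reflect the toy size of the example; the probability approaches 1 only as the groups, and hence the watermarks, grow, which is consistent with the scheme being aimed at large databases.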
9 CONCLUSIONS
The proposed watermarking scheme is group based,
and a group watermark is associated with a specific
order of the tuples in that group. The length of the
watermark $W_k$ for the $k$-th group is ln(the number of tuples in the
$k$-th group). The longer a watermark, the more secure
the scheme. The final watermark associated with
the table is the concatenation of the watermark strings
associated with each group. The fact that this
algorithm is group based yields the following strong
points:
• We are able to detect and locate modifications, as
we can trace the group which is possibly affected
when a tuple $t_m$ is tampered with;
• It is distortion free and invisible, as it consists of
a permutation of the original table tuples.
Therefore no space overhead is required.
Observe that this watermarking technique can be
tuned according to different security levels by
adjusting the partitioning to use multiple
attributes instead of a single one.
ACKNOWLEDGEMENTS
Work partially supported by Italian MIUR CONIF '07
project "SOFT".
REFERENCES
Agrawal, Kiernan, and Haas (2003). Watermarking relational data: framework, algorithms and analysis. The VLDB Journal.
Collberg and Thomborson (2002). Watermarking, tamper-proofing, and obfuscation - tools for software protection. IEEE Trans. Software Engrg.
Cousot, P. and Cousot, R. (2004). An abstract interpretation-based framework for software watermarking. 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages.
Cousot, P. and Cousot, R. (2007). Abstract interpretation and application to static analysis. 1st IEEE and IFIP International Symposium on Theoretical Aspects of Software Engineering.
Haan and Koppelaars (2007). Applied Mathematics for Database Professionals. Apress.
HMAC (2002). The keyed-hash message authentication code. Federal Information Processing Standards Publication.
Lafaye, J. (2007). An analysis of database watermarking security. 3rd International Symposium on Information Assurance and Security.
Myrvold and Ruskey (2001). Ranking and unranking permutations in linear time. Inf. Process. Lett.
Sion (2004). Proving ownership over categorical data. IEEE International Conference on Data Engineering.