Pol
{
  S1
  {
    Purpose:(current, contact[opt-in]),
    Recipient:(ours), Retention:(indefinitely),
    Data:(#user.login, #user.home-info)
  }
  S2
  {
    Purpose:(current, develop[opt-in], contact[opt-in]),
    Recipient:(ours), Retention:(stated-purpose),
    Data:(#user.name, #user.login, #user.home-info)
  }
}
Figure 1: A P3P Policy of Walmart.com.
is described in Statement, which is the problematic part inspiring this work. A policy can contain one or more Statement elements, where each Statement consists of Data-Group, Purpose, Recipient and Retention. The Data-Group element contains a list of data (Data elements) which the services may collect and, optionally, data categories (Categories element). P3P specifies the categories for its defined standard set of Data elements. The standard data set is structured in a hierarchy and grouped in four sets: dynamic, user, thirdparty and business. Some Data elements can be placed in more than one group. The elements Purpose, Recipient and Retention describe, respectively, for which purpose the data may be used, to whom the data may be distributed, and for how long the data will be kept. The Purpose and Recipient elements can have multiple values, while the Retention element can have only one value. The P3P specification defines twelve values for Purpose, six values for Recipient and five values for Retention.
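The statement structure described above can be sketched as a small data model. This is only an illustrative sketch, not part of P3P or of our formalization: the class name `Statement` and its field names are assumptions, categories and the optional attributes are omitted, and the vocabulary sets list the values as we understand them from the P3P 1.0 specification, so they should be checked against the specification before being relied upon.

```python
from dataclasses import dataclass

# Pre-defined vocabularies (assumed to match the P3P 1.0 specification).
PURPOSES = frozenset({
    "current", "admin", "develop", "tailoring",
    "pseudo-analysis", "pseudo-decision",
    "individual-analysis", "individual-decision",
    "contact", "historical", "telemarketing", "other-purpose",
})
RECIPIENTS = frozenset({
    "ours", "delivery", "same", "other-recipient", "unrelated", "public",
})
RETENTIONS = frozenset({
    "no-retention", "stated-purpose", "legal-requirement",
    "business-practices", "indefinitely",
})


@dataclass(frozen=True)
class Statement:
    """Hypothetical, simplified model of one P3P Statement."""
    data: frozenset        # one or more data references, e.g. "#user.login"
    purposes: frozenset    # may hold several values
    recipients: frozenset  # may hold several values
    retention: str         # exactly one value

    def __post_init__(self):
        if not self.data:
            raise ValueError("a statement must mention at least one data item")
        unknown = (self.purposes - PURPOSES) | (self.recipients - RECIPIENTS)
        if unknown:
            raise ValueError(f"unknown vocabulary values: {sorted(unknown)}")
        if self.retention not in RETENTIONS:
            raise ValueError(f"unknown retention value: {self.retention!r}")


# Statement S1 of Fig. 1, without its opt-in flags.
s1 = Statement(
    data=frozenset({"#user.login", "#user.home-info"}),
    purposes=frozenset({"current", "contact"}),
    recipients=frozenset({"ours"}),
    retention="indefinitely",
)
```

Modelling Retention as a single string rather than a set mirrors the rule that a statement carries exactly one retention value.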
Besides the above main elements, Web sites/services can inform their users which data elements, which purposes of data usage, and which data recipients are optional or mandatory. This is done through an optional attribute called Optional (yes or no) for the former and Required (always, opt-out or opt-in) for the latter two.
An example P3P policy of walmart.com, consisting of two statements (S1 and S2), is shown in Fig. 1. S1 collects the user's contact information and allows her to create an account. S2 collects other personal information, viz. name, email and postal address, for conducting surveys and contests.
Several issues concerning P3P policy ambiguities were discussed in (Yu et al., 2004; Karjoth et al., 2003; Li et al., 2003). Some of them were clarified and addressed in the latest version (v1.1) of the P3P specification. We analyzed these ambiguities and categorized their causes into (i) syntax issues and (ii) pre-defined vocabularies.
P3P Policy Syntax. P3P allows multiple statements in a policy. This syntactic flexibility potentially causes semantic conflicts. For instance, a data item can be mentioned in different statements that assign different Retention values to it. As Retention values are mutually exclusive, it is not sensible to allow such multiple values. This type of conflict is shown in Fig. 1, where the data #user.login and #user.home-info, which should each have only one Retention value, are assigned two Retention values, i.e. indefinitely in S1 and stated-purpose in S2. In addition, P3P defines optional attributes expressing whether Data, Purpose and Recipient elements are required or optional. However, ambiguities arise when, e.g., a Data element is required while the Purpose and Recipient elements are optional: it is unclear whether or not the data is collected in the first place.
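The retention conflict in Fig. 1 can be detected mechanically. The following is a minimal sketch under simplifying assumptions: the dictionary layout and the function name `retention_conflicts` are hypothetical, data references are compared as plain strings (the data hierarchy is ignored), and the opt-in attributes are not considered.

```python
from collections import defaultdict

# The two statements of Fig. 1, reduced to their data and retention.
policy = {
    "S1": {
        "data": ["#user.login", "#user.home-info"],
        "retention": "indefinitely",
    },
    "S2": {
        "data": ["#user.name", "#user.login", "#user.home-info"],
        "retention": "stated-purpose",
    },
}


def retention_conflicts(policy):
    """Return the data items that receive more than one Retention value.

    The result maps each conflicting data item to a dictionary from
    retention value to the names of the statements assigning it.
    """
    seen = defaultdict(dict)
    for name, statement in policy.items():
        for item in statement["data"]:
            seen[item].setdefault(statement["retention"], []).append(name)
    return {item: by_value for item, by_value in seen.items()
            if len(by_value) > 1}


conflicts = retention_conflicts(policy)
for item, by_value in sorted(conflicts.items()):
    print(item, by_value)
```

For the policy of Fig. 1 this reports #user.login and #user.home-info, while #user.name, which appears only in S2, is not flagged.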
Pre-defined Vocabularies. Among the pre-defined values of the Purpose, Retention, Recipient and Data Category elements, some combinations of values are inconsistent. Consider, e.g., a statement containing the Purpose value develop, meaning “information may be used to enhance, evaluate, or otherwise review the site, service, product, or market”, and the Retention value no-retention, meaning “information is not retained for more than a brief period of time necessary to make use of it during the course of a single online interaction”. This introduces a conflict, since data collected under purpose develop needs to be stored for longer than no-retention permits.
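A check for such vocabulary conflicts can be sketched as a lookup in a table of inconsistent value pairs. The table below is an assumption for illustration and contains only the single pair named in the text; the names `INCONSISTENT_PAIRS` and `vocabulary_conflicts` are hypothetical, and a complete table would have to be derived from the definitions in the P3P specification.

```python
# (purpose, retention) pairs considered inconsistent. Only the pair
# discussed in the text is listed; this is not a complete table.
INCONSISTENT_PAIRS = {
    ("develop", "no-retention"),
}


def vocabulary_conflicts(purposes, retention):
    """Return the (purpose, retention) pairs of a statement that are
    listed as inconsistent, in sorted order."""
    return sorted(
        (purpose, retention)
        for purpose in purposes
        if (purpose, retention) in INCONSISTENT_PAIRS
    )


found = vocabulary_conflicts({"current", "develop"}, "no-retention")
print(found)
```

Keeping the pairs in a data table rather than in code makes it straightforward to extend the check once further inconsistent combinations are identified.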
3 DATA–PURPOSE CENTRIC SEMANTICS FOR P3P
In order to establish an ontology, the relationships between entities in the domain must be known. In P3P policies, the Data element is clearly a main entity. The work of Ting Yu et al. (Yu et al., 2004) proposed a formal semantics for P3P employing a data-centric view. However, the purpose of data usage is also important information for data practices, i.e. there must be a reason to collect the data. In addition, how long the data should be retained depends on the purpose of collection. Moreover, this way of interpretation also complies with the Purpose Specification Principle of the OECD and the EU Directive 95/46/EC Article 10(b), which requires the data controller (website) to inform the data subject (user) at least about the identity of the controller and the purposes of the data collection. We therefore propose to use both the data and the purpose as the keys in our formal semantics for P3P.
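To illustrate what keying on both data and purpose means, the sketch below flattens the statements of Fig. 1 into a table indexed by (data, purpose) pairs. It is only an illustration of the keying idea under simplifying assumptions, not the formal semantics itself: the dictionary layout and the function name `by_data_and_purpose` are hypothetical, and the opt-in attributes are dropped.

```python
from collections import defaultdict

# The statements of Fig. 1, without their opt-in flags.
STATEMENTS = [
    {   # S1
        "purposes": ["current", "contact"],
        "recipients": ["ours"],
        "retention": "indefinitely",
        "data": ["#user.login", "#user.home-info"],
    },
    {   # S2
        "purposes": ["current", "develop", "contact"],
        "recipients": ["ours"],
        "retention": "stated-purpose",
        "data": ["#user.name", "#user.login", "#user.home-info"],
    },
]


def by_data_and_purpose(statements):
    """Collect recipients and retention values per (data, purpose) key."""
    table = defaultdict(lambda: {"recipients": set(), "retentions": set()})
    for statement in statements:
        for item in statement["data"]:
            for purpose in statement["purposes"]:
                entry = table[(item, purpose)]
                entry["recipients"].update(statement["recipients"])
                entry["retentions"].add(statement["retention"])
    return dict(table)


table = by_data_and_purpose(STATEMENTS)
for key in sorted(table):
    print(key, sorted(table[key]["retentions"]))
```

Under this keying, a pair such as (#user.login, current) collects two retention values, one from each statement, whereas (#user.login, develop) collects only stated-purpose, so a conflict is attributed to a specific use of the data rather than to the data item as a whole.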
Besides the inherent constraints of the P3P specification, we define additional constraints for checking the potential semantic conflicts described in the previous section, as follows:
Multiple Statements. The elements that should have only one value are Retention element; and Op-
KEOD 2011 - International Conference on Knowledge Engineering and Ontology Development