AN ONTOLOGICAL APPROACH TO VERIFYING P3P POLICIES∗

Assadarat Khurat and Boontawee Suntisrivaraporn

Institute for Security in Distributed Applications, Hamburg University of Technology, Hamburg, Germany

School of Information and Computer Technology, SIIT, Thammasat University, PathumThani, Thailand

Keywords: P3P policy, Ontology, Semantic web.

Abstract: Privacy has become a crucial issue in the realm of online services. A P3P policy is a privacy policy that enables websites to express their privacy practices. Online users can check such a policy against their privacy preferences, which helps them decide whether or not to use the service. However, the interpretation of a P3P policy is unwieldy because its descriptions and constraints lack a precise semantics. For instance, it is admissible to have purpose and recipient values with inconsistent meanings. Thus, an explicit formal semantics for P3P policies is needed to mitigate this problem. In this paper, we propose to use an OWL ontology to systematically and precisely describe the structures and constraints inherent in the P3P specification. Additional constraints are also defined and incorporated into the ontology in such a way that the reasons why a P3P policy is invalid can be disclosed after verification by an OWL reasoner.

1 INTRODUCTION

Privacy has become an important issue for the online world. To provide a service, online service providers may collect and store users' sensitive data, and misuse of these data causes privacy breaches. Many countries and organizations have thus become concerned with privacy, as seen from the enactment of privacy laws, e.g. the Privacy Acts in the USA, the EU Directives in the European Community, and the OECD Guidelines at the international level.

The Platform for Privacy Preferences (P3P) policy (Cranor et al., 2002), standardized by W3C, is a technology that stems from this privacy concern. It can be used by websites to express their practices regarding customers' data in a machine-readable format, XML. A P3P user agent, embedded e.g. in a web browser, can compare the P3P policies of service providers with the users' privacy preferences specified beforehand. The comparison result enables the users to decide whether or not to use the services. However, P3P policies may contain internal semantic inconsistencies. Thus, to detect existing discrepancies and regain consistency, a formal semantics for P3P is necessary, and it needs to be made explicit.

The Web Ontology Language (OWL) (Bechhofer et al., 2004), a W3C recommendation, is a well-known semantic web technology. Because OWL is capable of expressing a logical formalism (Description Logic), and because both the structures of P3P policy documents and their dependencies can be described as an ontology, we decided to use an OWL ontology to provide a formal semantics for P3P. The benefits of employing OWL for P3P are twofold: (i) the logical underpinning of OWL guarantees the preciseness of the definitions and constraints, i.e. ambiguity is reduced; and (ii) an OWL reasoning tool can be exploited to automatically check the consistency of a particular P3P policy. Our proposed framework is based on the data–purpose centric interpretation. We also aim to detect inconsistencies in a P3P policy and to explain which part is the culprit.

∗ This work is partially supported by the National Research University Project of Thailand Office for Higher Education Commission and by Thailand Research Fund.

2 P3P & ITS POTENTIAL INCONSISTENCIES

A P3P policy expresses not only how websites treat the collected data but also other aspects of privacy practices. These aspects are Entity, the policy issuer; Access, the ability of individuals to access their data; and Dispute-Group, the resolution procedures to follow when disputes over privacy policies occur.

Khurat A. and Suntisrivaraporn B.
AN ONTOLOGICAL APPROACH TO VERIFYING P3P POLICIES.
DOI: 10.5220/0003628203490353
In Proceedings of the International Conference on Knowledge Engineering and Ontology Development (KEOD-2011), pages 349-353
ISBN: 978-989-8425-80-5
© 2011 SCITEPRESS (Science and Technology Publications, Lda.)

How the websites may deal with the collected data is described in Statement, which is the problematic part inspiring this work. A policy can contain one or more Statement elements, where each Statement consists of Data-Group, Purpose, Recipient and Retention. The Data-Group element contains a list of data (Data elements) which the services may collect and, optionally, data categories (Categories element). P3P specifies the categories for its defined standard set of Data elements. The standard data set is structured in a hierarchy and grouped into four sets: dynamic, user, thirdparty and business. Some Data elements can be placed in more than one group. The elements Purpose, Recipient and Retention describe, respectively, for which purpose the data may be used, to whom the data may be distributed, and for how long the data will be kept. The Purpose and Recipient elements can have multiple values, while the Retention element can have only one value. The P3P specification defines twelve values for Purpose, six values for Recipient and five values for Retention.

    Pol {
      {
        Purpose: (current, contact[opt-in]),
        Recipient: (ours), Retention: (indefinitely),
        Data: (#user.login, #user.home-info)
      }
      {
        Purpose: (current, develop[opt-in], contact[opt-in]),
        Recipient: (ours), Retention: (stated-purpose),
        Data: (#user.name, #user.login, #user.home-info)
      }
    }

Figure 1: A P3P Policy of Walmart.com.

Besides the above main elements, websites/services can inform their users which data elements, which purposes of data usage, and which data recipients are optional or mandatory. This is done through optional attributes: Optional (no, yes) for the former, and Required (always, opt-out, opt-in) for the latter two.

An example P3P policy of walmart.com, consisting of two statements, is shown in Fig. 1. The first statement collects the user's contact information and allows her to create an account. The second statement collects other personal information, viz. name, email and postal address, for conducting surveys and contests.

Several issues concerning ambiguities in P3P policies were discussed in (Yu et al., 2004; Karjoth et al., 2003; Li et al., 2003). Some of them were clarified and addressed in the latest version (v1.1) of the P3P specification. We analyzed the causes of these ambiguities and categorized them into (i) syntax issues and (ii) pre-defined vocabularies.

P3P Policy Syntax. P3P allows multiple statements in a policy. This syntactic flexibility potentially causes semantic conflicts. For instance, a data item can be mentioned in different statements that assign different Retention values to it. As Retention values are mutually exclusive, it is not sensible to allow such multiple values. This type of conflict is shown in Fig. 1, where the data #user.login and #user.home-info, which should have only one Retention value each, are assigned two Retention values, i.e. indefinitely and stated-purpose. In addition, P3P defines optional attributes expressing whether Data, Purpose and Recipient elements are required or optional. However, ambiguities arise when, e.g., a Data element is required while its Purpose and Recipient elements are optional: it is unclear whether or not the data is collected in the first place.

Pre-defined Vocabularies. Among the pre-defined values of the Purpose, Retention, Recipient and Data Category elements, some combinations of values are inconsistent. Consider, e.g., a statement containing the Purpose value develop, meaning "information may be used to enhance, evaluate, or otherwise review the site, service, product, or market", and the Retention value no-retention, meaning "information is not retained for more than a brief period of time necessary to make use of it during the course of a single online interaction". This introduces a conflict, since data collected under the purpose develop must be stored for longer than the time permitted by no-retention.

3 DATA–PURPOSE CENTRIC SEMANTICS FOR P3P

In order to establish an ontology, the relationships between entities in the domain must be known. In P3P policies, the Data element is clearly a main entity. Yu et al. (2004) proposed a formal semantics for P3P employing a data-centric view. However, the purpose of data usage is also important information for data practices, i.e. there must be a reason to collect the data. In addition, how long the data should be retained depends on the purpose of collection. Moreover, this way of interpretation complies with the Purpose Specification Principle of the OECD and with Article 10(b) of the EU Directive 95/46/EC, which requires the data controller (website) to inform the data subject (user) at least about the identity of the controller and the purposes of the data collection. We therefore propose to use both the data and the purpose as the keys in our formal semantics for P3P.

Besides the constraints inherent in the P3P specification, we define the following additional constraints for checking the potential semantic conflicts described in the previous section:

Multiple Statements. The items that should have only one value are the Retention element and the Optional and Required attributes. Under the data–purpose based interpretation, we require that a policy contain only one value of Retention and one value of Required for each data–purpose pair; otherwise the policy is considered invalid. The constraint for the Optional attribute is defined analogously, but only for each data item, since this attribute belongs only to the Data element.
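The Multiple Statements constraint can be illustrated outside the ontology with a minimal Python sketch. It flattens the two statements of Fig. 1 into data–purpose pairs and reports every pair that receives more than one Retention or Required value. The dictionary layout and the function name conflicting_pairs are assumptions of this sketch, not part of P3P or of the ontology; purposes without an explicit Required value are given always, the P3P default.

```python
from collections import defaultdict

# Fig. 1 transcribed by hand; this dict layout is an assumption of the sketch.
# Purposes map to their Required value ("always" is the P3P default).
STATEMENTS = [
    {
        "purposes": {"current": "always", "contact": "opt-in"},
        "recipients": ["ours"],
        "retention": "indefinitely",
        "data": ["#user.login", "#user.home-info"],
    },
    {
        "purposes": {"current": "always", "develop": "opt-in", "contact": "opt-in"},
        "recipients": ["ours"],
        "retention": "stated-purpose",
        "data": ["#user.name", "#user.login", "#user.home-info"],
    },
]


def conflicting_pairs(statements):
    """Return {(data, purpose): {attribute: values}} for every data-purpose
    pair that is assigned more than one Retention or Required value."""
    retention = defaultdict(set)
    required = defaultdict(set)
    # Flatten each statement into its data-purpose pairs.
    for st in statements:
        for data in st["data"]:
            for purpose, req in st["purposes"].items():
                retention[(data, purpose)].add(st["retention"])
                required[(data, purpose)].add(req)
    conflicts = {}
    for pair in retention:
        found = {}
        if len(retention[pair]) > 1:
            found["Retention"] = retention[pair]
        if len(required[pair]) > 1:
            found["Required"] = required[pair]
        if found:
            conflicts[pair] = found
    return conflicts
```

For the policy of Fig. 1, the function reports the four pairs formed by #user.login and #user.home-info with the purposes current and contact, each with the Retention values indefinitely and stated-purpose.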

Data Hierarchy. Considering the hierarchy of the standard data set, it does not make sense for a data item to have more restrictions on its collection than its descendants. Therefore, we require that in a policy containing data where one item (e.g. #user.bdate.ymd.year) is a descendant of another (e.g. #user.bdate), the Optional value of the descendant must be equal to or more restrictive than that of the other, where we define the value no to be more restrictive than yes. The same condition also applies to the Required values of the Purpose and Recipient elements, where we define the value always to be more restrictive than opt-out, and opt-out to be more restrictive than opt-in.
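The restrictiveness orderings above can be made concrete with a short Python sketch. It assumes, for illustration only, that the descendant relation can be read off the dotted data reference (so that #user.bdate.ymd.year lies below #user.bdate) and that each data item carries a single attribute value; the rank tables and the function names are hypothetical.

```python
# Higher rank = more restrictive, following the orderings defined in the text.
OPTIONAL_RANK = {"yes": 0, "no": 1}
REQUIRED_RANK = {"opt-in": 0, "opt-out": 1, "always": 2}


def is_descendant(child, ancestor):
    """Assumption: a data reference is a descendant of another if it extends
    it by further dot-separated components."""
    return child.startswith(ancestor + ".")


def hierarchy_violations(values, rank):
    """values maps a data reference to its attribute value.
    Return the (ancestor, descendant) pairs in which the descendant is
    less restrictive than the ancestor."""
    violations = []
    for ancestor, ancestor_value in values.items():
        for descendant, descendant_value in values.items():
            if (is_descendant(descendant, ancestor)
                    and rank[descendant_value] < rank[ancestor_value]):
                violations.append((ancestor, descendant))
    return sorted(violations)
```

Called with {"#user.bdate": "no", "#user.bdate.ymd.year": "yes"} and OPTIONAL_RANK, the sketch reports the pair as a violation, since the descendant is optional while its ancestor is not.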

Optional Attributes. Because the meanings of the optional attributes (Optional and Required) are unclear in the P3P specification, we define that, for each data item, if all of its purposes are optional (the Required value of the Purpose element is opt-in), then its collection must be optional (the Optional value is yes). This is because, for opt-in, the services may use the data only when the users specifically request it. Thus, before this request is made, the services should not collect the data.
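As a hedged illustration of this rule, the following Python sketch checks it on a flattened view of a policy. The input layout (one entry per data item, with its Optional value and the Required value of each purpose) and the function name are assumptions made for this example.

```python
def optional_attribute_violations(collected):
    """collected maps a data reference to
    {"optional": "yes" | "no", "purposes": {purpose: required_value}}.
    Return the data items whose purposes are all opt-in although their
    collection is not optional."""
    return sorted(
        data
        for data, entry in collected.items()
        if entry["purposes"]  # items without purposes are not judged here
        and all(req == "opt-in" for req in entry["purposes"].values())
        and entry["optional"] != "yes"
    )


# Hypothetical input, loosely following the data items of Fig. 1.
EXAMPLE = {
    "#user.name": {
        "optional": "no",
        "purposes": {"develop": "opt-in", "contact": "opt-in"},
    },
    "#user.login": {
        "optional": "no",
        "purposes": {"current": "always", "contact": "opt-in"},
    },
}
```

In this hypothetical input only #user.name is reported: every purpose attached to it is opt-in, yet its collection is mandatory.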

Inconsistent Meaning between Purpose, Recipient, Retention and Data Category Values. We define eight constraints to check the semantic consistency of each pair among Purpose, Recipient, Retention and Data Category, except for the pair Data Category and Retention. Four constraints are defined for the pair Purpose and Data Category according to the User Agent Guidelines (Cranor, 2003), which have been appended to the P3P 1.1 specification. For each of the remaining pairs, i.e. Purpose and Recipient; Purpose and Retention; Retention and Recipient; and Recipient and Data Category, one constraint is defined. Due to space limitations, we give only one example, a constraint between Purpose and Retention:

In a policy, when the Purpose value is one of admin, historical, develop, pseudo-analysis, pseudo-decision, individual-analysis, individual-decision, telemarketing and contact, its associated Retention value must not be no-retention.
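The example constraint translates directly into a membership test. The Python sketch below is one possible reading, assuming a statement is given as a dictionary with a list of purposes and a single retention value; the names are hypothetical and the check is performed per statement rather than inside the ontology.

```python
# Purposes listed in the constraint above: data used for them must outlive
# a single online interaction.
PURPOSES_NEEDING_RETENTION = frozenset({
    "admin", "historical", "develop", "pseudo-analysis", "pseudo-decision",
    "individual-analysis", "individual-decision", "telemarketing", "contact",
})


def purpose_retention_violations(statements):
    """Return (statement index, purpose) for every listed purpose that is
    combined with the Retention value no-retention."""
    violations = []
    for index, statement in enumerate(statements):
        if statement["retention"] != "no-retention":
            continue
        for purpose in sorted(statement["purposes"]):
            if purpose in PURPOSES_NEEDING_RETENTION:
                violations.append((index, purpose))
    return violations
```

A statement with purposes current and develop and retention no-retention yields a single violation for develop, which is the conflict described in Sec. 2.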

4 AN ONTOLOGY FOR P3P

We propose to use an OWL ontology to systematically and precisely describe the structures and constraints inherent in the P3P specification. Once the ontology has been deployed, any P3P policy can be verified against it with the help of an OWL reasoner. Our aim is to verify whether a given policy is valid and, if it is not, to determine what is wrong.

[Figure 2 diagram: Policy is linked by collects (some) to CollectedData, which is linked by collectedForPurpose (some) to DataPurpose-CollectionPractice; DataPurpose-CollectionPractice is linked by hasRecipient to Recipient and by hasRetention to Retention; CollectedData is a subclass of Data, and DataPurpose-CollectionPractice is a subclass of Purpose.]

Figure 2: Data–purpose centric model with the other elements grouped together at the same level. The unlabeled arrow is owl:subClassOf.

As shown in Sec. 2, a policy consists of at least one Statement, which in turn comprises several elements, e.g. Data, Purpose, Recipient and Retention. Note that we focus only on these four elements for clarity of discussion.

An obvious modeling choice is to define a class for each of these elements and relate them with appropriate properties/roles. To make sure that the purpose for one data item is not grouped with another data item, we propose to flatten the original P3P statements such that each resulting reified statement has exactly one Data and one Purpose. The class Data represents any data item per se, whereas an additional class (CollectedData) represents those data collected by a policy for some purposes. Since our data–purpose centric model considers the purpose of the collected data important for data practices, we also define another class (DataPurpose-CollectionPractice, abbreviated DataPurpose-ColPractice in Fig. 3) to represent the purposes for which the data are collected, as shown in Fig. 2. The corresponding OWL definitions of this model are given by α1–α4 in Fig. 3. At the bottom of Fig. 3 are the role axioms required for reasoning. The role inclusion axioms ρ1–ρ6 and ρ7 specify, respectively, that hasPart is a superrole of every other role and that it is transitive. The hierarchical structures of data in P3P are organized using an aggregation role hasSubDataStructure, and every leaf data item relates to its corresponding data category via another role categorizedIn. This design enhances the modeling in (Damiani et al., 2004; Hogben, 2005) by adding the left-identity role inclusion axiom ρ8. In the presence of this axiom, any category of a sub-data structure is automatically propagated to its super-data structure.

    (α1) Policy ⊑ collects some CollectedData
    (α2) CollectedData ⊑ Data and (collectedForPurpose some DataPurpose-ColPractice)
                         and (optionality only DataCollectionOptionality)
    (α3) DataPurpose-ColPractice ⊑ Purpose and (hasRecipient only Recipient) and (hasRetention only Retention)
                         and (optionality only DataUsageOptionality)
    (α4) Recipient ⊑ (optionality only DataUsageOptionality)
    (β)  InvalidPolicy1 ≡ Policy and (hasPart some (DataPurpose-ColPractice and (hasRetention min 2)))
    (ρ1) collects ⊑ hasPart
    (ρ2) collectedForPurpose ⊑ hasPart
    (ρ3) hasRecipient ⊑ hasPart
    (ρ4) hasRetention ⊑ hasPart
    (ρ5) hasSubDataStructure ⊑ hasPart
    (ρ6) optionality ⊑ hasPart
    (ρ7) hasPart ◦ hasPart ⊑ hasPart
    (ρ8) hasSubDataStructure ◦ categorizedIn ⊑ categorizedIn

Figure 3: A core extract of the OWL ontology for validity checking of P3P policies.
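The effect of the role inclusion hasSubDataStructure ◦ categorizedIn ⊑ categorizedIn can be mimicked procedurally. The following Python sketch is not how an OWL reasoner works internally; it only computes the same consequence on a small, acyclic data hierarchy. The function name, the input dictionaries and the category assignment in the example are illustrative assumptions.

```python
def propagate_categories(sub_structures, leaf_categories):
    """sub_structures maps a data structure to its direct sub-data structures
    (hasSubDataStructure); leaf_categories maps a data item to its asserted
    categories (categorizedIn). Return the categories of every item after
    propagating them upwards. Assumes the hierarchy has no cycles."""
    inferred = {}

    def visit(item):
        if item not in inferred:
            categories = set(leaf_categories.get(item, ()))
            # x hasSubDataStructure y and y categorizedIn c
            # together entail x categorizedIn c.
            for child in sub_structures.get(item, ()):
                categories |= visit(child)
            inferred[item] = categories
        return inferred[item]

    for item in set(sub_structures) | set(leaf_categories):
        visit(item)
    return inferred


# Illustrative fragment of the data hierarchy; the category is an example value.
SUB = {
    "#user.bdate": ["#user.bdate.ymd"],
    "#user.bdate.ymd": ["#user.bdate.ymd.year"],
}
LEAF = {"#user.bdate.ymd.year": {"demographic"}}
```

With these inputs, the category asserted only for the leaf #user.bdate.ymd.year is also obtained for #user.bdate.ymd and #user.bdate.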

In general, the constraints shown in the previous section can be translated into logical expressions, which then form (part of) definitions in the ontology. However, checking constraint violations in a given P3P policy by this approach is insufficient to explain what is wrong in the policy. We thus propose to define classes (called InvalidPolicy) with specific definitions that represent these constraint violations, instead of specifying logical expressions directly in the ontology. This modeling decision enables us not only to detect the invalidity of a policy but also to know the underlying reasons. We define twelve InvalidPolicy classes but, due to space limitations, only one is depicted here, as β in Fig. 3. InvalidPolicy1 represents the class of invalid policies that have multiple retention values for the same data–purpose collection practice. Multiple retention values are captured with the help of at-least number restrictions. Since the data #user.login and #user.home-info of the policy in Fig. 1 have two retention values, when we run an OWL reasoner (HermiT 1.3.3 in Protégé), the policy is inferred to be a member of the class InvalidPolicy1.
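To make the definition of InvalidPolicy1 tangible, the Python sketch below evaluates it by hand on a set of role assertions. It is a simplification rather than a substitute for an OWL reasoner: hasPart is computed as the transitive closure of its subroles, and individuals with different names are treated as different, whereas OWL makes no unique name assumption and would need the retention values to be declared distinct. The individual names and the triple layout are hypothetical.

```python
# Roles declared as subroles of hasPart in Fig. 3.
HASPART_SUBROLES = frozenset({
    "collects", "collectedForPurpose", "hasRecipient",
    "hasRetention", "hasSubDataStructure", "optionality",
})


def parts_of(triples, start):
    """Individuals reachable from start via hasPart (transitive superrole)."""
    reachable, stack = set(), [start]
    while stack:
        node = stack.pop()
        for subject, role, obj in triples:
            if subject == node and role in HASPART_SUBROLES and obj not in reachable:
                reachable.add(obj)
                stack.append(obj)
    return reachable


def is_invalid_policy1(triples, types, policy):
    """True if policy is a Policy with some part that is a
    DataPurpose-ColPractice having at least two hasRetention fillers."""
    if "Policy" not in types.get(policy, ()):
        return False
    for node in parts_of(triples, policy):
        if "DataPurpose-ColPractice" in types.get(node, ()):
            retentions = {o for s, r, o in triples if s == node and r == "hasRetention"}
            if len(retentions) >= 2:
                return True
    return False


# Hypothetical individuals for #user.login collected for purpose current (Fig. 1).
TYPES = {
    "walmart_policy": {"Policy"},
    "login_for_current": {"DataPurpose-ColPractice"},
}
TRIPLES = [
    ("walmart_policy", "collects", "collected_login"),
    ("collected_login", "collectedForPurpose", "login_for_current"),
    ("login_for_current", "hasRetention", "indefinitely"),
    ("login_for_current", "hasRetention", "stated-purpose"),
]
```

Here is_invalid_policy1(TRIPLES, TYPES, "walmart_policy") holds because the collection practice for #user.login and current carries both retention values; dropping either hasRetention assertion makes the check fail.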

5 RELATED WORK

A formalization of P3P in an ontology (Hogben, 2004) was proposed as a W3C working group note. That work and ours share the ideas of modeling most P3P entities as concepts (classes of individuals), of flattening P3P statements, of modeling nested data structures by an aggregation role instead of the subclass relation, and of modeling data categories as superclasses. Its modeling choice differs from ours in that each policy statement is flattened into a few reified statement objects, each of which describes a collection practice of a data item. Another subtle difference lies in the choice between OWL quantifications. We reckon that a sensible policy should describe at least one collection practice of a data item, so some is chosen instead of only. In addition, we use the roles subDataStructureOf and hasSubDataStructure in place of may-include-members-of, which is rather confusing. The fact that a super-data structure may or may not include a sub-data structure is modeled in our ontology using a number restriction.

Damiani et al. (2004) and Hogben (2005) proposed a way to represent the P3P-based data schema in the Semantic Web, focusing on the data schema of P3P 1.0. In these works, data items are similarly modeled as classes, but they are interrelated via three roles, viz. is-a, part-of and member-of, which is unnecessarily complex and error-prone.

6 CONCLUSIONS

We proposed an ontology model for P3P based on a data–purpose centric view. Several constraints required to prevent certain semantic inconsistencies have been identified and formalized in an OWL ontology. Rather than being stated as plain logical constraints, the constraints are captured by dedicated OWL classes that detect their violation, so that the reasons for the invalidity of a P3P policy can be provided.

REFERENCES

Bechhofer, S., van Harmelen, F., Hendler, J., Horrocks, I., McGuinness, D. L., Patel-Schneider, P. F., and Stein, L. A. (2004). OWL Web Ontology Language reference. W3C Recommendation.

Cranor, L. (2003). P3P 1.1 user agent guidelines. P3P User Agent Task Force Report 23.

Cranor, L., Langheinrich, M., Marchiori, M., Presler-Marshall, M., and Reagle, J. (2002). The Platform for Privacy Preferences 1.0 (P3P1.0) Specification. W3C Recommendation.

Damiani, E., De Capitani di Vimercati, S., Fugazza, C., and Samarati, P. (2004). Semantics-aware privacy and access control: Motivation and preliminary results. In 1st Italian Semantic Web Workshop, Ancona, Italy.

Hogben, G. (2004). P3P using the semantic web (Web ontology, RDF policy and RDQL rules). W3C Working Group Note 3 September 2004.

Hogben, G. (2005). Describing the P3P base data schema using OWL. In WWW2005, Workshop on Policy Management for the Web.

Karjoth, G., Schunter, M., Herreweghen, E. V., and Waidner, M. (2003). Amending P3P for clearer privacy promises. In 14th International Workshop on Database and Expert Systems Applications. IEEE Computer Society.

Li, N., Yu, T., and Antón, A. (2003). A semantics-based approach to privacy languages. Technical Report TR2003-28, CERIAS.

Yu, T., Li, N., and Antón, A. (2004). A formal semantics for P3P. In ACM Workshop on Secure Web Services.