A General Approach to Securely Querying XML

Ernesto Damiani, Majirus Fansi, Alban Gabillon and Stefania Marrara

Università degli Studi di Milano
Dipartimento di Tecnologie dell’Informazione
via Bramante 65 26013 Crema (CR), Italy

Université de Pau et des Pays de l’Adour
IUT des Pays de l’Adour
40 000 Mont-de-Marsan, France

Abstract. Access control models for XML data can be classified into two major categories: node filtering and query rewriting systems. The first category includes approaches that use access policies to compute secure user views on XML data sets. User queries are then evaluated on those views. In the second category of approaches, authorization rules are used to transform user queries to be evaluated against the original XML dataset. The aim of this paper is to describe a general query rewriting technique for securely querying XML. The model specification is given using a finite state automaton, ensuring generality and ease of standardization w.r.t. specific implementation techniques.

1 Introduction and Related Work

In the last few years, the eXtensible Markup Language (XML) [2] has become the format of choice for data interchange. XML-based systems are now widely deployed in a number of application fields. This success has triggered a growing interest in XML security, and several schemes for XML access control have been proposed. They can be classified into two major categories: node filtering and query rewriting techniques. The first category includes a number of approaches (e.g., [6, 7, 14, 13, 17, 24, 5, 12]; for a complete survey, see [14]) that use access policies to compute secure views on XML data sets. User queries are then evaluated on those views. Although views can be prepared off-line, in general, view-based enforcement schemes suffer from high maintenance and storage costs, especially for a large XML repository.

XML access control via query rewriting ([21, 19, 20, 10, 18, 9, 3]) has been proposed as a way to remedy these shortcomings. According to this approach, access control rules are not directly applied to the XML dataset to be protected; rather, they are used to translate potentially unsafe user queries into safe ones, to be evaluated against the original XML dataset. Most current proposals translate the policy’s access control rules (ACRs) into nondeterministic finite automata (NFAs), which are then used to rewrite user queries. However, for policies that include many ACRs, NFA backtracking may cause unacceptable overhead. More importantly, NFA-based models are not entirely suitable for system specification and standardization. Another serious concern is that few of these models provide users with a safe schema representing the information that they are allowed to access. Disclosing the original schema may cause unwanted information leaks.

Damiani E., Fansi M., Gabillon A. and Marrara S. (2007).
A General Approach to Securely Querying XML.
In Proceedings of the 5th International Workshop on Security in Information Systems, pages 115-122
DOI: 10.5220/0002417301150122
SciTePress

In this paper, we describe our Deterministic Finite Automaton (DFA) based query rewriting approach (Section 2), which overcomes the drawbacks of NFA-based systems. The main contributions of this work include:

– A security model based on authorization attributes for XML (Section 2.1), in which the security designer inserts the attributes in the XML Schema of the document collection via a GUI. We then obtain a policy-dependent view of the schema (or annotated schema).
– A formalization based on deterministic automata with a high level of generality (an automaton can be implemented in different ways) and suitable for standardization of the enforcement technique. From this formalization we straightforwardly derive algorithms for computing the user view of the schema (Section 2.2) and the rewriting DFA (Section 2.3) from the annotated schema.
– A way to exploit the standard operators EXCEPT and UNION of XPath [4] to produce a sound and complete rewriting procedure (Section 2.4) of the user query.
– A complexity analysis (Section 2.5) showing that the entire procedure is efficient, as it is linear in the size (i.e., the number of element definitions) and the depth of the repository schema.

A formal proof that our approach is sound and complete is presented in [8]. Finally, Section 3 concludes this paper and discusses future work.

2 DFA-based Query Rewriting

In this section we present a novel approach for rewriting potentially unsafe user queries

into safe ones. Our technique is based on Deterministic Finite Automata (DFA). We

exploit the tree nature of the XML Schema to derive the DFA, which is the core of

the rewriting procedure. The correctness of our technique is established by a formal proof, presented in [8].

2.1 Writing the Security Policy

The security administrator (SA) uses a Graphical User Interface (GUI) to specify, for each user class (role), the information to which users are granted or denied access. To obtain a policy-dependent view of the schema, the SA annotates the schema using security attributes. This technique was first used in SMOQE [11]. We define the following security attributes: access, condition and dirty.

Attribute access specifies the rights of the user on the node. The value of this attribute is either allow or deny. Attribute condition contains a list of predicates that have to evaluate to true for access to be granted. Attribute dirty indicates that some descendants of the current node could be unauthorized. More precisely, a node has a dirty attribute if it has at least one descendant node with either access=deny or a non-empty condition attribute attached to it. Annotating the original schema means appending these attributes to element definitions in the schema. The annotated schema is no longer valid with respect to the W3C XML Schema recommendation. It is only an internal representation of the security policy and is never disclosed to the user.

(a)

<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="showroom">
    <element name="vehicles" maxOccurs="unbounded" minOccurs="1">
      <element name="available" maxOccurs="unbounded">
        <element name="model" type="string"/>
        <element name="color" type="string"/>
        <element name="price" type="string"/>
        <element name="accessory" maxOccurs="unbounded">
          <element name="description" type="string"/>
          <element name="price" type="string"/>
        </element>
      </element>
      <element name="sold" maxOccurs="unbounded">
        <element name="model" type="string"/>
        ...
      </element>
      <attribute name="city" type="string" use="required"/>
    </element>
  </element>
</schema>

(b)

<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="showroom" access="allow" dirty="true">
    <element name="vehicles" maxOccurs="unbounded" minOccurs="1"
             access="allow" dirty="true">
      <element name="available" maxOccurs="unbounded"
               access="allow" dirty="true" condition="C">
        <element name="model" type="string" access="allow"/>
        <element name="color" type="string" access="allow"/>
        <element name="price" type="string" access="allow"/>
        <element name="accessory" maxOccurs="unbounded"
                 access="allow" condition="C1">
          <element name="description" type="string" access="allow"/>
          <element name="price" type="string" access="allow"/>
        </element>
      </element>
      <element name="sold" maxOccurs="unbounded" access="deny">
        <element name="model" type="string"/>
        ...
      </element>
      <attribute name="city" type="string" use="required"/>
    </element>
  </element>
</schema>

Fig. 1. The Showroom Schema (a) and the corresponding annotated Schema (b).
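The definition of the dirty attribute can be illustrated with a small bottom-up traversal. The following is only a sketch: the in-memory class ElementDef and the function mark_dirty are hypothetical names introduced here, not part of the paper, and the tree is a hand-written rendering of the policy for Alice.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ElementDef:
    """One element definition of the annotated schema (hypothetical in-memory form)."""
    name: str
    access: str = "allow"             # "allow" or "deny"
    condition: Optional[str] = None   # predicate text such as "C", or None
    children: List["ElementDef"] = field(default_factory=list)
    dirty: bool = False               # filled in by mark_dirty


def mark_dirty(node: ElementDef) -> bool:
    """Set dirty bottom-up; return True if node or a descendant is restricted."""
    restricted_below = False
    for child in node.children:
        # Call first, then combine, so that every child is visited.
        child_restricted = mark_dirty(child)
        restricted_below = restricted_below or child_restricted
    # A node is dirty when some descendant is denied or conditional.
    node.dirty = restricted_below
    return restricted_below or node.access == "deny" or node.condition is not None


# The policy for Alice, written by hand for this sketch.
accessory = ElementDef("accessory", condition="C1",
                       children=[ElementDef("description"), ElementDef("price")])
available = ElementDef("available", condition="C",
                       children=[ElementDef("model"), ElementDef("color"),
                                 ElementDef("price"), accessory])
sold = ElementDef("sold", access="deny", children=[ElementDef("model")])
vehicles = ElementDef("vehicles", children=[available, sold])
showroom = ElementDef("showroom", children=[vehicles])

mark_dirty(showroom)
for node in (showroom, vehicles, available, accessory, sold):
    print(node.name, "dirty" if node.dirty else "clean")
```

With this input, showroom, vehicles and available come out dirty, while accessory and sold do not, which agrees with the dirty attributes of the annotated schema of the working example.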

Throughout the rest of this paper, we will consider a repository of XML documents

valid w.r.t. the schema depicted in Fig.1(a) as a working example. In this example, we

also consider user Alice and a policy that allows her access to element showroom,

conditionally grants her access to elements available and accessory and denies

access to sold. Alice is granted access to all other elements (except the descendants

of sold of course). The annotated schema is depicted in Fig.1 (b), where security

attributes are written in bold.

The remainder of the rewriting procedure, presented in the remaining subsections,

consists of three steps:

Step 1: The annotated XML schema is transformed according to the policy that applies to each role. According to her role, the user is provided with the view of the schema (in short, Sv) that she is entitled to see. She can then write her query using the information available in Sv. Henceforth, unless stated otherwise, the term view refers to the view of the schema and not to the view of a source document.

Step 2: The annotated schema is translated into an automaton which represents the structure of Sv. Each state of the automaton carries security attributes that are used later, when rewriting the user request.

Step 3: The user query is rewritten using the finite state automaton.

(a)

<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="showroom">
    <complexType>
      <sequence>
        <element name="vehicles" maxOccurs="unbounded" minOccurs="1">
          <complexType>
            <sequence>
              <element name="available" maxOccurs="unbounded" minOccurs="0">
                <complexType>
                  <sequence>
                    <element name="model" type="string"/>
                    <element name="color" type="string"/>
                    <element name="price" type="string"/>
                    <element name="accessory" maxOccurs="unbounded" minOccurs="0">
                      <complexType>
                        <sequence>
                          <element name="description" type="string" delete=" "/>
                          <element name="price" type="string"/>
                        </sequence>
                        ...
            </sequence>
            <attribute name="city" type="string" use="required"/>
          </complexType>
        </element>
      </sequence>
    </complexType>
  </element>
</schema>

(b)

[State diagram of the rewriting FSA. States and their attributes: showroom (access="allow", dirty), vehicles (access="allow", dirty), available (access="allow", dirty, condition="C"), sold (access denied), accessory (access="allow", condition="C1"), model, color, price.]

Fig. 2. The User Schema View (a) and the Rewriting FSA (b).

2.2 Deriving the User View of the Schema (Step 1)

Deriving the user view from the annotated schema is straightforward. We start at the root of the annotated schema tree, and at each element definition, we proceed as follows:

– If the attribute access is allow and no condition is attached, then we keep the node as is in the user view.
– If access is allow and the attribute condition is set, then we redefine the node as optional by adding the attribute minOccurs=0. In this way, if a query fails because the condition is not satisfied, the user cannot infer that data has been hidden.
– If access is deny, then we discard the sub-tree rooted at the current node from the user view.

The view for user Alice is depicted in Fig.2(a).
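The three rules above can be sketched as a single recursive traversal. This is a minimal illustration under stated assumptions: ElementDef and derive_view are hypothetical names, the schema is held as an in-memory tree rather than as an XML Schema document, and a default of 1 for min_occurs mirrors the XML Schema default for minOccurs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ElementDef:
    """Element definition; security attributes are only set on the annotated schema."""
    name: str
    access: str = "allow"
    condition: Optional[str] = None
    min_occurs: int = 1
    children: List["ElementDef"] = field(default_factory=list)


def derive_view(node: ElementDef) -> Optional[ElementDef]:
    """Return the user-view copy of node, or None if its sub-tree is discarded."""
    if node.access == "deny":
        return None                                   # rule 3: drop the sub-tree
    # Rule 2: a conditional node becomes optional; rule 1: otherwise keep as is.
    min_occurs = 0 if node.condition is not None else node.min_occurs
    kept = [view for view in map(derive_view, node.children) if view is not None]
    # Security attributes are not copied, so the view does not expose the policy.
    return ElementDef(node.name, min_occurs=min_occurs, children=kept)


# Annotated schema of the working example, written by hand for this sketch.
accessory = ElementDef("accessory", condition="C1",
                       children=[ElementDef("description"), ElementDef("price")])
available = ElementDef("available", condition="C",
                       children=[ElementDef("model"), ElementDef("color"),
                                 ElementDef("price"), accessory])
sold = ElementDef("sold", access="deny", children=[ElementDef("model")])
annotated = ElementDef("showroom",
                       children=[ElementDef("vehicles", children=[available, sold])])

view = derive_view(annotated)
view_vehicles = view.children[0]
view_available = view_vehicles.children[0]
print([c.name for c in view_vehicles.children], view_available.min_occurs)
```

Each element definition is visited once, which is in line with the linear cost stated for this step in the complexity analysis.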

2.3 Constructing the Automaton (Step 2)

Constructing the rewriting automaton from the annotated schema is also straightforward. The automaton M derived from the annotated schema consists of an alphabet $\Sigma$, a set of states $S$, a transition function $T : S \times \Sigma \to S$, a start state $s_0 \in S$, and a set of accepting states $A \subset S$. The automaton is constructed as follows:

The alphabet $\Sigma$ consists of the values of the attribute name of each element definition in the annotated schema.

Creating the states: We start at the root of the annotated schema. The state corresponding to the root (element schema) is $s_0$. We create one state for each element definition which has a dirty parent. Indeed, all other nodes (those not dirty) and their subtrees are kept unchanged in the secured view. Hence they do not need to be processed by the automaton. When we encounter a denied node, we create a state for that element and skip the entire sub-tree rooted at that node. Each state $s \in S$ ($s \neq s_0$) has attributes which represent the security attributes stated at the corresponding element definition. We give the state attributes the name and the value of their corresponding security attributes. Each state $s \in S$ ($s \neq s_0$) is a final state (i.e., $A = S \setminus \{s_0\}$).

Defining transitions: There exists a transition from a state $s_i$ to a state $s_j$ if the element definition corresponding to $s_i$ is the parent of the element definition corresponding to $s_j$. The transition is labeled by the attribute name of the element definition corresponding to $s_j$.

The automaton derived from the annotated schema of Fig.1(b) is represented in

Fig.2(b).
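A possible reading of this construction is sketched below. The names ElementDef, State and build_automaton are hypothetical, the dirty flags are set by hand, and two choices are assumptions of the sketch rather than statements of the paper: the start state is given placeholder attributes, and transitions are stored in a dictionary keyed by element name, which is deterministic as long as sibling definitions have distinct names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ElementDef:
    """Element definition of the annotated schema, with dirty already computed."""
    name: str
    access: str = "allow"
    condition: Optional[str] = None
    dirty: bool = False
    children: List["ElementDef"] = field(default_factory=list)


@dataclass
class State:
    """Automaton state carrying the security attributes of its element definition."""
    name: str
    access: str = "allow"
    condition: Optional[str] = None
    dirty: bool = False
    transitions: Dict[str, "State"] = field(default_factory=dict)


def build_automaton(root: ElementDef) -> State:
    """Return the start state s0, which stands for the schema root."""
    s0 = State("schema", dirty=True)   # placeholder attributes (assumption)

    def add(parent: State, element: ElementDef) -> None:
        state = State(element.name, element.access, element.condition, element.dirty)
        parent.transitions[element.name] = state   # transition labeled by the name
        # States are only created below dirty nodes; denied sub-trees are skipped.
        if element.access != "deny" and element.dirty:
            for child in element.children:
                add(state, child)

    add(s0, root)
    return s0


accessory = ElementDef("accessory", condition="C1",
                       children=[ElementDef("description"), ElementDef("price")])
available = ElementDef("available", condition="C", dirty=True,
                       children=[ElementDef("model"), ElementDef("color"),
                                 ElementDef("price"), accessory])
sold = ElementDef("sold", access="deny", children=[ElementDef("model")])
vehicles = ElementDef("vehicles", dirty=True, children=[available, sold])
showroom = ElementDef("showroom", dirty=True, children=[vehicles])

s0 = build_automaton(showroom)
s_vehicles = s0.transitions["showroom"].transitions["vehicles"]
print(sorted(s_vehicles.transitions), sorted(s_vehicles.transitions["available"].transitions))
```

In this sketch the states for accessory and sold have no outgoing transitions: the first because accessory is clean, the second because the sub-tree of a denied node is skipped.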

2.4 Rewriting the Request (Step 3)

We assume that the user writes her request using the subset XPath-- of XPath expressions, informally defined as follows:

XPath-- := $\varepsilon \mid l \mid * \mid p_1/p_2 \mid //p_1 \mid p_1[q]$

where $p_1$ and $p_2$ are XPath-- expressions; $\varepsilon$, $l$ and $*$ denote the empty path, a label and a wildcard, respectively; / and // stand for the child axis and the descendant-or-self axis; and, finally, q is called a qualifier. We rewrite the request in the subset $\zeta := \{\varepsilon \mid l \mid p_1/p_2 \mid p_1[q]\}$ of XPath-- using the functions union and except. $\zeta$ is XPath-- without the descendant-or-self axis (//) and wildcards ($*$). This reduces the overhead of the rewriting process, since there is no need to backtrack in the automaton. We therefore rewrite the query in two phases: first, we refine the submitted expression; second, we rewrite the refined expression through the automaton.

Phase 1: refining the expression. This phase consists in refining the request on the basis of the view the user is permitted to see. We first transform the user query (over the repository) into an equivalent one (over the view). Second, we execute the latter on the user view (Sv) and, from the target node, we go back up to the root node, adding the nodes encountered on the path to form the refined expression. The goal of this procedure is to eliminate every // and * within the expression. As an example, if Alice’s request is //vehicles/available, then the equivalent expression over the view is
//element[@name="vehicles"]/complexType/sequence/element[@name="available"]
and the refined expression is
/showroom/vehicles/available.
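The effect of the refinement phase can be sketched as follows. Note the substitution: instead of translating the query into an XPath expression over the schema document and walking back up from the target node, this sketch matches the steps directly against an in-memory tree of the user view and records the path on the way down. The names VIEW and refine are hypothetical, predicates are not handled, and the way several matching paths are combined is not spelled out in the text, so the sketch simply returns all of them.

```python
import re
from typing import Iterator, List, Tuple

Node = Tuple[str, list]   # (element name, list of child nodes)

# User view of the working example (sold is absent), written by hand.
VIEW: Node = ("showroom", [("vehicles", [("available", [
    ("model", []), ("color", []), ("price", []),
    ("accessory", [("description", []), ("price", [])]),
])])])


def descendants(path: tuple, node: Node) -> Iterator[Tuple[tuple, Node]]:
    """Yield (path, node) for every proper descendant of node, in document order."""
    for child in node[1]:
        child_path = path + (child[0],)
        yield child_path, child
        yield from descendants(child_path, child)


def refine(view_root: Node, query: str) -> List[str]:
    """Expand // and * against the view; return absolute child-axis paths."""
    steps = re.findall(r"(//|/)([^/]+)", query)   # [(axis, name test), ...]
    document = ("", [view_root])                  # virtual node above the root element
    contexts = [((), document)]
    for axis, test in steps:
        matched = {}                              # path -> node, keeps first-seen order
        for path, node in contexts:
            if axis == "/":
                candidates = [(path + (c[0],), c) for c in node[1]]
            else:                                 # "//": any depth below the context
                candidates = descendants(path, node)
            for cand_path, cand in candidates:
                if test == "*" or cand[0] == test:
                    matched[cand_path] = cand
        contexts = list(matched.items())
    return ["/" + "/".join(path) for path, _ in contexts]


print(refine(VIEW, "//vehicles/available"))
```

For Alice’s request //vehicles/available the sketch yields the single refined expression /showroom/vehicles/available, and a request for an element that is not in the view, such as //sold, yields no path at all.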

Note on XPath--: in [15], Gottlob, Koch and Pichler show that the loss of expressive power of a fragment like XPath-- w.r.t. XPath is minimal.

Phase 2: rewriting the request via the automaton. The automaton represents the view the user is permitted to see. Rewriting the user request consists of:

– processing the first token of the refined expression;
– moving to the next state of the automaton until either the last token is received, a clean state (i.e., a state that has no attribute dirty) is met, or a denied state is encountered.

When processing a token, we consider the two following cases:

– Queries without predicates. After reading the current token, the automaton uses the

attributes of the current state and behaves as follows:

– Access is deny. It rejects the request.

– Access is allow. There are two possibilities:

–– (1) If there is no attribute dirty then the user has the right to consult the entire

sub-tree rooted at that node. The token is kept as such, the value of the attribute

condition (if any) is attached to the token and the remainder of the source query

is appended to the rewritten query. Note that the attribute dirty is for optimizing

the rewriting procedure. Indeed, if the access is allow and if there is no attribute

dirty then we do not need to analyze the remaining tokens one by one. We can

directly append the remainder of the source query to the rewritten query.

–– (2) If there is the attribute dirty then the token is kept as such and if there

is an attribute condition, its content is attached to the token. Then, the analyzer

asks for the next token (if any).

If the last token has been fed into the automaton, then we use the operator except to eliminate each unauthorized node under the target nodes. If q denotes the rewritten expression after the last token has been fed into the automaton, then the final rewritten expression is $q' = q \ \mathrm{except}\ (e_1 \cup e_2 \cup \dots \cup e_n)$, where each $e_j$ with $1 \leq j \leq n$ is computed as follows:

The automaton consults one after another the states corresponding to the children of the node represented by the current state. At each state s corresponding to the token l, we have the following:

If the attribute access=deny, then l is appended to q. The result q/l becomes one of the $e_j$.

If the attribute access=allow and there is an attribute condition, then the negation of the content C of the attribute condition is appended to l. The result l[not(C)] is appended to q, and q/l[not(C)] becomes one of the $e_j$. If there is also an attribute dirty, then the procedure goes deeper into the automaton (i.e., examines the children of the current token l) and starts computing another $e_j$, with q now being equal to q/l[C].

– Queries with predicates. The idea here is to stop processing the automaton when a

token with predicate(s) is received. We save the current state and check whether the

user has the right to consult the nodes that occur within the predicate(s). If she has

the right to, we return to the saved state and continue with the next token. Otherwise

the request is rejected.
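The case of queries without predicates can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: State, AccessDenied, exclusions and rewrite are hypothetical names, the automaton is built by hand, and the operators are emitted as the keywords except and union. One case is not covered by the description above, namely a child that is allowed without condition but dirty; the sketch assumes that the procedure also goes deeper there, with q extended by the plain label.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class State:
    access: str = "allow"
    condition: Optional[str] = None
    dirty: bool = False
    transitions: Dict[str, "State"] = field(default_factory=dict)


class AccessDenied(Exception):
    """Raised when the request has to be rejected."""


def step(label: str, state: State, negate: bool = False) -> str:
    """Attach the condition of the state (or its negation) to the token."""
    if state.condition is None:
        return label
    cond = f"not({state.condition})" if negate else state.condition
    return f"{label}[{cond}]"


def exclusions(q: str, state: State) -> List[str]:
    """Compute the expressions e_j for the children of the current state."""
    result = []
    for label, child in state.transitions.items():
        if child.access == "deny":
            result.append(f"{q}/{label}")
            continue
        if child.condition is not None:
            result.append(f"{q}/{step(label, child, negate=True)}")
        if child.dirty:
            # Also done for unconditional dirty children (assumption of this sketch).
            result.extend(exclusions(f"{q}/{step(label, child)}", child))
    return result


def rewrite(s0: State, refined: str) -> str:
    """Rewrite a refined expression (child axis only, no predicates)."""
    tokens = refined.strip("/").split("/")
    state, q = s0, ""
    for i, token in enumerate(tokens):
        if token not in state.transitions:
            raise AccessDenied(refined)
        state = state.transitions[token]
        if state.access == "deny":
            raise AccessDenied(refined)
        q += "/" + step(token, state)
        if not state.dirty:
            # Clean state: the remainder of the source query is appended as is.
            return q + "".join("/" + t for t in tokens[i + 1:])
    excluded = exclusions(q, state)
    return f"{q} except ({' union '.join(excluded)})" if excluded else q


# Automaton of the working example, written by hand.
accessory = State(condition="C1")
available = State(condition="C", dirty=True, transitions={
    "model": State(), "color": State(), "price": State(), "accessory": accessory})
vehicles = State(dirty=True, transitions={"available": available,
                                          "sold": State(access="deny")})
s0 = State(dirty=True, transitions={
    "showroom": State(dirty=True, transitions={"vehicles": vehicles})})

print(rewrite(s0, "/showroom/vehicles/available"))
```

For the refined expression /showroom/vehicles/available, the sketch produces /showroom/vehicles/available[C] except (/showroom/vehicles/available[C]/accessory[not(C1)]), while a refined expression that reaches the denied state sold is rejected.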

Examples which illustrate the rewriting procedure are provided in [8]. Due to space limitations, we cannot include them in this paper.

Note on terminology: we call token a step in the path expression; for example, showroom is the first token in /showroom/vehicles/available, while vehicles is the second. / stands for a lookahead.

2.5 Complexity Analysis

The complexity of our approach is determined by that of Steps 1, 2 and 3 of the rewriting procedure. Let us assume that the repository schema contains n definitions of element nodes. Deriving the user view of the schema (Section 2.2) takes at most O(n) time. Constructing the automaton (Section 2.3) also requires at most O(n) time. If m is the depth of the schema, then refining the expression (Section 2.4) takes O(m) time. Since we rewrite the refined expression by simply traversing the deterministic automaton, this phase takes O(n) time. Hence, the overall time complexity of this proposal is O(n + m).
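The bound can be spelled out term by term. The sum below only restates the costs given above; the closing remark is our own observation and assumes that every nesting level of the schema tree contains at least one element definition.

```latex
\begin{align*}
T(n, m) &= \underbrace{O(n)}_{\text{Step 1: user view}}
         + \underbrace{O(n)}_{\text{Step 2: automaton}}
         + \underbrace{O(m)}_{\text{Step 3, phase 1}}
         + \underbrace{O(n)}_{\text{Step 3, phase 2}} \\
        &= O(n + m).
\end{align*}
```

Under that assumption the depth satisfies $m \leq n$, so the bound could equivalently be written as $O(n)$.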

3 Conclusion

In this paper, we described a Deterministic Finite Automaton (DFA) based approach to rewriting unsafe queries into safe ones, thus avoiding the backtracking inherent to NFAs. We highlighted how our approach improves on previous work in the area (see [8]). We also showed that our technique is linear in the size and the depth of the repository schema. Although our rewriting procedure is theoretically efficient and suggests good performance, an experimental evaluation remains to be done. Moreover, our proposal leaves room for further work. Other inspiring approaches [16], [1] enforce client-based access control to XML. In [16] and [1], the document is encrypted at the server side and decrypted at the client side. The input of their system is thus XML data and the output is also XML data, while in our approach both the input and the output are XML queries. We are investigating the possibility of reducing the workload at the server side by transferring the rewriting procedure to the client side. Finally, current standards for access control languages that can be used for protecting XML information ([23, 22, 25]) lack a standard technique for enforcing policies via secure query rewriting. We are investigating interfacing our technique with standard policy languages like XACML [23]. Our DFA-based approach is general enough to specify the enforcement of most XACML policies when applied to protect XML data. We plan to develop this topic in a future paper.

Acknowledgements

This work was supported in part by the Italian Basic Research Fund (FIRB) within the KIWI and MAPS projects, by the European Union within the PRIME Project in the FP6/IST Programme under contract IST-2002-507591 and by funding from the French ministry for research under "ACI Sécurité Informatique 2003 - 2006. projet CASC". Majirus Fansi holds a Ph.D. scholarship granted by the "Conseil Général des Landes". The authors wish to thank Sabrina De Capitani di Vimercati, Pierangela Samarati and Stefano Paraboschi for common work and valuable suggestions.

References

1. Bouganim L., Ngoc F. D., Pucheral P.: Client-Based Access Control Management for XML documents. In Proc. of the 30th VLDB Conference, 2004.
2. Bray T., Paoli J., Sperberg-McQueen C. M.: eXtensible Markup Language (XML) 1.0 (2nd Ed). W3C Recommendation, 2000.
3. Byun C. W., Park S.: An Efficient Yet Secure XML Access Control Enforcement by Safe and Correct Query Modification. In Proc. of the 17th International Conference on Database and Expert Systems Applications (DEXA), 2006.
4. Clark J., DeRose S.: XML Path Language (XPath). W3C Recommendation, 1999. http://www.w3.org/TR/xpath.
5. Cuppens F., Cuppens-Boulahia N., Sans T.: Protection of relationships in xml documents with the xml-bb model. In Proc. of ICISS2005.
6. Damiani E., De Capitani di Vimercati S., Paraboschi S., Samarati P.: Securing XML Documents. In Proc. of the 2000 International Conference on Extending Database Technology (EDBT2000).
7. Damiani E., De Capitani di Vimercati S., Paraboschi S., Samarati P.: A fine-grained access control system for XML documents. In ACM Trans. Inf. Syst. Secur., Vol. 5(2). ACM Press, New York (2002) 169–202.
8. Damiani E., Fansi M., Gabillon A., Marrara S.: A General Approach to Securely Querying XML. In Note del Polo - Ricerca - Università degli Studi di Milano, Dipartimento di Tecnologie dell’Informazione Polo Didattico e di Ricerca di Crema, No. 102, 2007.
9. De Capitani di Vimercati S., Marrara S., Samarati P.: An access control for querying xml data. In Proc. of SWS05 workshop.
10. Fan W., Chan C., Garofalakis M.: Secure XML Querying with security views. In Proc. of SIGMOD 2004 Conference.
11. Fan W., Geerts F., Jia X., Kementsietsidis A.: SMOQE: A System for Providing Secure Access to XML. In Proc. of the 32nd VLDB Conference, 2006.
12. Finance B., Medjdoub S., Pucheral P.: The Case for access control on xml relationships. In Proc. of CIKM 2005.
13. Gabillon A., Bruno E.: Regulating Access to XML documents. In Proc. of the 15th Annual IFIP WG 11.3 Working Conference on Database Security, 2001.
14. Gabillon A.: A formal access control model for XML databases. In Proc. of the 2005 VLDB Workshop on Secure Data Management (SDM).
15. Gottlob G., Koch C., Pichler R.: The Complexity of XPath Query Evaluation. In Proc. of the 22nd ACM SIGACT SIGMOD SIGART Symposium on Principles of Database Systems (PODS-02). ACM Press, San Diego (2003) 179–190.
16. Kodali N., Wijesekera D.: Regulating access to SMIL formatted pay-per-view movies. In Proc. of the 2002 ACM workshop on XML security.
17. Kudo M., Hada S.: XML document security based on provisional authorization. In Proc. of ACM CCS 2000.
18. Kuper G., Massacci F., Rassadko N.: Generalized xml security views. In Proc. of the 10th SACMAT, 2005.
19. Luo B., Lee D., Lee W., Liu P.: QFilter: Fine-Grained run-time XML Access Control via NFA-based Query Rewriting. In Proc. of CIKM 2004.
20. Mohan S., Sengupta A., Wu Y., Klinginsmith J.: Access Control for XML - a dynamic query rewriting approach. In Proc. of VLDB 2005 Conference.
21. Murata M., Tozawa A., Kudo M.: XML Access Control using Static Analysis. In Proc. of CCS 2003.
22. NIST, The Extensible Configuration Checklist Description Format (XCCDF), http://nvd.nist.gov/scap/xccdf/xccdf.cfm
23. OASIS, eXtensible Access Control Markup Language (XACML), http://www.oasis-open.org/committees/xacml/
24. Stoica A., Farkas C.: Secure XML Views. In Proc. of the 16th IFIP WG11.3 Working Conference on Database and Application Security, 2002.
25. W3C, Web Services Policy 1.2 - Framework (WS-Policy), http://www.w3.org/Submission/WS-Policy/