DG-Query: An XQuery-based Decision Guidance Query Language

Alexander Brodsky

, Shane G. Halder

and Juan Luo

Department of Computer Science, George Mason University, 4400 University Drive, Fairfax, VA 22030, U.S.A.

Information Technology Unit, George Mason University, 4400 University Drive, Fairfax, VA 22030, U.S.A.

Keywords: DG-Query, XQuery, Mathematical Programming.

Abstract: Decision optimization is broadly used for making business decisions such as those for finding the best

production planning in manufacturing. An optimization model may indicate the total cost of a certain supply

chain given the various sourcing and transportation options used; the corresponding optimization problem

can be to select among all possible sourcing and transportation options to minimize the total cost.

Optimization modelling requires considerable mathematical expertise and effort to generate effective

models. Additionally, the optimization process is heavily dependent on data. However, optimization

languages such as IBM’s ILOG CPLEX OPL and Bell Laboratories’ AMPL, do not provide native support

for manipulation of XML data. On the other hand, XQuery is a language for querying and manipulating

XML data, which has become a ubiquitous standard (W3C) for data exchange between organizations;

although, XQuery has no decision optimization functionality. To resolve this gap, this paper proposes DG-

Query, an XQuery-based Analytics Language that seamlessly merges the XML data transformation and

decision optimization capabilities. This is accomplished by first annotating existing XQuery expressions to

precisely express the optimization semantics, and second to translate the annotated queries into an

equivalent mathematical programming (MP) formulation that can be solved efficiently using existing

optimization solvers. This paper presents DG-Query with an example, provides its formal semantics, and

describes implementation through a reduction to MP formulation.

1 INTRODUCTION

In the age of data explosion, business intelligence

analytics tools have been developed to provide users

with the ability to gain insights about their business.

Data analytics tasks can be classified into three main

groups of (1) descriptive, (2) predictive and (3)

prescriptive analytics. Descriptive analytics involves

the manipulation and integration of large streams of

data, using tools similar to database query

languages. Predictive analytics acquires insights into

the data using techniques of data mining and

statistical learning. Prescriptive analytics deals with

prescribing users actionable recommendations on

how to move a (business) system towards an optimal

outcome. This task is typically implemented using

decision optimization tools.

Applications of decision optimization include

deciding on business transactions within supply

chains, stipulating environment policies which are

aimed at public welfare, and finding the best

response in an emergency. For example, given a

repository of manufacturing plants and raw material

suppliers, a business owner will need to decide on

production planning and sourcing, namely, which

products and in what quantities should be

manufactured and where raw materials should be

purchased from in order to maximize the total

profitability.

Implementing solutions to such problems

involves Mathematical Programming (MP) and/or

Constraint Programming (CP), and using software

packages called Optimization (MP or CP) Solvers.

Optimization problems for both MP and CP are

expressed by providing decision variables that range

over some domain (e.g. reals, integers, binary, finite

domain), constraints (e.g. arithmetic equations and

inequalities over the decision variables), and the

optimization objective (e.g. cost, revenue, profit,

time) that needs to be minimized or maximized.

Solving an optimization problem (either MP or CP)

involves finding a value for each decision variable

from its domain, in such a way that the optimization

objective is minimized or maximized as required,

while all constraints are satisfied (IBM, 2013).

152

Brodsky A., G. Halder S. and Luo J..

DG-Query: An XQuery-based Decision Guidance Query Language.

DOI: 10.5220/0004868201520163

In Proceedings of the 16th International Conference on Enterprise Information Systems (ICEIS-2014), pages 152-163

ISBN: 978-989-758-027-7

 2014 SCITEPRESS (Science and Technology Publications, Lda.)

While MP is geared more toward problems with

numeric variables (over reals and/or integers), CP is

mostly used for combinatorial optimization

problems (where some variables range over finite

domains), which are typical in applications for

planning and scheduling (IBM, 2013). For both MP

and CP, optimization modelling languages such as

OPL (IBM, 2011), AMPL (AMPL, 2013), and

GAMS (Boisvert, R.F., Howe, S.E., and Kahaner,

D.K. Kahaner, 1985), are used to formulate decision

variables, constraints and the objective function.

While MP and CP are most suitable technologies

for finding high-quality solutions to optimization

problems efficiently, the development of MP/CP

models requires significant effort and mathematical

expertise (in Operations Research) that most

database application developers and data analysts do

not have. Furthermore, the resulting models are

typically not modular, extensible or reusable.

On the other hand, to support descriptive (as

opposed to prescriptive) analytics tasks, database

query languages such as XQuery can be easily used,

which operates on XML (Extensible Markup

Language) data. XML, designed by the World Wide

Web Consortium (W3C), has become a standard for

data exchange between organizations (W3C, Jan.

2012). Not only is data easy to represent in XML,

XML is a self-describing language with the

flexibility to define complex data structures.

Furthermore, it has mechanisms to express

constraints on the structures and contents of XML

documents, such as XML schemas, DTDs and

Schematron, which allows for rule-based validation

regarding the detection of patterns in an XML

document (Rick Jelliffe and Academia Sinica

Computing Centre, 2002). The XML language is not

complicated to use as a result of its simple constructs

(W3C, Sep. 2006). Considering the widespread use

of XML for data storage and exchange, it is no

longer a nice feature to have but a necessity for any

tool that is data driven. Not only should optimization

modelling software support XML as a data source,

the software should also provide an easy mechanism

for querying XML data.

XQuery is an appropriate tool for the job.

XQuery is also designed by W3C, and its language

syntax for querying XML data is very similar to

SQL for querying a relational database, making it

easier to learn if the user already has knowledge of

SQL. While XQuery is a fully-featured language, the

FLWOR (For, Let, Where, Order by, and Return)

expression provides an elegant way to query and

manipulate XML data (W3C, Dec. 2010). Moreover,

XQuery and XML are languages that are easy to

learn and use by database application developers and

data analysts. However, XML/XQuery does not

support decision optimization.

Bridging the gap between the efficiency of

optimization algorithms based on MP/CP and the

ease of use by database application developers and

data analysts using XML/XQuery is exactly the

focus of this paper. We propose DG-Query, an

XQuery-based Decision Guidance Query Language,

which allows building optimization models by

writing or reusing existing XQuery code/programs

with minor annotations for optimization, thus

making the language easy to use by database

application developers or data analysts.

Seemless integration of the decision optimization

models with XQuery programs presents a unique

challenge. The reason for this is that optimization

models declaratively express decision variables,

constraints and the optimization objective, while

XQuery programs are written as forwardly executed

computation. We would like to avoid the direct

encoding of optimization models, e.g. in XML,

because this would create an impedance mismatch.

Instead, the idea of DG-Query is to annotate

XQuery programs with non-deterministic variables

to indicate that, intuitively, some values in the

computation are unknown, and should be determined

by the system, in such a way that a designated value

(computed by the XQuery) be optimized subject to

Boolean assertions (constraints), which are also

added as program annotations.

The technical problem we need to overcome is to

automatically translate DG-Query programs, with

their non-deterministic semantics, into formal

optimization models, expressed as MP or CP

problems. The MP/CP problems are then solved to

generate a solution to the optimization problem. The

optimization solution provides values for non-

deterministic variables, which makes the XQuery

computation deterministic and allows an answer to

be produced.

DG-Query is designed to extend the prevalent

XQuery language with minimal annotation. As a

result, we believe that DG-Query could be easily

adopted by database application developers and data

analysts especially if they are already familiar with

XQuery. In summary, the contributions of this paper

are:

 We introduce DG-Query, an XQuery-based

analytics language for decision optimization and

define its formal semantics

 We provide a reduction method to automatically

transform DG-Query programs into formal MP

DG-Query:AnXQuery-basedDecisionGuidanceQueryLanguage

153

models, which can be solved by ILOG CPLEX

solver

 We describe the implementation architecture of

DG-Query

The rest of this paper is organized into 7

sections. In Section 2, we give a motivating running

example. In Section 3, we briefly discuss the

challenges of decision optimization, and discuss a

few related works. Before we present the formal

syntax and semantics of DG-Query, in Section 4, we

use the running example to present DG-Query

informally in an XQuery like syntax. We then give

the formal syntax and semantics of DG-Query in

Section 5. We present our implementation prototype

and architecture in Section 6, discuss our experiment

design in Section 7 and conclude with Section 8.

2 A RUNNING EXAMPLE

To make our discussion concrete, consider a simple

example of decision guidance to support a small

manufacturing network. Company X owns multiple

manufacturing plants, building products to be sold.

The raw materials necessary to build a product are

provided by various suppliers. Each supplier has a

limited quantity for the various raw materials. The

company must stay below its overall raw material

purchasing budget while maximizing its profit. The

decisions that Company X needs to make are:

 The quantity of products each manufacturing

plant should build

 The quantity of raw materials to be purchased

from the various suppliers

Before explaining how DG-Query solves this

problem, we describe a pure XQuery based solution

in this section. The XML data representing the

manufacturers, suppliers, and product parts are given

below respectively. The XML schema definitions for

manufacturers, suppliers, product parts are given in

the Appendix.

Manufacturers XML Data:

<product mid='manuf1'

prodid='product1'

name='table'

ppu='100.00' />

<product mid='manuf2'

prodid='product1'

name='table'

ppu='120.00' />

...

/products>

</manufacturers>

The manufacturers XML data represents the

manufacturing network. Each product has an

associated manufacturer identifier ‘mid’, product

identifier ‘prodid’, product description ‘name’, and

the price to be sold per unit ‘ppu’.

Suppliers XML Data:

<parts>

<part sid='supp1'

partid='part1'

name='wooden leg'

cpu='2.50'

supply='10' />

<part sid='supp2'

partid='part1'

name='wooden leg'

cpu='3.00'

supply='5' />

...

</parts>

</suppliers>

The suppliers XML data expresses the supply of

raw materials available for purchase by

manufacturers. Each part has an associated supplier

identifier ‘sid’, part identifier ‘partid’, part

description ‘name, the cost per unit ‘cpu’, and the

available quantity ‘supply’.

Product Parts XML Data:

<productPart prodid='product1'

partid='part1'

qty='4' />

<productPart prodid='product2'

partid='part1'

qty='4' />

...

</productParts>

The product parts XML indicates the

composition of a product by specifying the quantity

of raw materials, i.e. parts needed to produce a piece

of each product. The attribute ‘prodid’ corresponds

to a manufacturer’s product, ‘partid’ corresponds to

a supplier’s part, and ‘qty’ indicates the number of

parts needed for the associated product.

With all the information given in the

manufacturers, suppliers, and product parts XML

data, we are trying to decide on the quantity of

products for each manufacturer to produce and from

which supplier to purchase the required parts, such

that the objective, the total profit of manufacturers,

will be maximized. The XQuery variable

$manufItems_index (see below) is defined in the

XQuery program to represent the quantity of each

product to be produced by the manufacturer. In the

variable $manufItems_index, the quantity is

represented by the attribute ‘qty’, and the ‘prodid’

and ‘mid’ associate the product back to the original

manufacturers XML. We have assigned a value to

the quantities for each product. However, we need

ICEIS2014-16thInternationalConferenceonEnterpriseInformationSystems

154

to keep in mind that the set of quantities required to

generate the maximum profit may not be known in

advance and may be too complex to compute

manually in real life.

(: Manufacturer Product Quantity :)

let $manufItems_index :=

<product prodid='product1'

mid='manuf1'

qty='0' />

...

<product prodid='product3'

mid='manuf1'

qty='8' />

...

</products>

Another XQuery variable $suppItems_index is

defined to represent the quantities of raw materials,

i.e. parts, purchased by manufacturers from each

supplier. The quantity is represented by the attribute

‘qty’, and ‘partid’ and ‘sid’ associate the part back to

the original suppliers XML. From the perspective of

maximizing the total profit, the quantity of each

purchased part from the suppliers is another decision

to be made by the manufacturers. We assign a value

for the quantities of parts but similarly, the set of

quantities may not be known in advance in real life.

(: Supplier Part Quantity :)

let $suppItems_index :=

<parts>

...

</parts>

To calculate the total profit for manufacturers,

the following XQuery variables are defined. The

variable $budget represents the total budget

available for purchasing parts from suppliers. The

variable $total_cost represents the total

manufacturing cost of all manufacturers. The

variable $total_revenue represents the total revenue

of manufacturers by selling the products. The

variable $total_profit represents the difference

between the $total_revenue and $total_cost. The

objective of this manufacturing model is to

maximize the total profit, i.e. $total_profit.

(: Total budget available :)

let $budget as xs:decimal := 1000.00

let $total_cost as xs:double :=

sum (

for $s in $allSuppliers//supplier,

$p in $allParts//part

return

$suppItems_index//part[@sid=$s/@sid

and @partid=$p/@partid]/@qty

* $suppliers//part[@sid=$s/@sid

and @partid=$p/@partid]/@cpu)

let $total_revenue as xs:double :=

sum (for $m in

$allManufacturers//manufacturer,

$p in $allProducts//product

return

$manufItems_index//product[@mid=$m/@mid

and @prodid=$p/@prodid]/@qty *

$manufacturers//product[@mid=$m/@mid

and @prodid=$p/@prodid]/@ppu )

let $total_profit as xs:double :=

$total_revenue - $total_cost

To calculate the $total_profit, some intermediate

variables are defined as follows for convenience.

(: Unique manufacturer ids :)

let $allManufacturers :=

{

for $id in distinct-

values($manufacturers//product/@mid)

return <manufacturer mid='{$id}' />

}

</manufacturers>

(: Unique supplier ids :)

let $allSuppliers :=

{

for $id in distinct-

values($suppliers//part/@sid)

return <supplier sid='{$id}' />

}

</suppliers>

(

: Unique product ids :)

let $allProducts :=

{

for $id in distinct-

values($manufacturers//product/@prodid)

return <product prodid='{$id}' />

}

</products>

(

: Unique part ids :)

let $allParts :=

<parts>

{ for $id in distinct-

values($suppliers//part/@partid)

return <part partid='{$id}' />

}

</parts>

(: Expression: Qty Per Product Produced :)

let $producedqty :=

{

for $p in $allProducts//product/@prodid

return

<product prodid='{$p}'

qty='{

sum($manufItems_index//product[@prodid=$p]/@

qty)

}'/>

}

</products>

(: Expression: Quantity Per Part Required :)

DG-Query:AnXQuery-basedDecisionGuidanceQueryLanguage

155

let $reqqty :=

<parts>

{

for $p in $allParts//part/@partid

return

<part partid='{$p}' qty='{

sum($suppItems_index//part[@partid=$p]/@qty)

}'/>

}

</parts>

The XQuery variables $allManufacturers,

$allSuppliers, $allProducts, and $allParts store a

sequence of elements containing attributes

corresponding to the unique identifiers of

manufacturers, suppliers, products, and parts

respectively. The variable $producedqty represents a

sequence of elements containing attributes

corresponding to a unique product and the total

quantity of that product produced over the entire

manufacturing network. And finally the variable

$reqqty stores a sequence of elements containing

attributes corresponding to a unique part and the

total quantity of that part purchased from all the

suppliers.

In this manufacturing network, we also want the

system to enforce some business rules for the

production process, e.g. the quantities of products

produced by manufacturers must be greater than or

equal to zero, and parts purchased by manufacturers

from suppliers must also be greater than or equal to

zero, and less than or equal to the quantity available

by the suppliers. Additionally, the total cost of

manufacturing must be less than or equal to the

specified budget. The result of the business rules is

represented by boolean variables with values of

either ‘True’ or ‘False’. The first two variables,

$producedqty and $reqqty, are intermediate ones for

business rule definitions. The translation of these

business rules is straightforward and will be

explained in Section 4.

(: Business Rule: Produced Quantity >= 0 :)

let $producedProductsGEZero as xs:boolean :=

every $p in $manufItems_index//product

satisfies xs:integer($p/@qty) ge 0

(: Business Rule: Purchased Parts >= 0 :)

let $purchasedPartsGEZero as xs:boolean :=

every $p in $suppItems_index//part

satisfies xs:integer($p/@qty) ge 0

(: Business Rule: Purchased Parts >=

Required Parts :)

let $purchasedPartsGERequiredParts as

xs:boolean :=

every $p in $reqqty//part satisfies sum(

$suppItems_index//part[@partid=$p/@partid]/@

qty ) ge xs:integer($p/@qty)

(: Business Rule: Part Supply Per

Supplier >= Purchased Parts :)

let $partSupplyGEPurchasedParts as

xs:boolean :=

every $s in $suppliers//part

satisfies $s/@supply ge

$suppItems_index//part[@partid=$s/@partid

and @sid=$s/@sid]/@qty



(: Business Rule: Total Cost <= Budget :)

let $totalCostLEBudget as xs:boolean :=

$total_cost le $budget

The result of the XQuery program is the total

cost, total revenue, and total profit generated based

on the products produced and parts purchased, and

the boolean values of the defined XQuery business

rules and variables indicating whether the rules are

satisfied. The result is returned in the format of a

regular XML document.

Result XML:

{

$producedProductsGEZero

}

</producedProductsGEZero>

{

$purchasedPartsGEZero

}

</purchasedPartsGEZero>

. . . . . .

{

$totalCostLEBudget

}

</totalCostLEBudget>

<total_cost>

{

$total_cost

}

</total_cost>

<total_revenue>

{

$total_revenue

}

</total_revenue>

<total_profit>

{

$total_profit

}

</total_profit>

</result>

3 RELATED WORK

DG-Query application development requires both

XML/XQuery and Operations Research (OR)

solutions. Difficulties arise when using OR tools

(MP or CP) alone. The designing of the OR models

requires full knowledge of the search space and the

objective for the task of efficient system operations.

The OR modelling abstraction (e.g., OPL (IBM,

ICEIS2014-16thInternationalConferenceonEnterpriseInformationSystems

156

2011) and GAMS (Boisvert, R.F., Howe, S.E., and

Kahaner, D.K. Kahaner, 1985)) typically requires

OR expertise, a skill that database/software

developers may not have. Alternatively, XML and

XQuery tools are more intuitive to IT professionals

and have been widely used as a standard information

exchange file format among heterogeneous systems,

under multiple platforms. However, XQuery

languages are not designed for decision optimization

as they cannot express decision optimization

problems, and further solve them. For continuous

decision variables, uncountable possibilities of

values to choose are obviously beyond the

expressiveness of XQuery languages. For discrete

cases, if the search space is large, it will be still

inefficient for XQuery to try every possible discrete

case. Its evaluation algorithms have not taken

advantage of MP and CP search strategies to achieve

potential efficiency and flexible optimization goals.

Optimization modelling software is heavily

dependent on data, and a lack of native support for

XML or XML processors requires that data stored or

transmitted as XML be converted to a supported

format. IBM’s ILOG CPLEX Optimization

Programming Language (OPL) software provides

interfaces to various data sources including database

support but not XML (IBM, 2011). Bell

Laboratories’ AMPL software falls into the same

category as IBM’s OPL, no XML support (AMPL,

2013). However, Maplesoft’s Maple software does

include an interface for XML. Maple incorporates an

Extensible Stylesheet Language Transformations

(XSLT) engine used to process XML data

(Maplesoft, 2013). Although XSLT was created by

W3C, its language constructs are verbose and

complex, which could pose a challenge for use by

OR analysts.

The language CoJava (Brodsky, A. and Nash H.,

2006) offers both simulation of process modelling in

Java, and the capabilities of true decision

optimization. The syntax of CoJava is identical to

Java but with a few special construct, i.e., definition

of decision variables of a numeric value, the

assertion of constraints and the designation of a

variable as the objective to be optimized. The

semantics of CoJava interprets the program as an

optimal nondeterministic execution path. The

constraints are part of the procedure. The procedure

of the CoJava language can represent numerous

modules of an entire model, or even larger, an entire

software system. Running a CoJava program

involves first finding an optimal execution path and

then procedurally executing it.

Reporting applications over databases are

intuitive and have long been established using the

mature database query technology. A Decision

Guidance Query Language framework (Brodsky A.,

Egge N. and Wang X.S., 2011)( Brodsky, A., Bhot,

M.M., Chandrashekar, M., Egge, N.E., and Wang,

X.S., 2009)( Brodsky, A., Egge, N.E., and Wang

X.S., 2012) proposed a query language which was

built on the Structured Query Language (SQL), and

also included optimization functionality in it. It

annotates existing SQL queries to precisely express

the optimization semantics. It then translates the

annotated SQL queries into equivalent MP, which

can be solved efficiently. However, DGQL only

supports the traditional relational database and does

not support XML, a data format that gives more

flexibility for data structures and is easier to be

deployed in support of multiple platforms.

4 DG-QUERY VIA EXAMPLE

In this section, we use the existing XQuery

statements in our manufacturing network example

shown in Section 2 to illustrate the DG-Query

language. We will discuss the DG-Query formal

syntax and semantics in the next section.

Taking a pure XQuery program, only a few

annotations are needed for software/database

developers to re-write the XQuery program as a DG-

Query program for decision optimization. In

summary the annotations are (i) index set

definitions; (ii) ‘insert’ statements with ‘?’; (iii)

objective (maximize / minimize); and (iv)

constraints.

Continuing with the example in Section 2, we

assume that the XQuery variables, supplier part

quantity $suppItems_index, and the manufacturer

product quantity $manufItems_index, are decision

variables to be determined, with the objective of

maximizing the total profit described by the variable

$total_profit. Instead of the given values for the

XML attribute ‘qty’ for both XQuery variables

$suppItems_index and $manufItems_index, we

replace them with the following DG-Query

statements.

(:Index set for all manufacturers’

products :)

let $manufItems_index as index(@mid,

@prodid) on $manufacturers :=

{ $manufacturers//product }

insert @qty as xs:integer on

$manufItems_index := ?



(: Index set for all suppliers’ parts :)

let $suppItems_index as index(@sid, @partid)

DG-Query:AnXQuery-basedDecisionGuidanceQueryLanguage

157

on $suppliers := { $suppliers//part }

insert @qty as xs:integer on

$suppItems_index := ?

The index(@attribute1, @attribute2, …,

@attributeN) type is a special construct of DG-

Query used to build an index set from a sequence of

elements, where each element represents an object

(i.e. a product) of the same type, as specified by an

XPath expression over an XML document. The keys

for the index set are indicated by the attributes

@attribute1, @attribute2. …, @attributeN provided.

The index set makes all of the attributes in the

sequence of elements accessible to the optimization

process.

The ‘insert @qty as …’ statement is another

special construct of DG-Query that performs two

operations: 1) For the optimization process, the

attribute specified is marked as a decision variable to

be solved; 2) After the optimization has completed,

the solution for each decision variable is inserted as

an attribute back into the source XML. Intuitively,

with any assignment of nonnegative values to the

attribute ‘qty’, there will be a total profit value given

by the variable $total_profit. The task now is to

automatically generate an assignment that gives the

maximum profit for our manufacturing network

example. We need to specify the objective explicitly

as required by the optimization solver. The variable

of $total_profit is calculated as the difference

between the $total_revenue and $total_cost. The

variable $total_revenue is calculated as the sum of

cost per unit times quantities over the item set of

$suppItems_index. The variable $total_cost is

calculated as the sum of price per unit times

quantities over the item set of $manufItems_index.

We also defined a constraint, which is used to

specify that $total_cost should be less than or equal

to the overall budget indicated by $budget. Each

constraint corresponds to a business rule specified in

the pure XQuery program. The constraints assert a

condition to be satisfied by the expressions.

constraint $producedProductsGEZero

constraint $purchasedPartsGEZero

constraint $purchasedPartsGERequiredParts

constraint $partSupplyGEPurchasedParts

maximize $total_profit

Given these annotations, DG-Query will parse

the XQuery program and generate the necessary

optimization model as well as convert the XML data

into the appropriate format expected by the solver.

After the solver has found a solution, as long as one

is feasible, the values are written back into the

source XML as shown below:

Updated Manufacturers XML data:

<product qty="0"

mid="manuf1"

prodid="product1"

name="table"

ppu="100.00"/>

...

<product qty="18"

mid="manuf1"

prodid="product3"

name="plate"

ppu="10.00"/>

...

</products>

</manufacturers>

Updated Suppliers XML data:

<parts>

<part qty="10"

sid="supp1"

partid="part1"

name="wooden leg"

cpu="2.50"

supply="10"/>

<part qty="2"

sid="supp1"

partid="part2"

name="wooden table top"

cpu="70.00"

supply="10"/>

...

</parts>

</suppliers>

Note that the attribute ‘qty’ was not originally

present in the source XML for either manufacturers

or suppliers, and was added as a result of the

optimization process.

5 FORMAL SYNTAX AND

SEMANTICS

The DG-Query language is designed to extend the

functionality of XQuery with annotations that can be

used by an optimizer to solve decision optimization

problems. The sections to follow describe the DG-

Query annotations in detail.

5.1 Annotation Syntax

5.1.1 Index Sets

let $variable_name as index(@a1, ..., @a2)

on $source_xml_variable_name := {

XPath_Expression

}

Index sets are used by the optimization process

ICEIS2014-16thInternationalConferenceonEnterpriseInformationSystems

158

to index integer or float arrays. They are also used to

access XML element attributes that are required in

the mathematical computation for the model. The

XQuery variable representing an index set is linked

to another XQuery variable containing the source

XML over which to build the index. Both the index

set variable and the source variable must be declared

in the outermost scope of the program. Additionally

the XML contained by the source variable must be

well-formed. When using the index type annotation

you must also specify the attributes that represent

the unique key for identifying a particular element in

the generated index. The XPath expression retrieves

the sequence of elements of the same type from the

source variable to be used for the index set.

5.1.2 Insert Expression

insert @attribute_name as [xs:integer |

xs:double]on $index_variable_name := ?

The insert expression is used by the optimization

process to identify the variable that needs to be

solved by the optimization solver. Note that the

attribute name should not already exist in the source

XML referred to by the index variable because it is

inserted into the source XML after a solution has

been found. The index set variable referenced must

be declared beforehand. The attributes to be

inserted/solved for can only be of integer or double

types. The question mark assignment identifies that

the value for the attribute is unknown and must be

solved for during the optimization process.

5.1.3 Objective

[minimize|maximize] $math_expr_variable_name

The objective is a mathematical representation of

the problem to be optimized. The keywords

‘maximize’ and ‘minimize’ are used to define the

objective for the program. The variable name

provided must contain a mathematical representation

of the problem to be solved. The mathematical

representation may be composed of previously

declared intermediate expressions.

5.1.4 Constraints

constraint $boolean_rule_variable_name

Constraints are boolean expressions that define

assertions about the data. The boolean expression

must evaluate to true in order for the model to be

solvable. The keyword ‘constraint’ is used to define

a constraint annotation. The boolean variable

referenced in the constraint must be previously

declared. The optimization solver uses constraints to

impose bounds on the solution.

5.2 Annotation Semantics

We now turn to define the semantics of DG-Query

statements. As we described in previous sections,

DG-Query is built upon XQuery, as it is a language

extension of XQuery. We assume that XQuery

semantics are kept and well understood. In addition

to the semantics of XQuery, we formally define the

aspect of DG-Query which is unique.

First we define the annotations of the DG-Query

program. It contains any number of either index sets,

‘insert’ expressions with ‘?’ assigned to it,

constraints, and objectives. The constraints in DG-

Query are equivalently a set of boolean variables

defined in an XQuery program. Another assumption

we claim is that the index set structure must be

defined on an XQuery variable which has been

defined on the outermost scope of XQuery

programs. In this case, the index set will be assigned

and evaluated correctly.

DG-annotation::= index set

| insert statement with ‘?’

| constraint

| objective (minimize or maximize)

;

The input for a DG-Query program is a set of

XML schemas S

, i = 1, 2, …, n, such as manuf.xsd,

supplier.xsd, and productparts.xsd. The instances of

XML schemas, d

, i = 1, 2,…,n, are regular XML

documents, with the only exception that some

values of attributes in XML documents are identified

as missing, i.e., “?”. The output of DG-Query

optimization is an instance set d

’

, i = 1, 2, …, n,

with all values populated for attributes.





≔ , ,    

≔



,



,…,





   ≔ 



,



,…,





A set of functions are defined as follows. The

function f

represents the objective function. This

function maps from the independent variables, i.e.,

the values of attributes sets, which are annotated

with missing values ‘?’ to a real value output

variable, i.e., the objective of DG-Query

optimization. The function set f

, j = 1, 2, …, k,

defines mapping relationships between input

variables, i.e., the values of attributes sets, which are

annotated with missing values ‘?’, to a Boolean

output variable. We assume there are k constraints

defined in the overall DG-Query program. Each

function f

defines the mapping for the corresponding

DG-Query:AnXQuery-basedDecisionGuidanceQueryLanguage

159

constraint j.











′

,



′

,…,



′













′

,



′

,…,



′





…











′

,



′

,…,



′



We use the set All_I to represent all possible

instantiations of S

, i = 1, 2, …, n. Another set

Feasible_I represents the possible instantiations of

, i = 1, 2, …, n, which belongs to All_I and at the

same time, satisfy all constraints j, j = 1, 2, …, k.

Finally, two sets Min_I or Max_I are defined to

represent the possible instantiations of S

, i = 1,

2, …, n, which belongs to Feasible_I and at the same

time, the value of objective function f

over these

specific instantiations is less than/greater than any

other possible instantiations of Feasible_I.

_



,1,2,…,



′

,

1,2,…,|



′







∧



′





,

1,2,…,

_



,1,2,…,



′

,

1,2, … ,  ∈ _|







′

, 1,2,…,∧









′

,1,2,…,∧…∧







′

, 

1,2, … ,   

_



,1,2,…,



′

,1,2,…,

∈_|∀



,1,2,…,

∈_







,

1,2,…,









′

,1,2,…,

_



,1,2,…,



′

,1,2,…,

∈_|∀



,1,2,…,

∈_







,

1,2,…,









′

,1,2,…,

Finally, we define the answer to the DG-Query

program is an instantiation

nforiIMaxorIMind

,...,2,1__



6 IMPLEMENTATION

PROTOTYPE AND

ARCHITECTURE

6.1 Implementation Prototype

A prototype DG-Query compiler has been

implemented in Java using the following tools:

 ANTLR (Another Tool for Language

Recognition)

 BaseX XQuery 3.0 Processor

 IBM CPLEX OPL (Optimization Programming

language) Java API

The ANTLR and BaseX libraries are written in

pure Java allowing for seamless platform portability;

however the IBM CPLEX OPL Java API requires

native libraries so the corresponding platform

specific software must be installed in order to run

the compiler. The input into the compiler is an

XQuery program annotated with DG-Query. The

DG-Query compiler only contains a subset of the

XQuery language and some syntax has been added

to allow for more concise mathematical

expressiveness.

6.2 Implementation Architecture

The key construction of DG-Query is an extension

of the widely used XQuery language in two

successive levels: Level 1 provides the ability to

introduce non-deterministic variables (and their

constraints) in an intuitive manner. The key

construct is the type of DG-Query statement

“insert @qty as xs:integer on

$manufItems_index := ?”, which essentially

defines a decision variable, which is to be

instantiated by solving a decision problem, and

specifies constraints to be satisfied. After the

definition, the user can refer to the variable in the

conventional XQuery manner as if it has already

been instantiated. Level 2 allows users to define an

objective function over the decision variables using

the construct “maximize/minimize

$objective”, with the objective value optimized.

Mathematical programming concepts and technique

are used behind the scene to instantiate the decision

variables.

The conceptual system architecture is shown in

the diagram above. In this demonstration, we use

standard XQuery 3.0 (W3C), and ILOG/CPLEX as

our mathematical programming (MP) solver. The

DG-Query evaluator (i) compiles a DG-Query

program, using data retrieved with XQuery from

XML

Document

Mathematical

Programming

Solvers

DG-

uer

Evaluato

ICEIS2014-16thInternationalConferenceonEnterpriseInformationSystems

160

XML documents, into an MP model expressed in

OPL or a direct solver structure, (ii) uses an external

MP solver to find a solution, and (iii) employs

XQuery statement structure to insert the MP solution

back into the source XML as a set of attributes.

7 EXPERIMENTS

We have taken the running example included in this

paper and adjusted the syntax slightly to conform to

the language grammar of the prototype DG-Query

compiler. The program and input XML data was

successfully translated into CPLEX OPL and solved

accordingly with the following generated OPL

Model:

tuple tuple_productParts_index {

key string prodid;

key string partid;

int partqty;

};

... ...

Ordered {tuple_productParts_index}

productParts_index = ...;

float budget = 204.00;

dvar int manufItems_index_qty[manufItems_index];

dvar int suppItems_index_qty[suppItems_index];

dexpr int product_index_producedqty[p in

product_index] = sum(m in manuf_index)

manufItems_index_qty[item(manufItems_index,

<m.mid,p.prodid>)];

... ...

dexpr float total_profit = total_revenue -

total_cost;

maximize total_profit;

constraints {

forall(mp in manufItems_index)

manufItems_index_qty[mp] >= 0;

forall(sp in suppItems_index)

suppItems_index_qty[sp] >= 0;

forall(part in part_index)

part_index_reqqty[part] <= sum(s in

supp_index)

suppItems_index_qty[item(suppItems_index,

<s.sid,part.partid>)];

forall(s in suppItems_index)

suppItems_index_qty[s] <= s.supply;

total_cost <= budget;

}

The running time of mathematical programs is

often sensitive to the way the problem is

represented. For many classes of problems, a poor

choice of mathematical representation will cause the

solver to take significantly longer to find an optimal

solution. One concern with DG-Query is that

reduction process in modelling may add substantial

overhead by introducing redundant or unnecessary

variables and constraints. It is possible that a

simpler or more concise modelling may lead to a

significantly faster solution.

Our hypothesis is that problems described using

DG-Query are solved as efficiently as when

described using any other modelling tool. To test

this, we took the manufacturing network problem

from Section 2 and manually created a concise

formulation of the OPL model. We then generated

several instances of varying problem sizes and

compared the execution time when solved using this

OPL model and through DG-Query reduction.

Although we did some preliminary experiments, the

complexity of models can scale up. More advanced

experiment designs will be conducted as part of

future work.

8 CONCLUSION AND FUTURE

WORK

This paper introduced the DG-Query language for

decision optimization. We have defined its formal

syntax and semantics. We also used an example to

show that the overhead of using a high-level

language is small when compared with manually

crafted mathematical programming formulation.

With the added benefit of having similar syntax to

XQuery, we believe that DG-Query provides a

practical solution for decision optimization. DG-

Query can be used in many real industry

applications such as finding the best course of action

in an emergency, making a patient treatment

decision for the best prognosis, deciding on public

policies guided by the most positive outcomes, and

deciding on business transactions within a supply

chain, in the enterprise world.

Apart from providing native support for XML

and XQuery, the DG-Query language should also

promote modularity. The Process and Analytics

Language (PAL) provides a great framework for

building composite models from either atomic or

other composite models (Shao G., Kibira D.,

Brodsky A., and Egge, N.E., 2011). The DG-Query

language will use the PAL formalism for providing

the necessary abstractions to support modularity.

ACKNOWLEDGEMENTS

This work is partially support by NIST funding.

REFERENCES

IBM, 2013. Mathematical Programming vs. Constraint

Programming. Data Sources Retrieved from

http://www-01.ibm.com/software/integration/

optimization/cplex-cp-optimizer/mp-cp/

DG-Query:AnXQuery-basedDecisionGuidanceQueryLanguage

161

IBM, 2011. OPL, the modeling language: Data Sources

Retrieved from

http://pic.dhe.ibm.com/infocenter/cosinfoc/v12r4/inde

x.jsp?topic=%2Filog.odms.ide.help%2FOPL_Studio%

2Fopllangref%2Ftopics%2Fuss_langref_datasources_

1.html.

AMPL, 2013. http://www.ampl.com/

Boisvert, R. F., Howe, S. E., and Kahaner, D. K. Kahaner,

1985. Gams: a framework for the management of

scientific software. ACM Trans. Math. Softw.,

11(4):313–355.

W3C, Jan. 2012, Extensible Markup Language (XML).

Retrieved from http://www.w3.org/XML/

Rick Jelliffe and Academia Sinica Computing Centre,

2002, The Schematron Assertion Language 1.6,

Retrieved from http://xml.ascc.net/resource/

schematron/Schematron2000.html.

W3C, Sep. 2006. Extensible Markup Language (XML) 1.1

(Second Edition). Retrieved from http://

www.w3.org/TR/2006/REC-xml11-20060816/#dt-

wellformed.

W3C, Dec. 2010. XQuery 1.0: An XML Query Language

(Second Edition). Retrieved from http://

www.w3.org/TR/xquery/

Maplesoft, 2013. Overview of the XMLTools Package.

Retrieved from http://www.maplesoft.com/support/

help/Maple/view.aspx?path=XMLTools.

Brodsky, A. and Nash H., 2006. CoJava: Optimization

modeling by nondeterministic simulation. In F.

Benhamou, editor, Proceedings of Principles and

Practice of Constraint Programming - CP, volume

4204 of Lecture Notes in Computer Science, pages

91–106. Springer.

Brodsky A., Egge N. and Wang X.S., 2011. Reusing

Relational Queries for Intuitive Decision

Optimization, System Sciences (HICSS), 44th Hawaii

International Conference on, pages 1-9.

Brodsky, A., Bhot, M. M., Chandrashekar, M., Egge, N.

E., and Wang, X. S., 2009. A Decisions Query

Language (DQL): High-Level Abstraction for

Mathematical Programming over Databases.

Proceedings of the 35th SIGMOD International

Conference on Management of Data.

Brodsky, A., Egge, N.E., and Wang X.S., 2012.

Supporting Agile Organizations with a Decision

Guidance Query Language, Journal of Management

Information Systems 28 (4), 39-68.

Shao G., Kibira D., Brodsky A., and Egge, N. E., 2011.

Decision support for sustainable manufacturing using

decision guidance query language, International

Journal of Sustainable Engineering 4 (3), 251-265.

APPENDIX

Manufacturers XML Schema Definition:

<xs:schema

xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="manufacturers">

<xs:complexType>

<xs:sequence>

<xs:element ref="products"

minOccurs="1"

maxOccurs="1" />

</xs:sequence>

</xs:complexType>

</xs:element>

<xs:element name="products">

<xs:complexType>

<xs:sequence>

<xs:element ref="product"

minOccurs="1"

maxOccurs="unbounded" />

</xs:sequence>

</xs:complexType>

</xs:element>

<xs:element name="product">

<xs:complexType>

<xs:attribute name="mid"

type="xs:string”

use="required"/>

<xs:attribute name="prodid"

type="xs:string"

use="required"/>

<xs:attribute name="name"

type="xs:string"

use="optional"/>

<xs:attribute name="ppu"

type="xs:decimal"

use="required"/>

<xs:attribute name="qty"

type="xs:integer"

use="optional"/>

</xs:complexType>

</xs:element>

</xs:schema>

Suppliers XML Schema Definition:

<xs:schema

xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="suppliers">

<xs:complexType>

<xs:sequence>

<xs:element ref="parts"

minOccurs="1"

maxOccurs="1" />

</xs:sequence>

</xs:complexType>

</xs:element>

<xs:element name="parts">

<xs:complexType>

<xs:sequence>

<xs:element ref="part"

minOccurs="1"

maxOccurs="unbounded" />

</xs:sequence>

</xs:complexType>

</xs:element>

<xs:element name="part">

<xs:complexType>

<xs:attribute name="sid"

type="xs:string"

use="required"/>

<xs:attribute name="partid"

type="xs:string"

use="required"/>

<xs:attribute name="name"

ICEIS2014-16thInternationalConferenceonEnterpriseInformationSystems

162

type="xs:string"

use="optional"/>

<xs:attribute name="cpu"

type="xs:decimal"

use="required"/>

<xs:attribute name="supply"

type="xs:integer"

use="optional"/>

<xs:attribute name="qty"

type="xs:integer"

use="optional"/>

</xs:complexType>

</xs:element>

</xs:schema>

Product Parts XML Schema Definition:

<xs:schema

xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="productParts">

<xs:complexType>

<xs:sequence>

<xs:element ref="productPart"

minOccurs="1"

maxOccurs="unbounded" />

</xs:sequence>

</xs:complexType>

</xs:element>

<xs:element name="productPart">

<xs:complexType>

<xs:attribute name="prodid"

type="xs:string"

use="required"/>

<xs:attribute name="partid"

type="xs:string"

use="required"/>

<xs:attribute name="qty"

type="xs:integer"

use=" required"/>

</xs:complexType>

</xs:element>

</xs:schema>

DG-Query:AnXQuery-basedDecisionGuidanceQueryLanguage

163