While MP is geared more toward problems with
numeric variables (over reals and/or integers), CP is
mostly used for combinatorial optimization
problems (where some variables range over finite
domains), which are typical in applications for
planning and scheduling (IBM, 2013). For both MP
and CP, optimization modelling languages such as
OPL (IBM, 2011), AMPL (AMPL, 2013), and
GAMS (Boisvert, Howe, and Kahaner, 1985) are
used to formulate decision
variables, constraints and the objective function.
While MP and CP are the most suitable technologies
for finding high-quality solutions to optimization
problems efficiently, the development of MP/CP
models requires significant effort and mathematical
expertise (in Operations Research) that most
database application developers and data analysts do
not have. Furthermore, the resulting models are
typically not modular, extensible or reusable.
On the other hand, descriptive (as opposed to
prescriptive) analytics tasks are well supported by
database query languages such as XQuery, which
operates on XML (Extensible Markup Language)
data. XML, designed by the World Wide
Web Consortium (W3C), has become a standard for
data exchange between organizations (W3C, Jan.
2012). Not only is data easy to represent in XML,
but XML is also a self-describing language with the
flexibility to define complex data structures.
Furthermore, it has mechanisms to express
constraints on the structures and contents of XML
documents, such as XML schemas, DTDs and
Schematron, the last of which supports rule-based
validation by detecting patterns in an XML
document (Rick Jelliffe and Academia Sinica
Computing Centre, 2002). XML is also
straightforward to use because of its simple constructs
(W3C, Sep. 2006). Considering the widespread use
of XML for data storage and exchange, XML support
is no longer a nice feature to have but a necessity for
any data-driven tool. Not only should optimization
modelling software support XML as a data source,
but it should also provide an easy mechanism
for querying XML data.
XQuery is an appropriate tool for the job.
XQuery was also designed by the W3C, and its
syntax for querying XML data is very similar to that
of SQL for querying a relational database, making it
easier to learn for users who already know SQL.
While XQuery is a fully-featured language, the
FLWOR (For, Let, Where, Order by, and Return)
expression provides an elegant way to query and
manipulate XML data (W3C, Dec. 2010). Moreover,
XQuery and XML are easy for database application
developers and data analysts to learn and use.
However, XML/XQuery does not
support decision optimization.
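
To make the role of the FLWOR clauses concrete, the following is a
rough Python analogue of a simple FLWOR query, using only the standard
library. The sample document, the element names (item, name, price,
qty) and the function name are hypothetical and do not come from this
paper; the comments show the corresponding XQuery clause for each step.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (illustrative only, not from the paper).
XML = """
<items>
  <item><name>bolt</name><price>0.50</price><qty>100</qty></item>
  <item><name>gear</name><price>12.00</price><qty>3</qty></item>
  <item><name>nut</name><price>0.10</price><qty>20</qty></item>
</items>
"""


def expensive_items(xml_text, threshold):
    """Return names of items whose total cost exceeds threshold, costliest first."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter("item"):                  # for $i in //item
        cost = (float(item.findtext("price"))       # let $cost := $i/price * $i/qty
                * int(item.findtext("qty")))
        if cost > threshold:                        # where $cost > threshold
            rows.append((cost, item.findtext("name")))
    rows.sort(key=lambda r: r[0], reverse=True)     # order by $cost descending
    return [name for _, name in rows]               # return $i/name/text()


result = expensive_items(XML, 10)
```

Such a query is purely descriptive: every value it uses is already
present in the data, and nothing is chosen by the system.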
Bridging the gap between the efficiency of
optimization algorithms based on MP/CP and the
ease of use by database application developers and
data analysts using XML/XQuery is exactly the
focus of this paper. We propose DG-Query, an
XQuery-based Decision Guidance Query Language,
which allows building optimization models by
writing or reusing existing XQuery code/programs
with minor annotations for optimization, thus
making the language easy to use by database
application developers or data analysts.
Seamless integration of the decision optimization
models with XQuery programs presents a unique
challenge. The reason is that optimization
models declaratively express decision variables,
constraints and the optimization objective, while
XQuery programs are written as forward-executed
computations. We would like to avoid the direct
encoding of optimization models, e.g. in XML,
because this would create an impedance mismatch.
Instead, the idea of DG-Query is to annotate
XQuery programs with non-deterministic variables
to indicate that, intuitively, some values in the
computation are unknown, and should be determined
by the system, in such a way that a designated value
(computed by the XQuery program) is optimized subject to
Boolean assertions (constraints), which are also
added as program annotations.
The technical problem we need to overcome is to
automatically translate DG-Query programs, with
their non-deterministic semantics, into formal
optimization models, expressed as MP or CP
problems. The MP/CP problems are then solved to
generate a solution to the optimization problem. The
optimization solution provides values for non-
deterministic variables, which makes the XQuery
computation deterministic and allows an answer to
be produced.
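
The intuition behind non-deterministic variables can be sketched in a
few lines of Python. This is a minimal illustration under stated
assumptions, not the reduction method of this paper: an exhaustive
search over small finite domains stands in for the MP/CP solver, and
the procurement data, the function names (solve, procurement) and the
domains are all hypothetical.

```python
from itertools import product


def solve(program, domains, minimize=True):
    """Try every assignment to the unknowns and keep the best feasible one.

    `program` maps a dict of chosen values to (objective, constraints),
    where constraints is a list of booleans. Exhaustive enumeration is
    only a stand-in for an MP/CP solver. Returns None if infeasible.
    """
    names = list(domains)
    best = None
    for values in product(*(domains[n] for n in names)):
        choice = dict(zip(names, values))
        objective, constraints = program(choice)
        if not all(constraints):
            continue  # an assertion failed: this assignment is infeasible
        if best is None or (objective < best[0] if minimize
                            else objective > best[0]):
            best = (objective, choice)
    return best


# Hypothetical data: unit prices, supplier capacities and total demand.
PRICES = {"supplierA": 4, "supplierB": 5}
CAPACITY = {"supplierA": 6, "supplierB": 10}
DEMAND = 10


def procurement(qty):
    """An ordinary forward computation; only the quantities are unknown."""
    total_cost = sum(PRICES[s] * qty[s] for s in PRICES)
    constraints = [
        sum(qty.values()) >= DEMAND,               # demand must be met
        all(qty[s] <= CAPACITY[s] for s in qty),   # capacity per supplier
    ]
    return total_cost, constraints


# Each unknown quantity ranges over the integers 0..10.
domains = {s: range(0, 11) for s in PRICES}
best_cost, best_qty = solve(procurement, domains)
```

Once the search fixes the quantities, procurement is an ordinary
deterministic computation again, which mirrors how the optimization
solution makes the XQuery computation deterministic.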
DG-Query is designed to extend the prevalent
XQuery language with minimal annotation. As a
result, we believe that DG-Query could be easily
adopted by database application developers and data
analysts, especially if they are already familiar with
XQuery. In summary, the contributions of this paper
are:
• We introduce DG-Query, an XQuery-based
  analytics language for decision optimization, and
  define its formal semantics.
• We provide a reduction method to automatically
  transform DG-Query programs into formal MP