SIMPLIFIED QUERY CONSTRUCTION

(Queries Made as Easy as Possible)

Brad Arshinoff, Damon Ratcliffe, Martin Saetre and Reda Alhajj

∗

Department of Computer Science University of Calgary Calgary, Alberta, Canada

Tansel

Ozyer

TOBB Ekonomi ve Teknoloji

Universitesi, Ankara, Turkey

Keywords:

Visual Query Language, SQL, Data Extraction, Relational Query Language, Form Based Query Language,

HCI, Data extraction.

Abstract:

QMAEP- Queries Made as Easy as Possible, is intended to be a system that greatly simplify the process of

query construction for statisticians and researchers. This document is focused on the usability of the database

query language and deals with visual representations of the query process, in speciﬁc the select query. Methods

of integrating simple Graphical User Interfaces (GUIs) for building queries into pre-existing database forms

is explored to provide users an intuitive method for query construction. This paper explores data mining as it

pertains to clinical research with emphasis on simplifying the data extraction process from complex databases

so as to accommodate analysis using important statistical software such as SASS, QMath and MS Excel.

1 INTRODUCTION

Database systems are common and expandable tools

used to accommodate the storage and manipulation of

data. On the other hand, graphical Interfaces design

techniques and the ﬁeld of Human Computer Inter-

actions (HCI) have evolved so much over the last 20

years that it is no longer uncommon for users with

little general computing skills to be able to master

the art of database navigation (Preece et al., 2002).

Unfortunately, most GUIs in today’s databases are

static views of data that cannot accommodate all re-

search. To accommodate new research, new views

of data must be constructed, which bring data to-

gether in previously undeﬁned ways. Designing and

constructing new GUIs, however, is a time consum-

ing and expensive way to reorganize how data is dis-

played. Adding new Graphical view of data into a

database requires trained database developers, which

is not a feasible way to accommodate all researchers.

For this reason, the most commonly created views

of data are not graphical form based views, but tab-

ular table like views which are easier to construct.

Furthermore, most researchers do not want to merely

∗

Reda Alhajj is also afﬁliated with Department of Com-

puter Science, Global University, Beirut, Lebanon

reorganize their view of data from a database, but

want to perform statistical software analysis on it as

well (Ezekiel, 2004). The process of composing such

a query from a large and complex research database

structure is a tedious and often arduous task (Ezekiel,

2004). Users composing queries may be unaware of

SQL and database design speciﬁc conventions such

as relationships, ﬁeld naming, primary keys, con-

straints and types. Also, ﬁltering criteria entry is com-

mon (Celko, 2004) and causes errors (Show et al.,

1993). Most query utilities offer little support in

the way of solving logic errors in ﬁltering expres-

sions (Siau, 1998). The goal of QMAEP is to set

these researchers free from design and development

speciﬁcs and allow them to concentrate on their re-

search. This paper explores the development of the

QMAEP system and the algorithms and research that

have made it possible.

2 RELATED WORK

Little research has been put forward to user-database

interaction research (Owei et al., 2002). This is

mainly due to the low priority assigned to such re-

search as most research put towards the designing

125

Arshinoff B., Ratcliffe D., Saetre M., Alhajj R. and Özyer T. (2007).

SIMPLIFIED QUERY CONSTRUCTION - (Queries Made as Easy as Possible).

In Proceedings of the Ninth International Conference on Enterprise Information Systems - DISI, pages 125-130

DOI: 10.5220/0002398601250130

 SciTePress

of new database architectures. A paper by Keng

Siau (Siau, 1998) deals with Object Relationship

(OR) modeling to explore the relationships within

databaseVORQL (visual object relational query lan-

guage). It adopts the natural view that the real world

consists of objects and relationships. Shortcomings of

this paper include the problem with overﬂow of infor-

mation and screen clutter problems confusing users.

It also requires users to understand QBE (query by

example) table structure, which is no longer used

in practice; with the QMAEP system it is hoped a

new table structure which is simpler and easier to

use than QBE can be offered. Similarly it also re-

quires users to have at least some understanding of the

low level query’s possible. Again the intention of the

QMAEP system is to improve on this idea by creating

a GUI to deal with Queries at the most generic User

level- working through interfaces already present in

the database, rather than confusing spider web of ob-

jects already connected (as seen in Siau (Siau, 1998)

and apparent in the MS ACCESS visual database

model). The QMAEP Select Query Tool was devel-

oped in the GUI type relating to database forms. The

implementation of forms comes from the apparent ad-

vantages to using them as seen in Doan’s paper (Doan

et al., 1995). GUI or ’form’ based interaction offers

many advantages such as the ability to provide default

values easily for all elements and easy error check-

ing on the values entered textually. Also reduction

of cognitive load on users by seeing only applicable

values, which is beneﬁcial for consistency and appro-

priate data (Ezekiel, 2004).

3 OVERVIEW OF THE QMAEP

APPROACH

The QMAEP system has been successfully integrated

into an existing clinical research database for Stroke

studies belonging to researchers at a medical univer-

sity in Turkey. This database is based on collecting

strict, detailed information on volunteer stroke pa-

tients in an effort to enhance stroke research. In fact,

in an effort to avoid costly omissions in data collec-

tion that could result in an inability to corroborate

future research, an immensely large volume of data

is collected on every patient. Much of the data is

collected in a quasi-duplicate fashion, i.e., from the

perspective of more than one physician or at multi-

ple time points. Querying such data was previously

particularly troublesome. Because there are multi-

ple records relating to a single patient in such tables,

when queried the result is a cross-product of related

records meaning duplicated rows of the same patient

with varying values for some ﬁelds and repeated val-

ues for others. Such a format, beyond being difﬁcult

to read for human beings, is difﬁcult to analyze with

statistical software.

Though the potential research implications of

gathering a greater quantity of data then is imme-

diately required are constructive and substantial, the

immediate effect of gathering the excess informa-

tion has actually been, according to some involved

researchers, a hindrance on short term accomplish-

ment. Narrowing down the pertinent information for

a particular research topic has become something of a

chore and physicians and statisticians alike were be-

coming impatient with the clumsy MS Access query

editor they were using to construct queries to gather

their data.

Figure 1: QMAEP position in a database framework.

3.1 Qmaep’s Position in Database

Framework

QMAEP can be viewed as a new layer for processing

data retrieval which is referred to as “Formed Based

Query Interface”. It lies in a level just above the

query optimizer in the database framework depicted

in Figure 1. The QMAEP system itself is composed

of several components designed to allow translation

of a users actions on a form-based query composer

into syntactically and logically correct SQL. It allows

mapping from a displayed value in a form to a value

data ﬁeld in a database table, and ﬁnally to the cre-

ation of a new ﬁeld in a query result. This requires

a large degree of dynamic processing of a database

schema. Under the QMAEP system of data retrieval,

the users are completely abstracted from any low level

representation. In terms of data mining, this tool is a

rather simple method of retrieving the appropriate in-

formation.

ICEIS 2007 - International Conference on Enterprise Information Systems

126

3.2 Query Context Fields and Query

Context Tables

In QMAEP terminology, the group of ﬁelds to which

all other ﬁelds associate is called the query context-

ﬁeld/s. The source table from which the query-

context ﬁeld/s are taken is called the query context-

table. All ﬁelds in a query that are not context-ﬁelds

are known as attribute ﬁelds as they can be seen as in

someway describing the context ﬁelds. A table in the

database schema which contains only attribute ﬁelds

is known as a attribute-table.

For simplicity in query creation, all queries cre-

ated with QMAEP must contain a context ﬁeld. When

a new query is created, the ﬁrst ﬁeld added to the

query becomes the new context ﬁeld. Since by the

deﬁnition given, all other ﬁelds in the query are re-

lating to and in some manner describing the context-

ﬁeld, a workable means of joining tables in the

database is attained. All tables are joined to the con-

text table in some manner. Because all context ﬁeld

values should be displayed in the result independently

of whether there is a corresponding record in one or

more of the attribute tables, the join mechanism used

is INNER JOIN operation off the context table.

The following section will approach the task of

joining tables containing attribute ﬁelds in detail and

discuss the issues that must be overcome.

4 LINKING TABLES

An application database has a set of tables and a set of

relations connecting the tables in a manner deﬁned by

the database schema. Any two tables from the schema

that have a set of links that connect them, either di-

rectly or indirectly through other tables, are said to be

related. Tables that do not have any set of direct or in-

direct links connecting them are said to be unrelated.

Because a set of unrelated tables have no meaning-

ful correspondence to one another, this section will

concentrate on how to bring data ﬁelds from two re-

lated tables. This approach can then be generalized to

bringing together n tables where one of the tables is

deﬁned to be the query-context table.

(a) (b)

Figure 2: Table A and Z have a)Direct b)Indirect Relation-

ships.

Figure 3: Dynamically creating list boxes to represent non-

linking key ﬁelds.

Two tables, table A and table Z, have a direct re-

lationship between them if there is exists some set of

relations R in the database schema that directly con-

nects attributes from table A to attributes of table Z

without going through any intermediate table X (see

Figure 2).

Tables which are directly related can be joined by

reading the corresponding set of relations from the

database schema and adding an appropriate INNER

JOIN statement to the SQL query representation. For

tables that are not directly joined, a join path must

be constructed by recursively traversing all interme-

diate tables X in the database schema. Tables that are

connected indirectly can be joined by joining a single

attribute table to sub-queries containing the context

table and one or more other attribute tables. Thus, the

problem of linking indirectly joined tables retracts to

the problem of connecting those which are directly

joined. Finally, recursive maze tree traversal algo-

rithms are adopted for linking tables with respect to

relationship diagram of the database.

5 SQL CONSTRUCTION

The process of creating new SQL statements within

this system is limited to certain aspects of the SQL

tree structure. As QMAEP is a system designed

speciﬁcally for SELECT Queries for data retrieval,

the basic structure of the SELECT query will con-

tain only the query context ﬁeld from the context ta-

ble, and the remaining components will be ﬁlled with

attribute ﬁelds. Qualiﬁcations can also be obtained

from the users to be used for each ﬁeld in the query,

which would be entered in the Criteria box (orange)

at the bottom of the form shown in Figure 3.

As discussed in section 3.2, the query context ﬁeld

is the ﬁrst ﬁeld added to any new query. This ﬁeld or-

ganizes the display and retrieval of all information in

the query. As the user adds to a query it still only has

1 query context ﬁeld but the number of attribute ﬁelds

grows. To demonstrate the use of the QMAEP Query

System, an example of retrieving information from a

SIMPLIFIED QUERY CONSTRUCTION - Queries Made as Easy as Possible

127

hospital database regarding Patients will be detailed

here.

Figure 4: Partial view of an existing database form in a hos-

pital database.

Let us begin with the user selects the patient#

ﬁeld from the demographics form and then loads the

QMAEP editor. The form is linked in the background

to one or more database tables. The linking infor-

mation can be programmatically read off the form,

and thus the QMAEP system can dynamically inter-

act with any new or modiﬁed forms and/or a modiﬁed

database schema. In Figure 4 a new query deﬁnition

is about to be constructed and the user has chosen a

PatientID ﬁeld as the context ﬁeld. The demograph-

ics table, which is the table that contains the PatientID

ﬁeld in the patient database schema, will thus become

the context table.

Figure 5: MS Access Object Relational View of tables. The

table circled is the Query Context Table.

Because the system object model contains the in-

formation as to what table the chosen ﬁeld belongs to,

QMAEP now learns the Query Context Table and it

adds to the domain category from Figure 6. The user

having decided what will be the Query Context Table

can then press a hot key to move on to the next ﬁeld

and add it to the query. The hot key will bring up

the QMAEP query construction window. Inside the

query construction window are details such as what

the Query Context Table is and what the Domain val-

ues are for the chosen ﬁeld. There is also an area in

which users are able to deﬁne the Qualiﬁcations on

the data:

Figure 6: QMAEP Query GUI.

With the QMAEP query form ﬁlled out as in Fig-

ure 6, when the user clicks ’add ﬁeld’ button, QMAEP

now crates the following Simple SQL statement con-

taining only the context ﬁeld. It is obvious that

QMAEP is selecting the patient numbers because it

is the Query Context Field.

Figure 7: SQL statement created.

Figure 8: SQL statement created.

To add additional information to the resulting

query, the process is just as straightforward. User just

locates another ﬁeld in the database forms. Once the

ﬁeld is located, the hot key is toggled once again. If

one were to add the ’patients name’ ﬁeld to the ta-

ble, the resulting SQL statement would be as shown

in Figure 8. Note that any ﬁlter criteria would then

be entered into the WHERE clause of the SQL state-

ment. To do this again and add other ﬁelds such as

Patient FHH#, the resulting SQL statement shown in

Figure 9 is created.

Because QMAEP completely abstracts the user

from the physical aspects of the database schema,

adding attribute ﬁelds to a query from a table other

than the context-table is just as simple, as far as the

user is concerned.

There are two main cases in joining the select

statements from two different tables. Lets assume that

we wish to combine our Query Context Table with Ta-

ble Y from the database.

ICEIS 2007 - International Conference on Enterprise Information Systems

128

Figure 9: SQL statement created.

As we have seen, the Query Context Field is de-

cided before any attribute ﬁeld is added to the query

deﬁnition. The context ﬁeld acts as a pseudo-key to

the query deﬁnition (strictly speaking, a query deﬁni-

tion is not required to have a key). However, because

of the fact we are treating the context-ﬁeld as a key

to the query result there is no need to worry about

the cardinality on the Query Context Table Side of

any relation/s linking it to the new attribute table. In

terms of the actual process that is done by QMAEP in

adding attribute ﬁelds, we can generalize to the two

basic cases discussed next in Sections 5.1 and 5.2.

The cases are organized by the cardinality on the at-

tribute table side of the joining relationships.

Figure 10: Schema diagram of Case A type relationships.

Figure 11: Example of Case A type relationship using rela-

tional Algebra tree diagrams.

5.1 Case A: (1 to 1), or (N to 1)

Relationship

Case A situation (see Figure 10 and Figure 11) would

occur when it is required to join an attribute table with

a cardinality of 1 with respect to the context-table.

Note that if the relationship between the context and

attribute tables is an indirect relationship, a cardinal-

ity of 1 must be the maximum (hence only) cardinality

on any joining relation in the join path.

One or more JOIN SQL operations will be deﬁned

in the SQL query result once the path information

from one of the traversal algorithms is obtained. Al-

ternatively, a union of the tables is sometimes possible

for this type of relationship. Case A relationships are

the simplest type as only a single matching attribute

value from the attribute table side of the relation is

possible for each value in the context ﬁeld. As men-

tioned in Section 4, if this is the only type of relation-

ship we are interested in when constructed queries,

a directed links approach is all that is necessary for

joining the tables.

Figure 12: Schema Diagram of Case B.

5.2 Case B: (1 to M) or (N to M)

Relationship

When working with this type of relationship (see Fig-

ure 12), QMAEP dynamically creates a list box for

any other key ﬁelds (non-linking key ﬁelds) from the

attribute table, as described in section 4. When the

user choose a value from the list box, a subquery is

created to extract a portion of the table such that the

data extracted has a one-to-one correspondence with

the context table. The subquery is named appropri-

ately to represent both the table it represents as well

its key value it was extracted with. The naming of the

subquery will eventually be used to name the ﬁelds

added from it, so as the user can easily identify them

in the query result.

Figure 13: User chooses a ﬁeld from a table with plural

cardinality with respect to the context ﬁeld.

Continuing with the example from Figure 8, sup-

posed the user next chooses a ﬁeld from the NIH

Stroke scale (NIHSS), which has a plural cardinality

with respect to the demographics because it is com-

pleted a multiple time points for each patient. When

the QMAEP query window loads, it will show a list

box such as the one shown in Figure 13 to allow the

user to choose a period (which is the non-linking key

ﬁeld in tblNIH Stroke Scale) to query the ﬁeld at.

Figure 14: QMAEP’s dynamically created list box for the

non-linking key component of tblNIH Stroke Scale.

SIMPLIFIED QUERY CONSTRUCTION - Queries Made as Easy as Possible

129

When the user then clicks the ’Add ﬁeld’ button a

subquery deﬁnition for table [tblNIH Stroke Scale] is

created as demonstrated in Figure 14.

Figure 15: Subquery deﬁnition for [tblNIH Stroke Scale].

Figure 16: SQL query for combined Queries.

This subquery is saved as a new query deﬁnition

which is called [x tbINIH Stroke Scale Baseline].

Joining this subquery deﬁnition now reduces to the

problem stated in Case A (Section 5.1). The user’s

query deﬁnition now will be modiﬁed to link in the

subquery and pull appropriate ﬁelds (see Figure 15).

As discussed in Section 4, the join type for attribute

tables is always an INNER JOIN (or equivalently a

LEFT JOIN off the context table).

Figure 17: A possible qualiﬁcation for a NIHSS value.

Lets assume also that the user had entered in a

qualiﬁcation for the value of baseline NIHSS, such

that only patients with a NIHSS of less than 4 are to

appear in the query result (see Figure 15). The query

result from Figure 15 then would appear as the result

shown in Figure 17. The ﬁnal SQL query is shown in

Figure 18.

6 CONCLUSION AND FUTURE

WORK

This paper has examined a new user-friendly query

system intended to assist in the process of creating

queries and is intended to beneﬁt researchers and

other common database users. A case study into a

particular group of medical researchers revealed that

in many cases a simple to use system may be better

than one which is-fully functional in terms of all ex-

isting SQL operations.

Figure 18: SQL query for combined Queries.

The notions of a query context table and query

context ﬁeld were examined. A tree traversal algo-

rithm on database relations was viewed as a way to

dynamically generate necessary table linking infor-

mation by reading a database schema. A method of

dynamic list box creation was examined as one possi-

ble such solution.

We took a look at translation from user actions on

a database system into SQL construction to generate

executable queries. Relational algebra was used to

illustrate an example and a step by step study of the

SQL command construction was presented.

The QMAEP system is currently only integrated into

one commercial use database, The system should

therefore ﬁrst be tested on other commercial database

systems with automated integration. A new system

should be provided such that data which is not in-

tended for general queries can be treated differently,

or blocked from the user. Aggregate functions, cross

tab queries, update, append delete and possibly even

table deﬁnition queries should be explored.

REFERENCES

Celko, J. (2004). Joe Celko’s SQL for Smarties: Trees and

Hierarchies. Morgan Kaufmann Publishers Inc., San

Francisco, CA, USA.

Doan, D. K., Paton, N. W., Kilgour, A. C., and al Qaimari,

G. (1995). Multi-paradigm query interface to an

object-oriented database. Interacting with Computers,

7(1):25–47.

Ezekiel, J. (2004). Ethical and Regulatory Aspects of Clini-

cal Research: Readings and Commentary. Johns Hop-

kins University Press.

Owei, V., Navathe, S. B., and Rhee, H.-S. (2002). An ab-

breviated concept-based query language and its ex-

ploratory evaluation. J. Syst. Softw., 63(1):45–67.

Preece, J., Rogers, Y., and Sharp, H. (2002). Interaction

Design. Wiley.

Show, G. Y., Kong, H., and Lin, W. K. (1993). Intelligent

user interface to sql-based database system. In En-

gineering Applications of Artiﬁcial Intelligence, vol-

ume 6, pages 307–316.

Siau, K. (1998). A visual object-relationship query lan-

guage for user-database interaction. Telemat. Inf.,

15(1-2):103–119.

ICEIS 2007 - International Conference on Enterprise Information Systems

130