[Figure 4, plot area: quality score Q over time t; horizontal axis: query time, positions 1 to 8; arrow: direction of the sweep line in the Merge operator.]
ToDo: statistics...distributive and algebraic. Also Q-trails are at the tuple-level, but they include cell-level
annotations...
2.1 Research Task I: Quality Propagation and Assessment of Query Results
In this research task, we address the challenge of assessing the quality of query results under complex
processing and transformations. For example, referring to Figure ??, assume that relations R, S, and T are
stored in the Quality-Annotated Data repository, which means that each tuple in these relations already has
its quality trail attached to it. A single query or workflow on these relations commonly involves several of
the standard query operators, e.g., selection, projection, join, grouping and aggregation, and duplicate
elimination, to produce the desired output relation O. The key question that we address in this task is:
What is the quality of each output tuple in O? Our objective is not merely to infer the qualities at the last
stage of processing, but to derive them incrementally after each transformation. Otherwise, other
quality-based processing would not be feasible, e.g., applying predicates and functions on the qualities at
any processing stage (Research Task II), and enabling constraint-based processing for quality maximization
(Research Task III). We propose extending the semantics and algebra of the query operators to manipulate the
quality trails seamlessly. All operators, as well as the manipulation functions over quality trails
introduced in Section ??, will consume and produce quality trails conforming to the data model presented in
Section ??. Thus, they can be pipelined during processing. In the following, we highlight the proposed
extensions.
• Selection Operator ($\sigma_p(R)$): The extension to the selection operator is straightforward since this operator
does not modify the quality trails of its input tuples. Therefore, if tuple $r = \langle a_1, a_2, \ldots, a_n, Q \rangle$ satisfies
the defined predicates p, then r will be produced in the output along with its quality trail Q.
• Merge Operator (over $Q_1$, $Q_2$): Several of the relational operators, e.g., join, grouping, and aggregation,
involve merging multiple tuples into one output tuple, and thus the input quality trails also
need to be merged. We introduce a logical merge operator over the quality trails
that works as follows. Assume that tuples $r_1$ and $r_2$ in Figure ?? are to be merged, e.g., in a join
or aggregation. We then run a sweep line algorithm over $Q_1$ and $Q_2$ from left to right that jumps over their
transition points.
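The Selection extension in the list above can be illustrated with a minimal sketch. The names `AnnotatedTuple` and `select` are hypothetical and introduced only for illustration; the quality trail is treated as an opaque value, since Selection is described as leaving it unmodified.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class AnnotatedTuple:
    """Hypothetical stand-in for a tuple r = <a_1, ..., a_n, Q>."""
    values: Tuple[Any, ...]   # attribute values a_1 ... a_n
    qtrail: Tuple[Any, ...]   # quality trail Q, opaque in this sketch


def select(relation: Iterable[AnnotatedTuple],
           predicate: Callable[[Tuple[Any, ...]], bool]) -> Iterator[AnnotatedTuple]:
    """Yield the tuples whose attribute values satisfy the predicate.

    The predicate only sees the attribute values; the quality trail is
    carried through to the output unchanged.
    """
    for r in relation:
        if predicate(r.values):
            yield r  # same tuple object, so its trail is untouched


# Usage: keep tuples whose first attribute exceeds 2.
R = [AnnotatedTuple((1, "a"), ("Q_a",)), AnnotatedTuple((5, "b"), ("Q_b",))]
result = list(select(R, lambda v: v[0] > 2))
```

Because the operator is written as a generator, it can be chained with other operators in a pipeline, which is in the spirit of the pipelining property mentioned above.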
2.2 Research Task II:
2.3 Research Task III:
2.4 Research Task IV:
2.5 Research Task V:
2.6 Research Task VI:
[Figure 4, annotations: quality trails $Q_{r1}$, $Q_{r2}$, $Q_{r3}$ of tuples r1, r2, r3 over time points $t_1$ to $t_8$, and the merged output trail $Q_o$. Statistics shown (Min / Max / Avg): 1 / 1 / 1, 4 / 4 / 4, and 2 / 2 / 2 for the input transitions, and 1 / 4 / 2.3 for the merged transition.]
- $Q_o$ is the result of applying the Merge operator over the three quality trails $Q_{r1}$, $Q_{r2}$, $Q_{r3}$. It is built using a sweep line algorithm moving left to right.
- Each quality transition generated by the Merge operator has a score equal to the smallest among the corresponding input transitions. Moreover, it has statistics computed by combining the input transitions' statistics.
Figure 4: Example of the Merge Operator in QTrail-DB.
Function Name                                Description
-------------------------------------------  --------------------------------------------------
QTransition[] getQualityTrail()              Returns r's quality trail as an array of quality
                                             transitions.
Int getSize()                                Returns the number of transitions in r's quality
                                             trail.
QTransition[] addTransition(QTransition q)   Appends q to the right-most end of r's quality
                                             trail. The function returns the new extended
                                             quality trail.
QTransition[] replaceTransition(Int pos,     Replaces the quality transition at position pos
    QTransition q)                           with the new transition q. The function returns
                                             the new modified trail.
QTransition[] trim(Char direction, Int num)  Trims r's quality trail and retains only the first
                                             num transitions starting from the L.H.S. or R.H.S.
                                             (depending on direction). The function returns the
                                             new trimmed trail.
…                                            …
Figure 5: Manipulation Functions on r.QTrail Attribute.
occurred at Position 5 (at time $t_5$). These statistics
will be combined by the Merge operator to compute
the new statistics of $Q_o$'s new transition (Line 8 in
Figure 3).
Finally, the newly created transition is added to
the output quality trail (Line 10 in Figure 3).
More operators are presented in detail in the full
version of this paper (Author A, 2023).
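The Merge behavior described in Figure 4 (a left-to-right sweep over transition points, the smallest input score, and combined statistics) can be sketched as follows. This is only an illustration under stated assumptions, not the algorithm of Figure 3: the names `Stats`, `QTransition`, `combine`, `in_effect`, and `merge` are hypothetical, a transition's score is assumed to stay in effect until the trail's next transition, a trail with no transition yet at the sweep position is skipped, and the average is assumed to be maintained from a (total, count) pair.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Stats:
    # Hypothetical statistics record: min/max are distributive,
    # avg is algebraic and derived from (total, count).
    minimum: float
    maximum: float
    total: float
    count: int

    @property
    def avg(self) -> float:
        return self.total / self.count


@dataclass(frozen=True)
class QTransition:
    time: int      # position of the transition on the time axis
    score: float   # score assumed in effect from `time` onwards
    stats: Stats


def combine(stats: List[Stats]) -> Stats:
    """Combine the statistics of several input transitions."""
    return Stats(
        minimum=min(s.minimum for s in stats),
        maximum=max(s.maximum for s in stats),
        total=sum(s.total for s in stats),
        count=sum(s.count for s in stats),
    )


def in_effect(trail: List[QTransition], t: int) -> Optional[QTransition]:
    """Latest transition of `trail` at or before time t, if any."""
    current = None
    for q in trail:  # trails are assumed to be sorted by time
        if q.time > t:
            break
        current = q
    return current


def merge(trails: List[List[QTransition]]) -> List[QTransition]:
    """Sweep left to right over all distinct transition points."""
    points = sorted({q.time for trail in trails for q in trail})
    out: List[QTransition] = []
    for t in points:
        candidates = (in_effect(trail, t) for trail in trails)
        active = [q for q in candidates if q is not None]
        out.append(QTransition(
            time=t,
            score=min(q.score for q in active),       # smallest input score
            stats=combine([q.stats for q in active]),  # combined statistics
        ))
    return out


# Usage: three single-transition trails with scores 1, 4, and 2.
def single(v: float) -> Stats:
    return Stats(v, v, v, 1)

q_o = merge([[QTransition(1, 1, single(1))],
             [QTransition(1, 4, single(4))],
             [QTransition(1, 2, single(2))]])
```

Under the assumption that each input statistic summarizes a single value, the merged transition has Min 1, Max 4, and an average of 7/3, which is consistent with the 1 / 4 / 2.3 values shown in Figure 4.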
5 CREATION & MAINTENANCE OF QUALITY TRAILS
In this section, we present the creation and
maintenance mechanisms of the quality trails. Since
QTrail-DB is a generic engine, the goal is to design a
set of APIs that act as the interface between
QTrail-DB and the external world. More specifically,
QTrail-DB allows database developers to manipulate
the quality trails like any other attribute in the
database.
The quality trails are designed as special attributes
added to the database relations, i.e., each relation
R has an automatically added special attribute called
“QTrail”. The QTrail attribute is of a newly added
user-defined type representing an array of quality
transitions (refer to Definitions 3.1 and 3.2). On top
of this new type, a set of manipulation functions has
been developed, as presented in Figure 5. These
built-in functions are by no means comprehensive, but
they are basic functions on top of which database
developers may create semantically richer functions.
In Figure 5, we present a few of the developed
functions along with a description of each (more
details are available in the full version of the paper
(Author A, 2023)). In addition to these functions, we
have also developed a set of functions to manipulate a
given quality transition, e.g., building a quality
transition, and setting or retrieving specific fields
within a transition¹.
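To make the functions of Figure 5 concrete, the following is a minimal sketch of how they might behave. It is not the QTrail-DB implementation: the Python names mirror the figure in snake_case, the `QTransition` fields are hypothetical, positions are assumed to be 1-based, and `direction` is assumed to take the values "L" or "R". Each function returns a new trail, following the figure's descriptions.

```python
from typing import List, NamedTuple


class QTransition(NamedTuple):
    # Hypothetical fields; the actual transition layout is given by
    # Definitions 3.1 and 3.2, which are not reproduced here.
    time: int
    score: float


QTrail = List[QTransition]


def get_quality_trail(trail: QTrail) -> QTrail:
    """getQualityTrail(): return the trail as an array (here, a copy)."""
    return list(trail)


def get_size(trail: QTrail) -> int:
    """getSize(): number of transitions in the trail."""
    return len(trail)


def add_transition(trail: QTrail, q: QTransition) -> QTrail:
    """addTransition(q): append q at the right-most end."""
    return trail + [q]


def replace_transition(trail: QTrail, pos: int, q: QTransition) -> QTrail:
    """replaceTransition(pos, q): replace the transition at 1-based pos."""
    if not 1 <= pos <= len(trail):
        raise IndexError(f"position {pos} is outside 1..{len(trail)}")
    return trail[:pos - 1] + [q] + trail[pos:]


def trim(trail: QTrail, direction: str, num: int) -> QTrail:
    """trim(direction, num): keep num transitions from the left or right."""
    if direction not in ("L", "R"):
        raise ValueError("direction must be 'L' or 'R'")
    if num < 0:
        raise ValueError("num must be non-negative")
    if direction == "L":
        return trail[:num]
    # Avoid trail[-0:], which would return the whole trail.
    return trail[len(trail) - min(num, len(trail)):]


# Usage: extend a trail, then keep only its two most recent transitions.
trail = [QTransition(1, 0.5), QTransition(2, 0.7)]
trail = add_transition(trail, QTransition(3, 0.9))
recent = trim(trail, "R", 2)
```

Returning a new trail rather than mutating the input keeps each call side-effect free; whether the actual functions behave this way is an assumption of the sketch.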
6 EXPERIMENTS
Setup: QTrail-DB is implemented within the
PostgreSQL DBMS (Stonebraker et al., 1990). A
Quality Trail is modeled as a new data type, i.e., a
dynamic array of quality transitions. The default stor-
¹ Other storage schemes are possible without affecting
the core functionalities of QTrail-DB. Only the
implementation of the APIs may change.
QTrail-DB: A Query Processing Engine for Imperfect Databases with Evolving Qualities