ble runtimes. In one application, which allowed for
15 partitions to be processed independently, the strat-
ified version computed the founded repairs in approx-
imately 1 second, whereas the sequential version did
not terminate within a time limit of 15000 seconds.
This corresponds to a speedup of at least four orders
of magnitude, demonstrating the practical impact of
the contributions of this section.
4.2 Practical Assessment
In the worst case, parallelization and stratification will
have no impact on the construction of the repair tree,
as it is possible to construct a set of AICs with no
independent subsets. However, the worst case is not
the general case, and it is reasonable to believe that
real-life sets of AICs will actually have a high paral-
lelization potential.
Indeed, integrity constraints typically reflect high-
level consistency requirements of the database, which
in turn capture the hierarchical nature of relational
databases, where more complex relations are built
from simpler ones. Thus, when specifying active in-
tegrity constraints there will naturally be a preference
to correct inconsistencies by updating the more com-
plex tables rather than the most primitive ones.
Furthermore, in a real setting we are not so much
interested in repairing a database once, but rather in
ensuring that it remains consistent as its information
changes. Therefore, it is likely that inconsistencies
that arise will be localized to a particular table. The
ability to process independent sets of AICs separately
guarantees that we will not be repeatedly evaluat-
ing those constraints that were not broken by recent
changes, focusing only on the constraints that can ac-
tually become unsatisfied as we attempt to fix the in-
consistency.
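The effect described above can be sketched in a few lines. This is a minimal illustration, not repAIrC's implementation. It assumes each AIC is summarized by the set of tables it mentions, and it treats two AICs as dependent whenever they share a table, which is a coarser criterion than the independence notion used for parallelization and stratification. The names `partition_aics`, `affected_aics`, and the example constraints are hypothetical.

```python
from collections import defaultdict


def partition_aics(aic_tables):
    """Group AICs into independent sets.

    aic_tables maps an AIC name to the set of tables it mentions.
    Assumption: two AICs are dependent iff they share a table, so the
    groups are the connected components of the "shares a table" graph.
    """
    parent = {name: name for name in aic_tables}

    def find(x):
        # Follow parent links to the group representative,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge every AIC with the first AIC seen for each of its tables.
    first_user = {}
    for name, tables in aic_tables.items():
        for table in tables:
            if table in first_user:
                parent[find(name)] = find(first_user[table])
            else:
                first_user[table] = name

    groups = defaultdict(set)
    for name in aic_tables:
        groups[find(name)].add(name)
    return list(groups.values())


def affected_aics(aic_tables, changed_table):
    """Return the AICs to re-evaluate after changed_table is updated."""
    for group in partition_aics(aic_tables):
        if any(changed_table in aic_tables[name] for name in group):
            return group
    return set()  # no AIC mentions the changed table


# Hypothetical constraints over hypothetical tables.
aics = {
    "r1": {"employee", "dept"},
    "r2": {"dept", "project"},
    "r3": {"invoice"},
    "r4": {"invoice", "customer"},
}
to_check = affected_aics(aics, "invoice")
```

Under these assumptions, a change to `invoice` only requires re-evaluating `r3` and `r4`; the constraints `r1` and `r2` are left untouched.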
For the same reason, scalability of the techniques
we implemented is not a pressing issue: there is no
practical need for a tool that can efficiently fix
hundreds of inconsistencies at once, since each change
to the database will likely impact only a few AICs.
5 CONCLUSIONS AND FUTURE WORK
We presented a working prototype of a tool, called
repAIrC, to check integrity of real-world SQL
databases with respect to a given set of active in-
tegrity constraints, and to compute different types
of repairs automatically in case inconsistency is de-
tected, following the ideas and algorithms in (Flesca
et al., 2004; Caroprese et al., 2007; Caroprese and
Truszczyński, 2011; Cruz-Filipe et al., 2013;
Cruz-Filipe, 2014). This tool is the first implementation of
a concept we believe to have the potential to be inte-
grated in current database management systems.
Our tool currently does not apply repairs to the
database automatically; instead, it presents them to
the user. As discussed in (Eiter and Gottlob, 1992),
fully automatic repair is unlikely to be attainable, as
human intervention in the process of database repair is
generally accepted to be necessary. That said, automating
the generation of a small and relevant set of repairs
is an important first step in ensuring a consistent data
basis in Knowledge Management.
In order to deal with real-world heterogeneous
knowledge management systems, we are currently
working on extending and generalizing the notion of
(active) integrity constraints to encompass more com-
plex knowledge repositories such as ontologies, ex-
pert reasoning systems, and distributed knowledge
bases. repAIrC has been designed with this extension
in mind, and we believe that its modularity
will allow us to generalize it to work with such knowl-
edge management systems once the right theoretical
framework is developed.
On the technical side, we are planning to speed up
the system by integrating a local database cache for
performing the many update and undo actions during
exploration of the repair trees without the overhead of
an external database connection.
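One way to picture such a cache is an in-memory copy of the relevant facts together with an undo log, so that a branch of the repair tree can be explored and rolled back without contacting the database. The sketch below is only an illustration under stated assumptions: the class `LocalCache`, its methods, and the representation of facts as `(table, row)` tuples are hypothetical and are not taken from repAIrC.

```python
class LocalCache:
    """In-memory set of facts with an undo log (hypothetical sketch)."""

    def __init__(self, facts):
        self.facts = set(facts)
        self._log = []  # stack of (action, fact, changed) entries

    def insert(self, fact):
        # Record whether the insert changed anything, so undo is exact.
        changed = fact not in self.facts
        self.facts.add(fact)
        self._log.append(("insert", fact, changed))

    def delete(self, fact):
        changed = fact in self.facts
        self.facts.discard(fact)
        self._log.append(("delete", fact, changed))

    def undo(self):
        """Revert the most recent insert or delete.

        Raises IndexError if there is nothing to undo.
        """
        action, fact, changed = self._log.pop()
        if not changed:
            return  # the action was a no-op, nothing to revert
        if action == "insert":
            self.facts.discard(fact)
        else:
            self.facts.add(fact)


# Explore one branch of a repair tree, then backtrack.
cache = LocalCache({("dept", ("d1",))})
cache.insert(("employee", ("e1", "d1")))
cache.delete(("dept", ("d1",)))
cache.undo()  # restores the deleted dept row
cache.undo()  # removes the inserted employee row
```

After the two `undo` calls the cache holds exactly the facts it started with. Only a repair chosen by the user would then have to be sent to the external database.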
ACKNOWLEDGMENTS
This work was supported by the Danish Council
for Independent Research, Natural Sciences, and by
FCT/MCTES/PIDDAC under centre grant to BioISI
(Centre Reference: UID/MULTI/04046/2013). Marta
Ludovico was sponsored by a grant “Bolsa Universidade
de Lisboa / Fundação Amadeu Dias”.
REFERENCES
Abiteboul, S. (1988). Updates, a new frontier. In Gyssens,
M., Paredaens, J., and van Gucht, D., editors,
ICDT’88, 2nd International Conference on Database
Theory, Bruges, Belgium, August 31 – September 2,
1988, Proceedings, volume 326 of LNCS, pages 1–18.
Springer.
Caroprese, L., Greco, S., and Molinaro, C. (2007). Priori-
tized active integrity constraints for database mainte-
nance. In Ramamohanarao, K., Krishna, P. R., Mo-
hania, M. K., and Nantajeewarawat, E., editors, Ad-
vances in Databases: Concepts, Systems and Appli-