are to an extent useful for knowledge validation tasks,
and that the majority of the constraints were well bal-
anced with respect to complexity. However, a consid-
erable number of participants were nevertheless un-
sure about the usefulness of the method. Our analysis
suggests that the lack of familiarity with the domain
and quality rules might be the cause, although a more
in-depth study is needed.
Our algorithm has a few noteworthy limitations.
A practical limitation concerns scalability, as our
algorithm needs to evaluate a large number of combinations.
This problem is somewhat reduced by our pruning
and other optimization methods, and can also be
alleviated by parallelizing the task, but will neverthe-
less remain a challenge to deal with as the dataset in-
creases in size and, most particularly, the number of
relations. Another possible limitation lies with our as-
sumption that the majority of the knowledge is valid
and accurate. An insufficiently large ratio
between valid/accurate and invalid/inaccurate knowl-
edge can result in a relatively high number of false
positives and negatives, reducing the usefulness of our
method. A final noteworthy limitation is the high sensitivity
of the results to the provided support and confidence values,
which, depending on the characteristics of the dataset,
can result in too few or in an unmanageable number
of constraints. However, this is a common problem in
this field of research.
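The sensitivity to support and confidence thresholds can be illustrated with a minimal sketch. It is not our algorithm: it mines only single-item rules over a small, hypothetical set of entities, and the names `ENTITIES`, `mine_rules`, and `rule_counts` are assumptions introduced purely for illustration. It shows how the number of retained rules depends on the two thresholds.

```python
from itertools import permutations

# Hypothetical toy data: each entity is described by the set of
# (simplified) properties it has. Not taken from the paper's dataset.
ENTITIES = [
    {"type:Bridge", "hasLength", "hasOwner"},
    {"type:Bridge", "hasLength", "hasOwner"},
    {"type:Bridge", "hasLength"},
    {"type:Road", "hasLength", "hasSurface"},
    {"type:Road", "hasSurface"},
    {"type:Tunnel", "hasLength", "hasOwner"},
]


def mine_rules(entities, min_support, min_confidence):
    """Return single-item rules (a -> b) that meet both thresholds.

    support    = fraction of entities containing both a and b
    confidence = count(a and b) / count(a)
    """
    n = len(entities)
    items = sorted(set().union(*entities))
    rules = []
    for a, b in permutations(items, 2):
        count_a = sum(1 for e in entities if a in e)
        count_ab = sum(1 for e in entities if a in e and b in e)
        support = count_ab / n
        confidence = count_ab / count_a if count_a else 0.0
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


def rule_counts(entities, thresholds):
    """Count retained rules for each (min_support, min_confidence) pair."""
    return {t: len(mine_rules(entities, *t)) for t in thresholds}
```

Calling `rule_counts(ENTITIES, [(0.1, 0.5), (0.5, 1.0)])` compares a loose and a strict setting. Every rule that passes the strict thresholds also passes the loose ones, so loosening the thresholds can only increase the number of rules to inspect.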
We identified several potential extensions to our
method which we offer as suggestions for future
work. Firstly, our algorithm currently generates only
a proper subset of the constraints expressible in constraint
languages such as ShEx and SHACL, lacking support for,
e.g., cardinality restrictions. Adding support for these
constraints would make our method more useful for
real-world knowledge validation tasks. Another angle
worth pursuing, but which fell outside our current scope,
is the analysis of our algorithm’s time complexity, the
theoretical speedup that can be obtained through
parallelization, and how it deals with the satisfaction
and entailment problems.
ACKNOWLEDGEMENTS
We express our gratitude to Jaap Bakker, coordinating
specialist advisor on asset management and data integration
at Rijkswaterstaat, for providing us with access to
the data infrastructure, experts, and facilities needed
to complete our research. This research was made
possible with the help of Rijkswaterstaat, The Nether-
lands.
REFERENCES
Akhtar, W., Cortés-Calabuig, Á., and Paredaens, J. (2010).
Constraints in rdf. In International Workshop on Se-
mantics in Data and Knowledge Bases, pages 23–39.
Springer.
Anbutamilazhagan, T. and Selvaraj, M. K. (2014). A novel
model for mining association rules from semantic web
data. Elysium Journal, 1(2).
Barati, M., Bai, Q., and Liu, Q. (2016). SWARM: An Ap-
proach for Mining Semantic Association Rules from
Semantic Web Data, pages 30–43. Springer Interna-
tional Publishing, Cham.
Bohannon, P., Fan, W., Geerts, F., Jia, X., and Kementsi-
etsidis, A. (2007). Conditional functional dependen-
cies for data cleaning. In 2007 IEEE 23rd interna-
tional conference on data engineering, pages 746–
755. IEEE.
Calvanese, D., Fischl, W., Pichler, R., Sallinger, E., and
Simkus, M. (2014). Capturing relational schemas and
functional dependencies in rdfs. In Twenty-Eighth
AAAI Conference on Artificial Intelligence.
Cortés-Calabuig, A. and Paredaens, J. (2012). Semantics of
constraints in rdfs. In AMW, pages 75–90. Citeseer.
Fan, W., Hu, C., Liu, X., and Lu, P. (2018). Discover-
ing graph functional dependencies. In Proceedings of
the 2018 International Conference on Management of
Data, pages 427–439. ACM.
Fan, W. and Lu, P. (2017). Dependencies for graphs. In Pro-
ceedings of the 36th ACM SIGMOD-SIGACT-SIGAI
Symposium on Principles of Database Systems, pages
403–416. ACM.
Fan, W., Wu, Y., and Xu, J. (2016). Functional dependen-
cies for graphs. In Proceedings of the 2016 Inter-
national Conference on Management of Data, pages
1843–1857. ACM.
Fürber, C. (2015). Data quality management with semantic
technologies. Springer.
Fürber, C. and Hepp, M. (2011). Towards a vocabulary for
data quality management in semantic web architec-
tures. In Proceedings of the 1st International Work-
shop on Linked Web Data Management, pages 1–8.
Galárraga, L. A., Teflioudi, C., Hose, K., and Suchanek,
F. (2013). Amie: association rule mining under in-
complete evidence in ontological knowledge bases. In
Proceedings of the 22nd international conference on
World Wide Web, pages 413–422.
Hamad, F., Liu, I., and Zhang, X. X. (2018). Food
discovery with uber eats: Building a query
understanding engine.
https://eng.uber.com/uber-eats-query-understanding/.
Accessed: 2020-05-20.
He, B., Zou, L., and Zhao, D. (2014). Using conditional
functional dependency to discover abnormal data in
rdf graphs. In Proceedings of Semantic Web Informa-
tion Management on Semantic Web Information Man-
agement, pages 1–7. ACM.
Hellings, J., Gyssens, M., Paredaens, J., and Wu, Y. (2016).
Implication and axiomatization of functional and con-