ognize only one error at a time.
In this paper, we present RDF-Doctor, a comprehensive approach for error
detection and correction in RDF documents. This work is motivated mainly by
the large volume of RDF data that is generated and used in the Turtle and
N-Triples serialization formats. RDF-Doctor is capable of detecting a wide
range of syntactic errors and of automatically correcting a subset of them.
Although these two formats serve as case studies in this research, the
approach can be extended to support other serialization formats by
specifying the respective grammar.
RDF-Doctor is fully operational and is currently integrated within the
VoCol platform¹. VoCol (Halilaj et al., 2016b) leverages the fundamental
principles of Git as a version control system to support ontology
development in distributed scenarios. RDF-Doctor can also be used as a
standalone tool, and the source code is openly available at
https://github.com/ahemaid/RDF-Doctor.
The main contributions of this work are: 1) defining a set of grammar rules
with the objective of covering an exhaustive list of syntactic errors;
2) enabling the parsing procedure to continue after an error occurs;
3) identifying multiple errors in the same line or in subsequent statements;
4) automatically correcting a subset of errors; and 5) improving conflict
resolution via user-friendly messages.
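Contributions 2 and 3 can be illustrated with a minimal sketch. It is not RDF-Doctor's implementation: it swaps the grammar-based approach for a line-based regular expression over a heavily simplified N-Triples subset (no blank nodes, language tags, or datatypes). The names `SyntaxIssue` and `check_ntriples` are hypothetical. The point is only that the checker records each error and resumes at the next statement instead of stopping at the first failure.

```python
import re
from dataclasses import dataclass

# Simplified N-Triples terms (assumption: IRIs and plain literals only).
IRI = r"<[^<>\s]*>"
LITERAL = r'"(?:[^"\\]|\\.)*"'
# The final group captures the terminating dot so its absence can be reported.
TRIPLE = re.compile(rf"^\s*({IRI})\s+({IRI})\s+({IRI}|{LITERAL})\s*(\.?)\s*$")


@dataclass
class SyntaxIssue:
    line: int      # 1-based line number of the offending statement
    message: str   # human-readable description of the problem


def check_ntriples(text: str) -> list[SyntaxIssue]:
    """Collect every syntax issue instead of stopping at the first one."""
    issues = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are not statements
        match = TRIPLE.match(line)
        if match is None:
            issues.append(SyntaxIssue(number, "malformed subject, predicate, or object"))
        elif match.group(4) != ".":
            issues.append(SyntaxIssue(number, "missing terminating '.'"))
        # Recovery step: whatever happened, parsing resumes at the next line.
    return issues


if __name__ == "__main__":
    document = (
        '<http://ex.org/a> <http://ex.org/p> <http://ex.org/b> .\n'
        '<http://ex.org/a> <http://ex.org/p> "x"\n'
        '<http://ex.org/a> http://ex.org/p <http://ex.org/b> .\n'
    )
    for issue in check_ntriples(document):
        print(f"line {issue.line}: {issue.message}")
```

Resynchronizing at line boundaries only works because N-Triples places one statement per line. For Turtle, where a statement may span several lines, recovery would have to be expressed in the grammar itself.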
This paper is organized as follows: Section 2 presents related work,
summarizing relevant approaches for syntax checking. Section 3 provides a
detailed description of our approach. Section 4 describes a scenario for
error detection and correction by RDF-Doctor. The approach is evaluated in
various scenarios in Section 5. Section 6 concludes our work and provides
an outlook on potential extensions.
2 RELATED WORK
In this section, we discuss work related to our problem, i.e., research
that has been carried out in the field of RDF syntax parsing and checking.
During our literature review, we focused on the following three aspects:
1) parsing tools for different RDF serializations; 2) types of error
messages generated when an error is encountered; and 3) error recovery.
¹ https://github.com/vocol/vocol
Table 1: Comparison between RDF-Doctor and other RDF syntax checking tools.

Feature                           | Jena            | ShEx.js                      | VRP           | IDLab Validator | RDF-Doctor
                                  | (McBride, 2002) | (Prud’hommeaux et al., 2014) | (Tolle, 2000) | (IDLab, 2019)   |
Multiple error detection per scan | ✗               | ✗                            | ✓             | ✗               | ✓
Error correction                  | ✗               | ✗                            | ✗             | ✗               | ✓
User-friendly error messages      | ✓               | ✗                            | ✓             | ✓               | ✓
Grammar-based approach            | ✓               | ✓                            | ✓             | ✓               | ✓
2.1 Parsing Tools
Several tools for validating RDF documents, such as the W3C RDF validation
tool (Prud’hommeaux, ) and the Jena RDF toolkit, use Jena's Another RDF
Parser (ARP) (McBride, 2002). These tools typically detect only the first
error while parsing the input sequentially from start to end. As a result,
ontology engineers struggle when debugging their RDF documents and need
more helpful alternatives. To the best of our knowledge, only the
Validating RDF Parser (VRP) (Tolle, 2000) can detect multiple errors in a
single scan. However, VRP is limited to the RDF/XML serialization. Other
tools such as ShEx.js (Prud’hommeaux et al., 2014), Jena API (McBride,
2002), RDF Validator (Myb, ), N3Parser (Verborgh, ), IDLab Validator
(IDL, ), and TurtleEditor (Petersen et al., 2016) are not fault-tolerant
and are therefore unable to detect multiple errors in a single scan.
2.2 Types of Error Messages
User-friendly and meaningful error messages help the user to identify and
correct errors easily. In practice, parsing tools that follow the Shape
Expressions approach, like ShEx.js (Prud’hommeaux et al., 2014), show less
expressive and less friendly error messages. On the other hand, tools that
rely on the ARP parser, like Jena API (McBride, 2002) and RDF Validator
(Myb, ), or on an N3 parser, like N3Parser (Verborgh, ), IDLab Turtle
Validator (IDL, ), and TurtleEditor (Petersen et al., 2016), present more
expressive and user-friendly error messages that include the location of
the error.
2.3 Error Recovery Approaches
Automatic error recovery is a crucial feature in the ontology development
process as well as for RDF data reuse (Halilaj et al., 2016a). Our survey
of research
RDF Doctor: A Holistic Approach for Syntax Error Detection and Correction of RDF Data