2 Basic XML Handling in PROLOG and FNQUERY
Currently, the Campe Dictionary, which was written in the early years of the 19th
century, is being converted into a machine-readable structure. So far, the only
annotations available in this process have been the declaration of the different font
sizes that Joachim Heinrich Campe uses to display the structure of his work, the
numbering of the line and page breaks in the dictionary, and paragraphs; thus, the
source file that we used for the first basic transformations had only a very limited
XML structure.
In this paper, we present how to annotate documents with PROLOG and the XML
query and transformation language FNQUERY [14], which is implemented in
SWI-PROLOG. To exemplify these annotations, we use the Campe Dictionary as a
low-structured base for obtaining a well-formed XML document according to TEI.
The PROLOG Data Structure for XML. The field notation developed for FNQUERY
represents an XML element
<T a_1="v_1" ... a_n="v_n"> ... </T>
as a PROLOG term T:As:C, called FN triple, with the tag "T" and an association list
As = [a_1:v_1, ..., a_n:v_n] of attributes a_i and their corresponding values v_i
(with 1 ≤ i ≤ n), which are PROLOG terms. The content C can be either text or
nested sub-elements represented as FN triples. If As is empty, then the FN triple
can be abbreviated as a pair T:C.
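The relationship between the full triple and the abbreviated pair can be sketched in plain PROLOG. This is only an illustration under stated assumptions: the helper names fn_normalize/2 and fn_attribute/3 are hypothetical and not part of FNQUERY, and the sketch assumes that the attribute part As is always a proper list.

```prolog
% Sketch only: fn_normalize/2 and fn_attribute/3 are hypothetical helpers,
% not FNQUERY predicates.

% fn_normalize(+Item, -Triple)
% Expands the abbreviated pair T:C into the full triple T:[]:C.
% ':' is right-associative, so T:As:C is read as T:(As:C).
fn_normalize(T:As:C, T:As:C) :-
    is_list(As), !.              % already a full triple
fn_normalize(T:C, T:[]:C).       % pair: insert the empty attribute list

% fn_attribute(+Item, ?Name, ?Value)
% Looks up an attribute in the association list of an FN item.
fn_attribute(Item, Name, Value) :-
    fn_normalize(Item, _:As:_),
    member(Name:Value, As).
```

For instance, the goal fn_normalize('W_2':['Der Aal'], X) would bind X to the triple with an empty attribute list, while an item that already carries attributes is returned unchanged.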
In most available dictionaries, each entry is encapsulated in its own paragraph, and
thus it can be easily detected. In the following example, an entry is annotated with
paragraph and is followed by an element W_2, which shows the lemma of the entry
in a larger font; recognizing both elements is necessary, because there could be other
paragraph elements that do not represent entries.
<paragraph>
<W_2>Der Aal</W_2>, <W_1>des -- es, Mz. die -- e</W_1>, ...
</paragraph>
An XML document can be loaded into an FN triple using the predicate dread. For the
paragraph element above we get FN triples with empty attribute lists:
paragraph:[ 'W_2':['Der Aal'], ', ',
    'W_1':['des -- es, Mz. die -- e'], ', ...' ]
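To make the shape of this result more concrete, the following sketch converts the element(Tag, Attributes, Content) terms that SWI-PROLOG's standard library(sgml) produces into the FN notation. It does not show how dread works internally; dom_to_fn/2 and attr_to_fn/2 are hypothetical names introduced only for this illustration.

```prolog
% Sketch only: dom_to_fn/2 and attr_to_fn/2 are hypothetical helpers and
% do not describe the implementation of dread.

% dom_to_fn(+Node, -Item)
% Maps element(Tag, Attrs, Content) to an FN triple, or to the
% abbreviated pair Tag:Content when there are no attributes.
dom_to_fn(element(T, Attrs, Content), Item) :- !,
    maplist(attr_to_fn, Attrs, As),
    maplist(dom_to_fn, Content, C),
    (   As == []
    ->  Item = T:C
    ;   Item = T:As:C
    ).
dom_to_fn(Text, Text).           % text nodes are kept unchanged

% library(sgml) writes attributes as Name=Value; FN uses Name:Value.
attr_to_fn(Name=Value, Name:Value).
```

Applied to the element term for the paragraph above, this sketch yields the same nested pairs as shown, with text fragments such as ', ' kept as plain atoms between the sub-elements.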
Extraction of Entries using FNQUERY. The query language FNQUERY allows for
accessing a component of an XML document by its attribute or tag name. Furthermore,
complex path or tree expressions can be formulated in a way quite similar to XPATH.
The XML element from above can now be parsed with the following predicate
campe_find_entry/2. The path expression Campe/descendant::paragraph selects
a descendant element of Campe with the tag paragraph. To avoid recognizing
a paragraph that lacks a following W_2 tag, we use another path expression
Entry/nth_child::1/tag::'*' for computing the tag of the first child of the
considered entry.
campe_find_entry(Campe, Entry) :-
    Entry := Campe/descendant::paragraph,
    'W_2' := Entry/nth_child::1/tag::'*'.
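For readers without FNQUERY at hand, the intent of the two path expressions can be approximated in plain PROLOG over FN terms. This is a minimal sketch, not the FNQUERY implementation: the names campe_find_entry_sketch/2, fn_content/2, fn_tag/2 and fn_descendant/2 are hypothetical, and the sketch assumes that the first child is simply the first item of the content list.

```prolog
% Sketch only: all predicate names below are hypothetical and merely mimic
% the behaviour described for campe_find_entry/2.

% fn_content(+Item, -Content): content of a triple T:As:C or a pair T:C.
fn_content(_:As:C, C) :- is_list(As), !.
fn_content(_:C, C).

% fn_tag(+Item, -Tag): fails for text nodes, which are not ':' terms.
fn_tag(T:_, T).

% fn_descendant(+Item, -Desc): Desc is a sub-element at any depth.
fn_descendant(Item, Desc) :-
    fn_content(Item, Content),
    is_list(Content),
    member(Child, Content),
    Child = _:_,                 % skip text nodes
    (   Desc = Child
    ;   fn_descendant(Child, Desc)
    ).

% An entry is a descendant paragraph whose first child has the tag W_2.
campe_find_entry_sketch(Campe, Entry) :-
    fn_descendant(Campe, Entry),
    fn_tag(Entry, paragraph),
    fn_content(Entry, [First|_]),
    fn_tag(First, 'W_2').
```

On backtracking, the sketch enumerates every qualifying paragraph, so all entries of a document could be collected with findall/3; paragraphs that start with plain text or with another tag are skipped.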