MMWebsite, i.e. to pre-process the translation map
in order to generate an XSLT translation script
which includes the translation knowledge embedded
in its logic. Then this generated script can perform
all the document-instance translations required. The
mapping structure supports the language
equivalences for various languages, so we should
generate a translator for every possible pair of
languages. Whenever the mapping structure is
modified, a new set of translators must be generated.
Fortunately, this is an automated process (see figure
6).
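The idea of generating one translator per language pair can be sketched as follows. This is only a minimal illustration in Python, not the XSLT-based generator used for the MMWebsite; the mapping layout, the language codes and the function name `build_translators` are assumptions made for the example.

```python
from itertools import permutations

# Hypothetical mapping structure: one entry per markup concept,
# giving the tag name used in each supported language.
MAPPING = [
    {"en": "title", "es": "titulo", "fr": "titre"},
    {"en": "author", "es": "autor", "fr": "auteur"},
    {"en": "chapter", "es": "capitulo", "fr": "chapitre"},
]


def build_translators(mapping):
    """Return one tag-renaming table per ordered (source, target) language pair."""
    languages = sorted({lang for entry in mapping for lang in entry})
    translators = {}
    for source, target in permutations(languages, 2):
        translators[(source, target)] = {
            entry[source]: entry[target]
            for entry in mapping
            if source in entry and target in entry
        }
    return translators


# The whole set is regenerated whenever the mapping changes.
translators = build_translators(MAPPING)
print(len(translators))                     # 3 languages give 6 ordered pairs
print(translators[("es", "fr")]["autor"])   # auteur
```

With n languages in the mapping, this produces n(n-1) translators, which is why an automated regeneration step matters.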
The other alternative would be to merge the two
input files into a single new XML structure, and then
to process that file, which would contain both the
XML document instance and the translation
mapping information (see figure 7). This implies
joining the two XML tree structures as branches of a
higher-level root.
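The merging step described above might look like the following sketch. It is written in Python for illustration only; the element names (`merged`, `mapping`, `tag`) and the sample document are hypothetical and do not come from the MMWebsite.

```python
import xml.etree.ElementTree as ET

# Hypothetical inputs: a document instance and a translation mapping.
DOCUMENT_XML = "<book><title>Don Quijote</title></book>"
MAPPING_XML = (
    '<mapping>'
    '<tag en="book" es="libro"/>'
    '<tag en="title" es="titulo"/>'
    '</mapping>'
)


def merge_inputs(document_xml, mapping_xml, root_tag="merged"):
    """Join both XML trees as branches of a new higher-level root."""
    root = ET.Element(root_tag)
    root.append(ET.fromstring(mapping_xml))
    root.append(ET.fromstring(document_xml))
    return root


merged = merge_inputs(DOCUMENT_XML, MAPPING_XML)
print(ET.tostring(merged, encoding="unicode"))
```

A single transformation could then read both the mapping branch and the document branch from one input, at the cost of repeating this merge for every document to be translated.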
Although this approach may prove useful for
some problems, we did not use it for the
MMWebsite, because the file-merging preprocessing
must be done for each file to be translated, increasing the
web service response time. Using pre-generated
translators instead proved to be a faster solution.
This limitation, which is inherent to the XSLT
processing model, could be avoided by using a
general-purpose programming language such as Java instead.
3.2 How We Actually Do It
The mapping document, which contains all the
structural information needed to build the
language converters, is read by the transformation
generator, which was built as an XSLT script. XSL
can be used to process XML documents in order to
produce other XML documents or plain text
documents. As XSL stylesheets are themselves XML,
they can be generated as XSL output. We used this feature
to automatically generate both an English-to-local-language
XSL transformation and a local-language-to-English
XSL transformation for each of the
languages contained in the multilingual translation
mapping file. In this way we ensured two-way
convertibility for XML documents (see figure 8).
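To make the generation step concrete, the sketch below builds a pair of XSLT 1.0 stylesheets from a tag table. Note the substitution: the generator described above is itself an XSLT script, whereas this illustration uses Python to emit the stylesheets. The tag table, the function names and the shape of the generated templates (an identity rule plus one renaming rule per tag) are assumptions, not the actual MMWebsite output.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
ET.register_namespace("xsl", XSL_NS)


def xsl(tag):
    """Qualify a tag name with the XSLT namespace."""
    return f"{{{XSL_NS}}}{tag}"


def generate_stylesheet(tag_pairs):
    """Emit a stylesheet that renames elements according to tag_pairs."""
    sheet = ET.Element(xsl("stylesheet"), {"version": "1.0"})

    # Identity rule: copy anything that has no renaming rule of its own.
    identity = ET.SubElement(sheet, xsl("template"), {"match": "@*|node()"})
    copy = ET.SubElement(identity, xsl("copy"))
    ET.SubElement(copy, xsl("apply-templates"), {"select": "@*|node()"})

    # One renaming rule per element in the mapping.
    for source_tag, target_tag in tag_pairs.items():
        rule = ET.SubElement(sheet, xsl("template"), {"match": source_tag})
        renamed = ET.SubElement(rule, xsl("element"), {"name": target_tag})
        ET.SubElement(renamed, xsl("apply-templates"), {"select": "@*|node()"})

    return ET.tostring(sheet, encoding="unicode")


def generate_both_directions(english_to_local):
    """Return the English-to-local and local-to-English stylesheets."""
    local_to_english = {local: en for en, local in english_to_local.items()}
    return (generate_stylesheet(english_to_local),
            generate_stylesheet(local_to_english))


# Hypothetical English/Spanish tag table.
to_spanish, to_english = generate_both_directions(
    {"book": "libro", "title": "titulo"}
)
print(to_spanish)
```

The reverse stylesheet is obtained simply by inverting the table, which only works if each local-language tag corresponds to exactly one English tag.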
For each target language we also generate a
DTD or a Schema translator. In our first attempts,
this took the form of a C++ and Lex parser. Later,
we changed the approach. Now we first convert the
DTD to a W3C Schema, then we translate the
Schema to the local language, and finally we can
(optionally) generate an equivalent translated DTD.
This approach has the advantage of not requiring
complex parsers (only XSLT), and it also handles the
translation of Schemas. In our latest implementation,
the user can freely choose amongst DTD, W3C
Schema and RELAX NG, both for input and output,
allowing for a format conversion during the
translation process.
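The middle step of that pipeline, translating a W3C Schema to the local language, can be illustrated with the sketch below. It is a simplified Python stand-in for the XSLT-based translator: it only renames the `name` and `ref` attributes of `xs:element` declarations, and the sample schema, tag table and function name are hypothetical. The DTD-to-Schema and Schema-to-DTD conversions are not shown.

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS_NS)

# Hypothetical schema using English element names.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="title" type="xs:string"/>
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="title"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
""".strip()


def translate_schema(schema_text, tag_pairs):
    """Rename element declarations and references according to tag_pairs."""
    root = ET.fromstring(schema_text)
    for node in root.iter(f"{{{XS_NS}}}element"):
        for attribute in ("name", "ref"):
            value = node.get(attribute)
            if value in tag_pairs:
                node.set(attribute, tag_pairs[value])
    return root


translated = translate_schema(SCHEMA, {"book": "libro", "title": "titulo"})
print(ET.tostring(translated, encoding="unicode"))
```

A fuller translator would also need to handle attribute declarations, named types and namespace-prefixed references, which this sketch leaves out.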
Markup translators for many other languages can be
built in the way described here.
4 CONCLUSIONS
Amongst the observed advantages of using markup
in one’s own language are reduced learning time,
fewer errors and higher productivity. It may
also help spread the use of XML vocabularies like
DC, TEI, DocBook, and many others, into
non-English-speaking countries. Cooperative
multilingual projects may benefit from the
possibility of easily translating the markup to each
encoder's language. Last, but not least, scholars of a
given language feel more comfortable tagging their
texts with mnemonics based on their own language.
Figure 6: Pre-generation of a translating XSLT script, to then translate the document instance.
WEBIST 2006 - INTERNET TECHNOLOGY