6 RESULTS AND DISCUSSION
We tested the schema comparison tool on a number
of governmental and commercial schemas, including
the Universal Core (UCore), a governmental schema
for sharing digital content; Cursor on Target (CoT), a
simple schema for recording geospatial positions;
the National Information Exchange Model (NIEM);
Keyhole Markup Language (KML); Geography
Markup Language (GML); and many others.
Applying the analysis described in the previous
section, the schema comparison tool identified
matching elements within pairs of input schemas.
Considering only semantic correspondence
irrespective of hierarchical position, a large quantity
of prospective matches would be expected.
Relatively few matching elements, however,
were actually identified, even though the schemas
describe ostensibly similar data. Figure 5 reports
the percentage of data elements from the schema in
the left-hand column that are also contained in the
schema in the top row.
         UCI   CoT   UCore   KML
UCI       -     5      2      2
CoT      22     -      5      7
UCore     9     6      -      7
KML      13     9      7      -
Figure 5: The percentage of overlap between disparate
schemas was surprisingly small. While the general
assumption was that schemas from similar domains only
require transformation between representations, these
results suggest that related schemas represent different
views of similar data.
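The directional percentages in the figure can be read as a simple set ratio: the share of one schema's elements that also occur in another. The sketch below illustrates that arithmetic only. It assumes exact element-name matching as a stand-in for the semantic analysis described in the previous section, and the schema names and element sets are hypothetical, not the schemas evaluated here.

```python
from itertools import permutations


def overlap_percent(source: set[str], target: set[str]) -> float:
    """Percent of element names in `source` that also appear in `target`."""
    if not source:
        return 0.0
    return 100.0 * len(source & target) / len(source)


def overlap_matrix(schemas: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Map (row schema, column schema) to the row-to-column overlap percentage."""
    return {
        (row, col): overlap_percent(schemas[row], schemas[col])
        for row, col in permutations(schemas, 2)
    }


# Hypothetical element-name sets, for illustration only.
schemas = {
    "A": {"id", "name", "latitude", "longitude", "unit"},
    "B": {"id", "type", "latitude", "point", "remarks", "detail", "time", "event"},
}

matrix = overlap_matrix(schemas)
for (row, col), pct in sorted(matrix.items()):
    print(f"{row} -> {col}: {pct:.0f}%")
```

Because the denominator is the size of the row schema, the measure is not symmetric: a small schema can be largely contained in a large one while the reverse percentage stays low, which is consistent with the differing values on either side of the diagonal in the figure.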
Examining the corresponding elements, at least
four types of matches were identified. First,
identical elements from identically imported schemas
produced obvious matches. Second, generic
element names, such as id, type or name, were
common among disparate schemas. Third, generic
high-level complex types, such as unit,
organization or address, were present in
different schemas. Finally, simple types, including
latitude, organization or last_name,
were found across multiple schemas.
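As a rough illustration of how shared element names might be separated into complex and simple matches, the following sketch parses two small XSD fragments with the Python standard library. The fragments, the function names and the rule used to label an element as complex (an inline xs:complexType child) are assumptions made for this example; the tool described in this paper relies on semantic analysis rather than case-insensitive name comparison.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragments, for illustration only.
SCHEMA_A = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:string"/>
        <xs:element name="latitude" type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

SCHEMA_B = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="street" type="xs:string"/>
        <xs:element name="latitude" type="xs:double"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="name" type="xs:string"/>
</xs:schema>
"""


def element_kinds(xsd_text: str) -> dict[str, str]:
    """Map each named element (lower-cased) to 'complex' or 'simple'."""
    root = ET.fromstring(xsd_text.strip())
    kinds = {}
    for el in root.iter(XS + "element"):
        name = el.get("name")
        if name is None:  # skip ref="..." declarations
            continue
        has_inline_complex = el.find(XS + "complexType") is not None
        kinds[name.lower()] = "complex" if has_inline_complex else "simple"
    return kinds


def shared_elements(xsd_a: str, xsd_b: str) -> dict[str, tuple[str, str]]:
    """Names present in both schemas, with the kind found in each."""
    kinds_a, kinds_b = element_kinds(xsd_a), element_kinds(xsd_b)
    return {n: (kinds_a[n], kinds_b[n]) for n in kinds_a.keys() & kinds_b.keys()}


shared = shared_elements(SCHEMA_A, SCHEMA_B)
for name, (kind_a, kind_b) in sorted(shared.items()):
    print(name, kind_a, kind_b)
```

Matching on names regardless of nesting depth mirrors the decision above to consider correspondence irrespective of hierarchical position.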
The key result is the identification of high-level
and data value elements that represent ‘bridge’ or
‘nexus’ points between disparate data sources. In
other words, these elements provide the means to
link dissimilar data.
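One way to picture such bridge elements is as an inverted index from element name to the schemas that contain it; any element that appears in two or more schemas is a candidate link between them. The sketch below is a minimal illustration under that assumption. The function name, the threshold parameter and the sample element sets are hypothetical.

```python
from collections import defaultdict


def bridge_elements(
    schemas: dict[str, set[str]], min_schemas: int = 2
) -> dict[str, list[str]]:
    """Return elements found in at least `min_schemas` schemas, with those schemas."""
    index: dict[str, set[str]] = defaultdict(set)
    for schema_name, elements in schemas.items():
        for element in elements:
            index[element].add(schema_name)
    return {
        element: sorted(names)
        for element, names in index.items()
        if len(names) >= min_schemas
    }


# Hypothetical element-name sets, for illustration only.
example = {
    "A": {"id", "unit"},
    "B": {"id", "latitude"},
    "C": {"id", "latitude", "point"},
}

bridges = bridge_elements(example)
for element in sorted(bridges):
    print(element, "links", ", ".join(bridges[element]))
```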
7 CONCLUSIONS
The techniques developed here attempted to
transform multiple, disparate XML sources into a
common concept representation while retaining the
underlying information. Through this process it
became clear that schemas from similar domains
encode different aspects of the same data. The
future objective should therefore be the assimilation
of disparate data into a comprehensive knowledge
representation that connects these different realms of
data through their sparse ‘touch’ points. Based on
this research, our current effort is the development
of such a knowledge representation that accrues
information from many disparate sources and
provides tools for data manipulation, storage and
presentation.
A METHOD FOR INTEROPERABILITY BETWEEN STRUCTURED DATA SOURCES USING SEMANTIC
ANALYSIS