6 CONCLUDING REMARKS
Search engines provide support for automatic
information retrieval, which helps users find data
sources. However, the task of extracting the
relevant information is still left to the user. Thus,
some bottlenecks must be overcome, such as
(Fensel, 2001): the lack of a means for representation
and translation, and the lack of a means for content
description.
Considering P2P systems, there is an extra issue:
increasing result quality while optimizing the
search space. In this scenario, two problems must be
addressed: how to find files relevant to the user
query and how to increase the semantics of the
information resources. To overcome these issues, we
proposed an ontology-based approach for improving
search techniques. With this proposal, we reduced
the extra traffic produced by traditional flooding
techniques when optimal results are required, and
increased the semantics of information storage and
searching. The search space is optimized by
clustering files into super peers according to file
similarity. Semantics is increased by adopting
ontologies, which make the information content
explicit independently of the underlying structures
used to store it.
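The clustering step can be illustrated with a minimal sketch. It assumes that each super peer is summarized by the set of concept names in its ontology and each file by the set of its element names, and it uses Jaccard overlap as a stand-in for file similarity. The names `assign_super_peer` and `jaccard`, the super peer identifiers and the 0.5 threshold are hypothetical; they are not the similarity measure or interface used in this work.

```python
from typing import Callable, Dict, Optional, Set


# Hypothetical stand-in similarity: Jaccard overlap of two term sets, in [0, 1].
def jaccard(a: Set[str], b: Set[str]) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical assignment step: route a file to the super peer whose ontology
# concepts it overlaps most, or return None if no score reaches the threshold.
def assign_super_peer(
    file_terms: Set[str],
    super_peers: Dict[str, Set[str]],
    threshold: float = 0.5,
    similarity: Callable[[Set[str], Set[str]], float] = jaccard,
) -> Optional[str]:
    best_id: Optional[str] = None
    best_score = -1.0
    for sp_id, concepts in super_peers.items():
        score = similarity(file_terms, concepts)
        if score > best_score:  # strict '>' keeps the first super peer on ties
            best_id, best_score = sp_id, score
    return best_id if best_score >= threshold else None


if __name__ == "__main__":
    super_peers = {
        "sp-books": {"book", "title", "author", "isbn"},
        "sp-music": {"album", "artist", "track", "title"},
    }
    print(assign_super_peer({"book", "title", "author", "publisher"}, super_peers))  # sp-books
    print(assign_super_peer({"movie", "director"}, super_peers))  # None
```

Returning `None` below the threshold leaves room for a policy such as creating a new cluster; the similarity function is a parameter so that a richer ontology-based measure could replace the Jaccard placeholder.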
We have presented the ontology manager by
defining and implementing a tool for matching
ontologies to XML documents. By matching an
ontology to an XML file, the system can connect the
peer holding that file to the appropriate super peer,
which is described by a specific ontology. The
matching phase considers concept names, structural
similarity and stemming algorithms. The ontologies
are generated by integrating the conceptual schemas
that describe the XML files.
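As a hedged illustration of how concept names, stemming and structure can be combined into a single score, the sketch below normalizes names with a toy suffix-stripping stemmer (a crude stand-in for a real stemmer), compares them with normalized Levenshtein distance (Levenshtein, 1966), and averages the best child-to-property name matches as a simple structural score. The function names, the suffix list and the equal weighting `w=0.5` are assumptions for illustration only; they are not the matching rules implemented in The Matcher.

```python
import re
from typing import List

# Hypothetical toy stemmer: strips at most one of a few English suffixes.
SUFFIXES = ("ing", "ed", "s")


def normalize(name: str) -> str:
    """Lower-case, drop non-alphanumerics, then strip one suffix."""
    w = re.sub(r"[^a-z0-9]", "", name.lower())
    for suf in SUFFIXES:
        if w.endswith(suf) and len(w) - len(suf) >= 3:
            return w[: -len(suf)]
    return w


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """1 - normalized edit distance between normalized names, in [0, 1]."""
    x, y = normalize(a), normalize(b)
    if not x and not y:
        return 1.0
    return 1.0 - levenshtein(x, y) / max(len(x), len(y))


def structural_similarity(children: List[str], properties: List[str]) -> float:
    """Average, over XML child elements, of the best match among ontology properties."""
    if not children or not properties:
        return 0.0
    best = [max(name_similarity(c, p) for p in properties) for c in children]
    return sum(best) / len(children)


def match_score(element: str, children: List[str],
                concept: str, properties: List[str], w: float = 0.5) -> float:
    """Hypothetical weighted mix of name similarity and structural similarity."""
    return (w * name_similarity(element, concept)
            + (1 - w) * structural_similarity(children, properties))


if __name__ == "__main__":
    print(match_score("Books", ["Title", "Authors", "ISBN"],
                      "Book", ["title", "author", "isbn"]))  # 1.0
```

Stemming lets "Books" and "Authors" match "Book" and "author" exactly; the equal weighting of name and structure is arbitrary here and would need tuning in practice.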
We implemented a tool named The Matcher, which
identifies the similarity between XML files and
OWL ontologies. To evaluate it, we performed a set
of experimental tests, whose results showed that the
matching is accurate. As future work,
we are going to incorporate this tool into DetVX, a
framework for detecting, managing and querying
XML replicas and versions in P2P scenarios. We are
currently developing a graphical tool for peer
management based on the JXTA platform (Gong, 2001).
The system will allow users to manage super peers,
peers and their files, as well as to assess the
performance of the presented approach.
ACKNOWLEDGEMENTS
This work has been partially supported by CNPq
under grant No. 142396/2004-4 for Deise de Brum
Saccol; Pronex Project – FAPERGS under grant No.
0408933 and CNPq Universal under grant No.
481055/2007-0 for Renata de Matos Galante.
REFERENCES
Baeza-Yates, R.A. and Ribeiro-Neto, B.A., 1999. Modern
Information Retrieval. ACM Press / Addison-Wesley.
Bertino, E.; Guerrini, G. and Mesiti, M., 2004. A
Matching Algorithm for Measuring the Structural
Similarity between an XML Document and a DTD and
its Applications. Information Systems, Elsevier
Science Ltd., 29, 23-46.
Dalamagas, T.; Cheng, T.; Winkel, K.J. and Sellis,
T., 2004. Clustering XML Documents using Structural
Summaries. In: EDBT Work. on Clustering
Information over the Web, Greece.
Fensel, D., 2001. Ontologies: A Silver Bullet for
Knowledge Management and Electronic Commerce.
Springer.
Francesca, F.D.; Gordano, G.; Ortale, R. and Tagarelli,
A., 2003. Distance-based Clustering of XML
Documents. In: Work. on Mining Graphs, Trees and
Sequences, Croatia.
Gong, L., 2001. JXTA: A Network Programming
Environment. IEEE Internet Computing, 5(3):88–95,
May/June.
Kantrowitz, M.; Mohit, B. and Mittal, V., 2000.
Stemming and its effects on TFIDF ranking. In: SIGIR
Conf. on Research and Development in Information
Retrieval. Athens.
Levenshtein, V., 1966. Binary Codes Capable of Correcting
Deletions, Insertions, and Reversals. Cybernetics and
Control Theory, 10(8):707–710.
Lian, W.; Cheung, D.; Mamoulis, N. and Yiu, S., 2004.
An Efficient and Scalable Algorithm for Clustering
XML Documents by Structure. IEEE Trans. on
Knowledge and Data Engineering, 16, 82-96.
Madhavan, J.; Bernstein, P. A. and Rahm, E., 2001.
Generic Schema Matching with Cupid. In:
VLDB’01, Rome, Italy.
Maedche, A. and Staab, S., 2002. Measuring Similarity
between Ontologies. In: EKAW.
Manning, C. D. and Schütze, H., 1999. Foundations of
Statistical Natural Language Processing. 1st ed.
Cambridge, MA: MIT Press.
Mena, E. and Illarramendi, A., 2001. Ontology-based
query processing for global information systems.
Springer.
Nejdl, W.; Wolf, B.; Qu, C.; Decker, S.; Sintek, M. et al.,
2002. EDUTELLA: A P2P Networking Infrastructure
Based on RDF. In: WWW’02, Hawaii, USA.
AN ONTOLOGY-BASED APPROACH FOR SEMANTIC INTEROPERABILITY IN P2P SYSTEMS