Information Extraction from Web Services - A Comparison of Tokenisation Algorithms

Alejandro Metke-Jimenez, Kerry Raymond, Ian MacColl

Abstract

Most web service discovery systems use keyword-based search algorithms and, although partially successful, sometimes fail to satisfy some users' information needs. This has given rise to several semantics-based approaches that go beyond simple attribute matching and try to capture the semantics of services. However, the results reported in the literature vary and in many cases are worse than those obtained by keyword-based systems. We believe the accuracy of the mechanisms used to extract tokens from the non-natural language sections of WSDL files directly affects the performance of these techniques, because some of them can be more sensitive to noise. In this paper three existing tokenisation algorithms are evaluated and a new algorithm that outperforms all the algorithms found in the literature is introduced.



Paper Citation


in Harvard Style

Metke-Jimenez A., Raymond K. and MacColl I. (2011). Information Extraction from Web Services - A Comparison of Tokenisation Algorithms. In Proceedings of the 2nd International Workshop on Software Knowledge - Volume 1: SKY, (IC3K 2011) ISBN 978-989-8425-82-9, pages 12-23. DOI: 10.5220/0003698000120023


in Bibtex Style

@conference{sky11,
author={Alejandro Metke-Jimenez and Kerry Raymond and Ian MacColl},
title={Information Extraction from Web Services - A Comparison of Tokenisation Algorithms},
booktitle={Proceedings of the 2nd International Workshop on Software Knowledge - Volume 1: SKY, (IC3K 2011)},
year={2011},
pages={12-23},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003698000120023},
isbn={978-989-8425-82-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Workshop on Software Knowledge - Volume 1: SKY, (IC3K 2011)
TI - Information Extraction from Web Services - A Comparison of Tokenisation Algorithms
SN - 978-989-8425-82-9
AU - Metke-Jimenez A.
AU - Raymond K.
AU - MacColl I.
PY - 2011
SP - 12
EP - 23
DO - 10.5220/0003698000120023