TOWARDS AN IE AND IR SYSTEM DEALING WITH SPATIAL INFORMATION IN DIGITAL LIBRARIES – EVALUATION CASE STUDY

Christian Sallaberry, Mustapha Baziz, Julien Lesbegueries, Mauro Gaio

2007

Abstract

This paper deals with spatial Information Extraction (IE) and Information Retrieval (IR) in Digital Library environments. The proposed approach, implemented in the PIV prototype, is based on a linguistic and semantic analysis of digital corpora and free-text queries. First, we present the requirements and a semantic annotation methodology for the automatic indexing and geo-referencing of text documents. We then report on a case study in which the spatial IR process is evaluated and compared with classical (statistical) IR approaches, first with purely spatial queries and then with more general queries that have both a spatial and a thematic scope. The main result of these first experiments is that combining the spatial approach with a classical (statistical) IR approach significantly improves retrieval accuracy, particularly for general queries.
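
The abstract does not say how spatial and statistical evidence are combined. As a purely illustrative sketch, assume that each document has been geo-referenced to one or more bounding-box footprints and already carries a score from some classical statistical IR model (for example a BM25-style weighting). One simple way to fuse the two is a weighted linear combination of min-max normalised scores. The names `BBox`, `spatial_score`, `normalise`, `rank` and the weight `alpha` are hypothetical; this is not the PIV implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Hypothetical axis-aligned footprint of a place name (x = longitude, y = latitude)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def area(self) -> float:
        return max(0.0, self.max_x - self.min_x) * max(0.0, self.max_y - self.min_y)

    def overlap(self, other: "BBox") -> float:
        """Area of the intersection of two boxes (0.0 if they are disjoint)."""
        w = min(self.max_x, other.max_x) - max(self.min_x, other.min_x)
        h = min(self.max_y, other.max_y) - max(self.min_y, other.min_y)
        return max(0.0, w) * max(0.0, h)


def spatial_score(query: BBox, footprints: list[BBox]) -> float:
    """Largest fraction of the query footprint covered by any document footprint, in [0, 1]."""
    if query.area() == 0.0:
        return 0.0
    return max((query.overlap(f) / query.area() for f in footprints), default=0.0)


def normalise(scores: dict[str, float]) -> dict[str, float]:
    """Min-max normalise statistical scores to [0, 1] so they are comparable with spatial ones."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # No discriminating signal: every document gets the same neutral value.
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def rank(query: BBox, text_scores: dict[str, float],
         footprints: dict[str, list[BBox]], alpha: float = 0.5) -> list[tuple[str, float]]:
    """Weighted linear fusion: alpha * statistical + (1 - alpha) * spatial, best first."""
    text = normalise(text_scores)
    fused = {
        doc: alpha * text.get(doc, 0.0)
        + (1 - alpha) * spatial_score(query, footprints.get(doc, []))
        for doc in set(text) | set(footprints)
    }
    # Sort by descending score, breaking ties by document id for a stable output.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy data: coordinates and scores are invented for illustration only.
    query = BBox(0, 0, 10, 10)
    footprints = {
        "d1": [BBox(0, 0, 10, 10)],    # covers the whole query area
        "d2": [BBox(5, 5, 15, 15)],    # covers a quarter of it
        "d3": [BBox(20, 20, 30, 30)],  # disjoint from it
    }
    text_scores = {"d1": 6.0, "d2": 10.0, "d3": 2.0}
    # Fused ranking, prints [('d1', 0.75), ('d2', 0.625), ('d3', 0.0)]
    print(rank(query, text_scores, footprints))
    # Statistical-only ranking, prints [('d2', 1.0), ('d1', 0.5), ('d3', 0.0)]
    print(rank(query, text_scores, footprints, alpha=1.0))
```

With `alpha=1.0` only the statistical score counts and `d2` ranks first; at `alpha=0.5` the spatial evidence promotes `d1`, whose footprint matches the query area. Linear fusion and min-max normalisation are chosen here only because they are simple and common, not because the paper prescribes them.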



Paper Citation


in Harvard Style

Sallaberry C., Baziz M., Lesbegueries J. and Gaio M. (2007). TOWARDS AN IE AND IR SYSTEM DEALING WITH SPATIAL INFORMATION IN DIGITAL LIBRARIES – EVALUATION CASE STUDY. In Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 5: ICEIS, ISBN 978-972-8865-92-4, pages 190-197. DOI: 10.5220/0002383701900197


in Bibtex Style

@conference{iceis07,
author={Christian Sallaberry and Mustapha Baziz and Julien Lesbegueries and Mauro Gaio},
title={TOWARDS AN IE AND IR SYSTEM DEALING WITH SPATIAL INFORMATION IN DIGITAL LIBRARIES – EVALUATION CASE STUDY},
booktitle={Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 5: ICEIS},
year={2007},
pages={190-197},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002383701900197},
isbn={978-972-8865-92-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 5: ICEIS
TI - TOWARDS AN IE AND IR SYSTEM DEALING WITH SPATIAL INFORMATION IN DIGITAL LIBRARIES – EVALUATION CASE STUDY
SN - 978-972-8865-92-4
AU - Sallaberry C.
AU - Baziz M.
AU - Lesbegueries J.
AU - Gaio M.
PY - 2007
SP - 190
EP - 197
DO - 10.5220/0002383701900197