A HYBRID APPROACH TOWARDS INFORMATION EXPANSION BASED ON SHALLOW AND DEEP METADATA

Tudor Groza, Siegfried Handschuh

Abstract

The exponential growth of the World Wide Web in the last decade has brought an explosion in the information space, with important consequences for scientific research as well. Finding relevant work in a particular field and exploring the links between relevant publications have lately become cumbersome tasks. In this paper we propose a hybrid approach to the automatic extraction of semantic metadata from scientific publications that can help alleviate this problem, at least partially. We integrated the extraction mechanisms into an application targeted at early-stage researchers. The application combines metadata extraction with information expansion and visualization to support seamless exploration of the space surrounding scientific publications.



Paper Citation


in Harvard Style

Groza T. and Handschuh S. (2009). A HYBRID APPROACH TOWARDS INFORMATION EXPANSION BASED ON SHALLOW AND DEEP METADATA. In Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2009) ISBN 978-989-674-012-2, pages 109-116. DOI: 10.5220/0002271001090116


in Bibtex Style

@conference{keod09,
author={Tudor Groza and Siegfried Handschuh},
title={A HYBRID APPROACH TOWARDS INFORMATION EXPANSION BASED ON SHALLOW AND DEEP METADATA},
booktitle={Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2009)},
year={2009},
pages={109-116},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002271001090116},
isbn={978-989-674-012-2},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2009)
TI  - A HYBRID APPROACH TOWARDS INFORMATION EXPANSION BASED ON SHALLOW AND DEEP METADATA
SN  - 978-989-674-012-2
AU  - Groza T.
AU  - Handschuh S.
PY  - 2009
SP  - 109
EP  - 116
DO  - 10.5220/0002271001090116
ER  - 