KEYWORDS EXTRACTION - Selecting Keywords in Natural Language Texts with Markov Chains and Neural Networks

Błażej Zyglarski, Piotr Bała

Abstract

In this paper we present our approach to keyword extraction by natural language processing. We present a revised and extended version of our previously published document analysis method, based on Kohonen Neural Networks with Reinforcement, which uses data from a large document repository to check and improve results. We describe new improvements, achieved by preprocessing the set of words and creating an initial ranking using Markov Chains. Our method shows that keywords can be selected from a text with high accuracy. We also present an evaluation and comparison of both methods, and example results of keyword selection on random documents.
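
The abstract does not say how the Markov chain ranking is built, so the following is only a minimal sketch of one common way to obtain an initial word ranking from a Markov chain, not the authors' procedure. It assumes that states are words, that transition probabilities come from counts of consecutive word pairs, and that a PageRank-style damping factor is used; the function name `rank_words` and the parameters `damping` and `iterations` are hypothetical.

```python
from collections import defaultdict


def rank_words(tokens, damping=0.85, iterations=50):
    """Rank words by the approximate stationary distribution of a
    word-to-word Markov chain (illustrative assumption, not the paper's method)."""
    if not tokens:
        return []

    # Count transitions between consecutive tokens.
    counts = defaultdict(lambda: defaultdict(int))
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1

    vocab = sorted(set(tokens))
    n = len(vocab)
    rank = {w: 1.0 / n for w in vocab}  # start from the uniform distribution

    # Power iteration with damping, as in PageRank.
    for _ in range(iterations):
        new_rank = {w: (1.0 - damping) / n for w in vocab}
        for w in vocab:
            total = sum(counts[w].values())
            if total == 0:
                # Word with no successor: spread its mass uniformly.
                share = damping * rank[w] / n
                for v in vocab:
                    new_rank[v] += share
            else:
                for v, c in counts[w].items():
                    new_rank[v] += damping * rank[w] * c / total
        rank = new_rank

    # Highest-ranked words first.
    return sorted(rank.items(), key=lambda kv: kv[1], reverse=True)


tokens = "markov chains rank words and markov chains select keywords".split()
ranking = rank_words(tokens)
```

In a pipeline like the one the abstract outlines, a ranking of this kind would only serve as a starting point that a later stage (here, the neural network) refines.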



Paper Citation


in Harvard Style

Zyglarski B. and Bała P. (2010). KEYWORDS EXTRACTION - Selecting Keywords in Natural Language Texts with Markov Chains and Neural Networks. In Proceedings of the International Conference on Knowledge Management and Information Sharing - Volume 1: KMIS, (IC3K 2010), ISBN 978-989-8425-30-0, pages 315-321. DOI: 10.5220/0003088003150321


in Bibtex Style

@conference{kmis10,
author={Błażej Zyglarski and Piotr Bała},
title={KEYWORDS EXTRACTION - Selecting Keywords in Natural Language Texts with Markov Chains and Neural Networks},
booktitle={Proceedings of the International Conference on Knowledge Management and Information Sharing - Volume 1: KMIS, (IC3K 2010)},
year={2010},
pages={315-321},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003088003150321},
isbn={978-989-8425-30-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Management and Information Sharing - Volume 1: KMIS, (IC3K 2010)
TI - KEYWORDS EXTRACTION - Selecting Keywords in Natural Language Texts with Markov Chains and Neural Networks
SN - 978-989-8425-30-0
AU - Zyglarski B.
AU - Bała P.
PY - 2010
SP - 315
EP - 321
DO - 10.5220/0003088003150321