TERM WEIGHTING: NOVEL FUZZY LOGIC BASED METHOD VS. CLASSICAL TF-IDF METHOD FOR WEB INFORMATION EXTRACTION

Jorge Ropero, Ariel Gómez, Carlos León, Alejandro Carrasco

2009

Abstract

Solving the term weighting problem is one of the most important tasks in Information Retrieval and Information Extraction. The TF-IDF method has typically been used to determine the weight of a term. In this paper, we propose a novel alternative method based on fuzzy logic. The main advantage of the proposed method is that it obtains better results, especially in extracting not only the most suitable information but also related information. This method will be used in the design of a Web Intelligent Agent that will soon start working for the University of Seville web page.
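
For readers unfamiliar with the classical baseline named in the abstract, the following is a minimal sketch of one common TF-IDF variant (relative term frequency multiplied by the logarithm of the inverse document frequency). It is not the fuzzy logic method proposed in the paper, and it is not necessarily the exact TF-IDF formulation the authors compare against. The function name `tf_idf`, the whitespace tokenization, and the sample documents are assumptions made for illustration only.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    Illustrative variant only: tf is the term count divided by the document
    length, and idf is log(N / df) with no smoothing.
    """
    # Naive tokenization (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical sample documents, not taken from the paper.
docs = [
    "library opening hours",
    "library loan rules",
    "enrolment fees and deadlines",
]
weights = tf_idf(docs)
```

In this sketch, a term that appears in fewer documents ("opening") receives a higher weight than one shared by several documents ("library"), and a term present in every document receives a weight of zero.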



Paper Citation


in Harvard Style

Ropero J., Gómez A., León C. and Carrasco A. (2009). TERM WEIGHTING: NOVEL FUZZY LOGIC BASED METHOD VS. CLASSICAL TF-IDF METHOD FOR WEB INFORMATION EXTRACTION. In Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-8111-85-2, pages 130-137. DOI: 10.5220/0001982901300137


in Bibtex Style

@conference{iceis09,
author={Jorge Ropero and Ariel Gómez and Carlos León and Alejandro Carrasco},
title={TERM WEIGHTING: NOVEL FUZZY LOGIC BASED METHOD VS. CLASSICAL TF-IDF METHOD FOR WEB INFORMATION EXTRACTION},
booktitle={Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2009},
pages={130-137},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001982901300137},
isbn={978-989-8111-85-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - TERM WEIGHTING: NOVEL FUZZY LOGIC BASED METHOD VS. CLASSICAL TF-IDF METHOD FOR WEB INFORMATION EXTRACTION
SN - 978-989-8111-85-2
AU - Ropero J.
AU - Gómez A.
AU - León C.
AU - Carrasco A.
PY - 2009
SP - 130
EP - 137
DO - 10.5220/0001982901300137