MINING THE COSTA RICAN WEB
Esteban Meneses
2006
Abstract
There is much to say about the structure and composition of a local web. Identifying authorities, topics, and web communities can help improve search engines, redesign a portal, or develop marketing strategies. The Costa Rican web was chosen as a test case for web mining analysis. The study yielded several descriptors of this web, along with answers to typical questions such as how many pages a site has on average, which file type is preferred for building dynamic sites, which site is the most referenced, and which sites are similar, among many others.
Paper Citation
in Harvard Style
Meneses E. (2006). MINING THE COSTA RICAN WEB. In Proceedings of WEBIST 2006 - Second International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-972-8865-46-7, pages 414-421. DOI: 10.5220/0001249504140421
in Bibtex Style
@conference{webist06,
author={Esteban Meneses},
title={MINING THE COSTA RICAN WEB},
booktitle={Proceedings of WEBIST 2006 - Second International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2006},
pages={414-421},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001249504140421},
isbn={978-972-8865-46-7},
}
in EndNote Style
TY - CONF
JO - Proceedings of WEBIST 2006 - Second International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - MINING THE COSTA RICAN WEB
SN - 978-972-8865-46-7
AU - Meneses E.
PY - 2006
SP - 414
EP - 421
DO - 10.5220/0001249504140421