Analysis of Mexican Research Production - Exploring a Scientifical Database

Silvia B. González Brambila, Mihaela Juganaru-Mathieu, Claudia N. González-Brambila

Abstract

This paper presents an exploratory analysis of a country's research activity using the ISI Web of Science collection. We focus on Mexican research in computer science. The aim of this text mining work is to extract the main research directions in this scientific field. The main axis of exploration is clustering. We carried out a two-fold analysis: the first part on the frequency representation of the extracted terms, and the second, much larger and more difficult, on mining the document representations to find clusters of documents, using the most frequent terms in the titles. The clustering algorithms applied were hierarchical, k-means, DIANA, SOM, SOTA, PAM, AGNES and model-based. Experiments were run with different numbers of terms and with the complete dataset, but the results were not satisfactory. We conclude that the best approach for this type of analysis is model-based clustering, because it gives a better classification, although it still needs better-performing algorithms. The results show that Mexican researchers have developed very few areas.
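
The pipeline described in the abstract (represent each document by the counts of the most frequent title terms, then cluster the documents) can be illustrated with a minimal sketch. This is not the authors' implementation. The titles, the stopword list, the helper names (`top_terms`, `doc_term_matrix`, `kmeans`) and the deterministic centroid initialisation are all assumptions made for illustration, and only k-means is shown out of the algorithms listed.

```python
from collections import Counter
import math

# Hypothetical stopword list; a real study would use a fuller one.
STOPWORDS = frozenset({"a", "an", "the", "of", "for", "in", "on", "and", "with", "using"})

def top_terms(titles, n_terms):
    """Return the n_terms most frequent non-stopword terms across all titles."""
    counts = Counter(w for t in titles for w in t.lower().split() if w not in STOPWORDS)
    # Sort by descending frequency, then alphabetically, so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:n_terms]]

def doc_term_matrix(titles, vocab):
    """Build one row of raw term frequencies per title, restricted to vocab."""
    rows = []
    for t in titles:
        c = Counter(t.lower().split())
        rows.append([float(c[w]) for w in vocab])
    return rows

def kmeans(rows, k, n_iter=20):
    """Plain k-means with Euclidean distance.

    Assumption: the first k rows are used as initial centroids so that the
    example is reproducible; practical implementations use random restarts.
    """
    centroids = [row[:] for row in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assignment step: each document goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(row, centroids[j])) for row in rows]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Made-up titles, not taken from the dataset analysed in the paper.
titles = [
    "neural network image segmentation",
    "fuzzy control of mobile robot",
    "image segmentation with neural network",
    "robot control using fuzzy logic",
]

vocab = top_terms(titles, n_terms=6)
rows = doc_term_matrix(titles, vocab)
labels = kmeans(rows, k=2)
print(labels)  # [0, 1, 0, 1]
```

With these toy titles, the two image-processing titles fall in one cluster and the two robotics titles in the other. On real bibliographic data the term matrix is far sparser, which is consistent with the abstract's remark that the clustering results were not satisfactory.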


Paper Citation


in Harvard Style

González Brambila S., Juganaru-Mathieu M. and González-Brambila C. (2013). Analysis of Mexican Research Production - Exploring a Scientifical Database. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013) ISBN 978-989-8565-75-4, pages 177-182. DOI: 10.5220/0004548201770182


in Bibtex Style

@conference{kdir13,
author={Silvia B. González Brambila and Mihaela Juganaru-Mathieu and Claudia N. González-Brambila},
title={Analysis of Mexican Research Production - Exploring a Scientifical Database},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013)},
year={2013},
pages={177--182},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004548201770182},
isbn={978-989-8565-75-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013)
TI - Analysis of Mexican Research Production - Exploring a Scientifical Database
SN - 978-989-8565-75-4
AU - González Brambila S.
AU - Juganaru-Mathieu M.
AU - González-Brambila C.
PY - 2013
SP - 177
EP - 182
DO - 10.5220/0004548201770182