Matching Knowledge Users with Knowledge Creators using Text Mining Techniques

Abdulrahman Al-Haimi

Abstract

Matching knowledge users with knowledge creators across multiple data sources that share very little similarity in content and data structure is a key problem. Solving it is expected to noticeably improve the rate of research commercialization. In this paper, we discuss and evaluate the effectiveness of a comprehensive methodology that automates classic text mining techniques to match knowledge users with knowledge creators. We also present a prototype application, considered one of the first attempts to perform this matching by analyzing records from Linkedin.com and BASE-search.net. The matching procedure uses supervised and unsupervised models. Surprisingly, experimental results show that the K-NN classifier slightly outperforms the competing models when evaluated in a similar context. After identifying the best-suited methodology, we design the system architecture. One of the main contributions of this research is the introduction and analysis of a novel prototype application that attempts to bridge the gap between research performed in industry and academia.



Paper Citation


in Harvard Style

Al-Haimi A. (2014). Matching Knowledge Users with Knowledge Creators using Text Mining Techniques. In Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA, ISBN 978-989-758-035-2, pages 5-14. DOI: 10.5220/0004942000050014


in Bibtex Style

@conference{data14,
author={Abdulrahman Al-Haimi},
title={Matching Knowledge Users with Knowledge Creators using Text Mining Techniques},
booktitle={Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA},
year={2014},
pages={5-14},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004942000050014},
isbn={978-989-758-035-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA
TI - Matching Knowledge Users with Knowledge Creators using Text Mining Techniques
SN - 978-989-758-035-2
AU - Al-Haimi A.
PY - 2014
SP - 5
EP - 14
DO - 10.5220/0004942000050014