Matching Knowledge Users with Knowledge Creators using Text Mining Techniques
Abdulrahman Al-Haimi
2014
Abstract
Matching knowledge users with knowledge creators across multiple data sources that share very little similarity in content and data structure is a key problem. Solving it is expected to noticeably improve the research commercialization rate. In this paper, we discuss and evaluate the effectiveness of a comprehensive methodology that automates classic text mining techniques to match knowledge users with knowledge creators. We also present a prototype application, one of the first attempts to match knowledge users with knowledge creators by analyzing records from Linkedin.com and BASE-search.net. The matching procedure is performed using supervised and unsupervised models. Surprisingly, experimental results show that the K-NN classifier slightly outperforms the competing models when evaluated in a similar context. After identifying the best-suited methodology, we design the system architecture. One of the main contributions of this research is the introduction and analysis of a novel prototype application that attempts to bridge the gap between research performed in industry and academia.
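To make the idea of K-NN matching over text concrete, the following is a minimal sketch and not the paper's actual pipeline. It assumes TF-IDF term weighting, cosine similarity, and similarity-weighted voting, none of which is specified in the abstract. The function names, the stopword list, and the toy records are all hypothetical; they do not come from Linkedin.com or BASE-search.net.

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"and", "for", "in", "of", "on", "the", "using", "with"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in text)
    return [tok for tok in cleaned.split() if tok not in STOPWORDS]


def build_idf(docs):
    """Smoothed inverse document frequency over the labelled documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def vectorize(text, idf):
    """Sparse TF-IDF vector; terms unseen in the labelled documents are ignored."""
    counts = Counter(tok for tok in tokenize(text) if tok in idf)
    return {term: tf * idf[term] for term, tf in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def knn_match(profile, labelled_docs, k=3):
    """Assign a knowledge-user profile to the label of its k most similar documents."""
    idf = build_idf([text for text, _ in labelled_docs])
    query = vectorize(profile, idf)
    scored = sorted(
        ((cosine(query, vectorize(text, idf)), label) for text, label in labelled_docs),
        reverse=True,
    )
    votes = defaultdict(float)
    for similarity, label in scored[:k]:
        votes[label] += similarity  # similarity-weighted vote
    return max(votes, key=votes.get)


# Hypothetical knowledge-creator records labelled by research area.
TRAINING = [
    ("Text classification with support vector machines and nearest neighbour methods", "text mining"),
    ("Clustering documents using term weighting and cosine similarity", "text mining"),
    ("Gene expression analysis and protein sequence alignment", "bioinformatics"),
    ("Semantic web ontologies for biological knowledge management", "bioinformatics"),
]

if __name__ == "__main__":
    print(knn_match("Data analyst experienced in document clustering and text classification", TRAINING))
    print(knn_match("Lab scientist working on protein sequence alignment", TRAINING))
```

Weighting each vote by similarity keeps neighbours with zero overlap from outvoting a single strong match, which matters on very small collections such as this toy one.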
Paper Citation
in Harvard Style
Al-Haimi A. (2014). Matching Knowledge Users with Knowledge Creators using Text Mining Techniques. In Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA, ISBN 978-989-758-035-2, pages 5-14. DOI: 10.5220/0004942000050014
in Bibtex Style
@conference{data14,
author={Abdulrahman Al-Haimi},
title={Matching Knowledge Users with Knowledge Creators using Text Mining Techniques},
booktitle={Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA},
year={2014},
pages={5-14},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004942000050014},
isbn={978-989-758-035-2},
}
in EndNote Style
TY - CONF
JO - Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA
TI - Matching Knowledge Users with Knowledge Creators using Text Mining Techniques
SN - 978-989-758-035-2
AU - Al-Haimi A.
PY - 2014
SP - 5
EP - 14
DO - 10.5220/0004942000050014