Prediction of Company's Trend based on Publication Statistics and Sentiment Analysis

Fumiyo Fukumoto, Yoshimi Suzuki, Akihiro Nonaka, Karman Chan

2016

Abstract

This paper presents a method for predicting a company’s research and development (R&D) trends in its business areas. We used three types of data collections, i.e., scientific papers, open patents, and newspaper articles, to estimate temporal changes in the trends of a company’s business areas. For scientific papers and open patents, we used frequency counts of publications over time. For news articles, we applied sentiment analysis to extract positive news reports related to the company’s business areas and counted their frequencies. For each company, we then created temporal changes based on these frequency statistics. For each business area, we clustered these temporal changes. Finally, we estimated prediction models for each cluster. The results show that the model obtained by combining the three data sources is effective for predicting a company’s future trends, and in particular that SP clustering contributes to overall performance.
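The frequency-statistics step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes news articles have already been labeled positive by some sentiment classifier, that the three sources are combined by simple per-year summation, and that a least-squares line stands in for the prediction model. The clustering step is omitted. All function names are hypothetical.

```python
from collections import Counter


def yearly_counts(years, start, end):
    """Turn a list of publication years into a fixed-length frequency series."""
    counts = Counter(years)
    return [counts.get(y, 0) for y in range(start, end + 1)]


def combine(paper_years, patent_years, positive_news_years, start, end):
    """Sum the three per-year frequency series (assumption: plain summation)."""
    series = [yearly_counts(src, start, end)
              for src in (paper_years, patent_years, positive_news_years)]
    return [sum(vals) for vals in zip(*series)]


def linear_trend(series):
    """Least-squares slope and intercept of a series against its index.

    Requires at least two points; a stand-in for the paper's prediction model.
    """
    n = len(series)
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_next(series):
    """Extrapolate the fitted line one step past the end of the series."""
    slope, intercept = linear_trend(series)
    return slope * len(series) + intercept


# Hypothetical toy data for one company in one business area.
profile = combine([2010, 2011, 2011], [2011], [2012], 2010, 2012)
print(profile, predict_next(profile))
```

In the pipeline the abstract outlines, profiles like this would first be clustered per business area, and a model would then be estimated for each cluster rather than for each company.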



Paper Citation


in Harvard Style

Fukumoto F., Suzuki Y., Nonaka A. and Chan K. (2016). Prediction of Company's Trend based on Publication Statistics and Sentiment Analysis. In Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016) ISBN 978-989-758-203-5, pages 283-290. DOI: 10.5220/0006048602830290


in Bibtex Style

@conference{kdir16,
author={Fumiyo Fukumoto and Yoshimi Suzuki and Akihiro Nonaka and Karman Chan},
title={Prediction of Company's Trend based on Publication Statistics and Sentiment Analysis},
booktitle={Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016)},
year={2016},
pages={283-290},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006048602830290},
isbn={978-989-758-203-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016)
TI - Prediction of Company's Trend based on Publication Statistics and Sentiment Analysis
SN - 978-989-758-203-5
AU - Fukumoto F.
AU - Suzuki Y.
AU - Nonaka A.
AU - Chan K.
PY - 2016
SP - 283
EP - 290
DO - 10.5220/0006048602830290