Lexical Context for Profiling Reputation of Corporate Entities

Jean-Valère Cossu, Liana Ermakova

2017

Abstract

Opinion and trend mining on micro-blogs such as Twitter has recently attracted research interest in several fields, including Information Retrieval (IR) and Natural Language Processing (NLP). However, the performance of existing approaches is limited by the quality of the available training material. Moreover, this lack of data makes it difficult to explain the suggestions that automatic systems produce for decision support. One promising solution to this issue is to enrich textual content using large micro-blog archives or external document collections, e.g. Wikipedia. Despite some advances in the Reputation Dimension Classification (RDC) task promoted by RepLab, RDC remains a research challenge. In this paper we introduce a supervised classification method for RDC based on a threshold intersection graph. We analyzed the impact of various micro-blog expansion methods on RDC performance. We demonstrated that simple statistical NLP methods that do not require any external resources can be easily optimized to outperform state-of-the-art approaches in the RDC task. The conducted experiments also showed that enriching micro-blogs with effective expansion techniques can improve classification quality.
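The abstract does not describe how the threshold intersection graph is built, so the sketch below does not reproduce the authors' classifier. It is only a minimal illustration, under stated assumptions, of the two generic ideas the abstract combines: enriching a short message with terms drawn from an external collection, and then assigning a reputation dimension with a simple supervised statistical model. The helper names (`tokenize`, `expand`, `train`, `classify`), the stopword list, the co-occurrence expansion rule (a crude form of pseudo-relevance feedback), the centroid-plus-cosine classifier, and the toy texts and labels are all hypothetical.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "with", "after"}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def expand(tweet, collection, k=2):
    """Hypothetical expansion step: append the k terms that co-occur most often
    with the tweet's terms in an external collection (pseudo-relevance feedback)."""
    tokens = tokenize(tweet)
    terms = set(tokens)
    feedback = Counter()
    for doc in collection:
        doc_tokens = tokenize(doc)
        if terms & set(doc_tokens):  # the document shares at least one term
            feedback.update(t for t in doc_tokens if t not in terms)
    # most_common keeps first-seen order among equally frequent terms
    return tokens + [t for t, _ in feedback.most_common(k)]


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labelled):
    """Build one term-frequency centroid per reputation dimension."""
    centroids = {}
    for text, label in labelled:
        centroids.setdefault(label, Counter()).update(tokenize(text))
    return centroids


def classify(tokens, centroids):
    """Return the dimension whose centroid is most similar to the tokens.
    Ties (including all-zero similarity) go to the first dimension seen."""
    vec = Counter(tokens)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy data invented for illustration only.
training = [
    ("New phone model launched with great camera", "Products & Services"),
    ("CEO resigns after board dispute", "Leadership"),
    ("Quarterly profit beats analyst expectations", "Performance"),
]
external = [
    "The CEO and the board announced a new chief executive",
    "Smartphone camera review: battery and phone",
]

centroids = train(training)
tweet = "Chief executive steps down"
print(classify(tokenize(tweet), centroids))          # prints "Products & Services" (all similarities are 0)
print(classify(expand(tweet, external), centroids))  # prints "Leadership" (expansion adds "ceo", "board")
```

In this toy run the unexpanded message shares no term with any training centroid, so every similarity is zero and the choice falls back to the first dimension; after expansion, the added terms "ceo" and "board" overlap with the Leadership example. On real data the external collection and the number of added terms `k` would need tuning, since expansion can also pull in off-topic terms.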



Paper Citation


in Harvard Style

Cossu J. and Ermakova L. (2017). Lexical Context for Profiling Reputation of Corporate Entities. In Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-758-248-6, pages 567-576. DOI: 10.5220/0006284505670576


in Bibtex Style

@conference{iceis17,
author={Jean-Valère Cossu and Liana Ermakova},
title={Lexical Context for Profiling Reputation of Corporate Entities},
booktitle={Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2017},
pages={567-576},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006284505670576},
isbn={978-989-758-248-6},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI  - Lexical Context for Profiling Reputation of Corporate Entities
SN  - 978-989-758-248-6
AU  - Cossu J.
AU  - Ermakova L.
PY  - 2017
SP  - 567
EP  - 576
DO  - 10.5220/0006284505670576