ALL ABOUT MICROTEXT - A Working Definition and a Survey of Current Microtext Research within Artificial Intelligence and Natural Language Processing

Jeffrey Ellen

2011

Abstract

This paper defines a new term, ‘Microtext’, and surveys the most recent and promising research that falls under this definition. Microtext has three distinct attributes that differentiate it from the traditional free text, or unstructured text, considered within the AI and NLP communities: it is generally very short in length, it is semi-structured, and it is characterized by amorphous or informal grammar and language. Examples of microtext include chatrooms (such as IM, XMPP, and IRC), SMS, voice transcriptions, and micro-blogging such as Twitter™. This paper expands on this definition and provides some characterizations of typical microtext data. Microtext is becoming increasingly prevalent. The thesis of this paper is that, because of these three distinct attributes, traditional AI and NLP techniques designed for long-form free text yield different results on microtext, and that microtext requires different techniques. By creating a working definition for microtext and surveying the current state of research in the area, this paper aims to create an understanding of microtext within the AI and NLP communities.
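
To make the three attributes concrete, the following is a minimal sketch (not taken from the paper) of a single chat message treated as microtext. It assumes a hypothetical IRC-style log format, "[HH:MM:SS] <nick> message", and hypothetical names (`ChatMessage`, `parse_line`). The structured metadata (timestamp, sender) is separated from the short, informal free-text body, and the body's length is measured with a naive whitespace token count.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical IRC-style log format (an assumption, not from the paper):
#   "[HH:MM:SS] <nick> message text"
LOG_LINE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\] <([^>]+)> (.*)$")


@dataclass
class ChatMessage:
    timestamp: str  # structured field: when the message was posted
    sender: str     # structured field: the author's nickname
    body: str       # unstructured field: the short, informal text itself

    def token_count(self) -> int:
        # Naive whitespace tokenization; a real microtext tokenizer would also
        # need to handle emoticons, hashtags, URLs, and non-standard spellings.
        return len(self.body.split())


def parse_line(line: str) -> Optional[ChatMessage]:
    """Split one log line into metadata and free-text body, or return None."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    timestamp, sender, body = match.groups()
    return ChatMessage(timestamp, sender, body)


if __name__ == "__main__":
    log = [
        "[12:04:31] <alice> brb, gotta grab lunch :)",
        "[12:04:40] <bob> k",
        "not a chat line",
    ]
    for raw in log:
        msg = parse_line(raw)
        if msg is not None:
            print(msg.sender, msg.timestamp, msg.token_count(), repr(msg.body))
```

Even in this toy log, the bodies are only a handful of tokens long and rely on abbreviations ("brb", "k") and an emoticon, which illustrates why techniques tuned for long-form, grammatical text may behave differently on such input.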



Paper Citation


in Harvard Style

Ellen J. (2011). ALL ABOUT MICROTEXT - A Working Definition and a Survey of Current Microtext Research within Artificial Intelligence and Natural Language Processing. In Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-8425-40-9, pages 329-336. DOI: 10.5220/0003179903290336


in Bibtex Style

@conference{icaart11,
author={Jeffrey Ellen},
title={ALL ABOUT MICROTEXT - A Working Definition and a Survey of Current Microtext Research within Artificial Intelligence and Natural Language Processing},
booktitle={Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2011},
pages={329-336},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003179903290336},
isbn={978-989-8425-40-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - ALL ABOUT MICROTEXT - A Working Definition and a Survey of Current Microtext Research within Artificial Intelligence and Natural Language Processing
SN - 978-989-8425-40-9
AU - Ellen J.
PY - 2011
SP - 329
EP - 336
DO - 10.5220/0003179903290336