‘Microtext’, so that researchers can have a common
ground for future discussion.
The scattered research surveyed in this paper offers
insight into the kinds of conclusions and
methodologies that a more focused effort could
discover and catalogue. These include leveraging an
outside body of knowledge; leveraging non-traditional
language features such as laughs and “uh/ums”; and
placing less weight on individual results in favor of
coarser-grained trends. Overall, trend analysis and
identification has received the most research
attention, while Information Extraction from
microtext is particularly lacking.
In two of the surveyed papers, SVMs proved a
successful strategy for handling informal grammars.
The next step is to investigate, and more rigorously
quantify, the three attributes in the microtext
definition. Doing so would provide reusable insights
and help catalogue the best-performing techniques, as
well as the quirks and advantages of microtext
processing relative to traditional text processing.
The goal of this paper is to build a shared
understanding of microtext within the AI and NLP
communities.
ACKNOWLEDGEMENTS
Thanks to the Office of Naval Research and the
Space and Naval Warfare Systems Center Pacific for
their financial support, and Dr. LorRaine Duffy for
inspiration and motivation. This paper is the work of
U.S. Government employees performed in the
course of employment and no copyright subsists
therein.
REFERENCES
Abrol, S. and Khan, L. 2010. TWinner: understanding
news queries with geo-content using Twitter. In
Proceedings of the 6th Workshop on Geographic
Information Retrieval (Zurich, Switzerland, February
18-19, 2010). GIR '10. ACM, New York, NY, 1-8.
Adams, P. and Martell, C. 2008. Topic Detection and
Extraction in Chat. In International Conference on
Semantic Computing. IEEE.
Bullen, R. H. Jr. and Millen, J. K. 1972. Microtext: the
design of a microprogrammed finite state search
machine for full-text retrieval. In Proceedings of the
AFIPS Joint Computer Conferences. ACM.
Cha, M., Haddadi, H., Benevenuto, F., and Gummadi,
K. P. 2010. Measuring user influence in Twitter: the
million follower fallacy. In Proceedings of the 4th
International Conference on Weblogs and Social
Media. AAAI, Washington, D.C.
Chi, E. 2009. Information Seeking Can Be Social.
Computer (Mar. 2009), 42-46. IEEE.
Cong, G., et al. 2008. Finding question-answer pairs
from online forums. In Proceedings of the 31st Annual
International ACM SIGIR Conference on Research and
Development in Information Retrieval (SIGIR '08).
ACM, New York, NY, USA, 467-474.
Dalli, A., Xia, Y., and Wilks, Y., 2004. FASIL email
summarisation system. In Proceedings of the 20th
International Conference on Computational Linguistics
(COLING '04). ACL, Morristown, NJ, USA, Article
994.
Davidov, D., Tsur, O., and Rappoport, A. 2010. Enhanced
Sentiment Learning Using Twitter Hashtags and
Smileys. In Proceedings of the 23rd International
Conference on Computational Linguistics (COLING
2010).
Flesch, R. 1948. A new readability yardstick. Journal of
Applied Psychology 32, 221-233.
Go, A., Bhayani, R., and Huang, L. 2010. Exploiting the
Unique Characteristics of Tweets for Sentiment
Analysis. CS224N Project Report, Stanford.
Gruhl, D., Nagarajan, M., Pieper, J., Robson, C., and
Sheth, A. 2009. Context and Domain Knowledge
Enhanced Entity Spotting in Informal Text.
In Proceedings of the 8th International Semantic Web
Conference, 260-276.
Kinsella, S., Passant, A., and Breslin, J. 2010. Ten Years of
Hyperlinks in Online Conversations. In Proceedings of
the Web Science Conference 2010. WWW2010.
Kopparapu, S. K., Srivastava, A., and Pande, A. 2007.
SMS based natural language interface to yellow pages
directory. In Proceedings of the 4th International
Conference on Mobile Technology, Applications, and
Systems and the 1st International Symposium on
Computer Human Interaction in Mobile Technology
(Mobility '07). ACM, New York, NY, 558-563.
Kothari, G., Negi, S., Faruquie, T. A., Chakaravarthy, V.
T., and Subramaniam, L. V. 2009. SMS based
interface for FAQ retrieval. In Proceedings of the Joint
Conference of the 47th Annual Meeting of the ACL
and the 4th International Joint Conference on Natural
Language Processing of the AFNLP. Association for
Computational Linguistics, Morristown, NJ, 852-860.
Laporte, L. 2009. This Week in Google 13 [Internet
radio broadcast]. October 24, 2009.
Lee, C., Kwak, H., Park, H., and Moon, S. 2010. Finding
influentials based on the temporal order of information
adoption in Twitter. In Proceedings of the 19th
International Conference on World Wide Web (WWW
'10). ACM, New York, NY, USA, 1137-1138.
Marom, Y. and Zukerman, I. 2009. An empirical study of
corpus-based response automation methods for an
e-mail-based help-desk domain. Computational
Linguistics 35, 4 (Dec. 2009), 597-635.
Mowbray, M. 2010. The Twittering Machine. In
Proceedings of the 6th International Conference on
ALL ABOUT MICROTEXT - A Working Definition and a Survey of Current Microtext Research within Artificial
Intelligence and Natural Language Processing
335