Author:
Jeffrey Ellen
Affiliation:
SPAWAR Systems Center Pacific, United States
Keyword(s):
Microtext, Natural language processing, Text classification, Semi-structured data, Information extraction, Sentiment analysis, Topic summarization.
Related Ontology Subjects/Areas/Topics:
Ambient Intelligence; Applications; Artificial Intelligence; Industrial Applications of AI; Knowledge Engineering and Ontology Development; Knowledge Representation and Reasoning; Knowledge-Based Systems; Natural Language Processing; Pattern Recognition; Soft Computing; Symbolic Systems
Abstract:
This paper defines a new term, ‘microtext’, and surveys the most recent and promising research that falls under this definition. Microtext has three distinct attributes that differentiate it from the traditional free text or unstructured text considered within the AI and NLP communities: it is generally very short, it is semi-structured, and it is characterized by amorphous or informal grammar and language. Examples of microtext include chatrooms (such as IM, XMPP, and IRC), SMS, voice transcriptions, and micro-blogging services such as Twitter(tm). This paper expands on this definition and provides some characterizations of typical microtext data. Microtext is becoming more prevalent. It is the thesis of this paper that these three attributes yield different results from, and require different techniques than, the traditional AI and NLP techniques applied to long-form free text. By creating a working definition for microtext and providing a survey of the current state of research in the area, this paper aims to create an understanding of microtext within the AI and NLP communities.