to improving the classification accuracy. Although
promising results were achieved using the
proposed data type (online forum discussions) and
features (LIWC), many directions remain open for
development in this area of research. One
important direction is to use the LIWC
dictionary for further analysis of a person’s emotional
and psychological status at the various stages of the
smoking cessation process (i.e. along the journey to stop smoking).