Harshavardhan Achrekar, Avinash Gandhe, Ross Lazarus, Ssu-Hsin Yu, Benyuan Liu


Seasonal influenza epidemics cause severe illness and 250,000 to 500,000 deaths worldwide each year, and pandemics such as the 1918 “Spanish Flu” can be far more devastating. Reducing the impact of these threats is of paramount importance for health authorities, and studies have shown that effective interventions can contain epidemics if they are detected early. In this paper, we introduce Social Network Enabled Flu Trends (SNEFT), a continuous data collection framework that monitors flu-related tweets and tracks the emergence and spread of influenza. We show that text mining significantly enhances the correlation between Twitter data and the Influenza-Like Illness (ILI) rates provided by the Centers for Disease Control and Prevention (CDC). For accurate prediction, we implemented an auto-regressive model with exogenous input (ARX) that uses current Twitter data and CDC ILI rates from previous weeks to predict current influenza statistics. Our results show that, while previous ILI data from the CDC offer a true (but delayed) assessment of a flu epidemic, Twitter data provides a real-time assessment of current epidemic conditions and can compensate for the lack of current ILI data. We observe that Twitter data is highly correlated with ILI rates across different regions within the USA and can effectively improve the accuracy of our prediction. Our age-based flu prediction analysis indicates that, for most regions, Twitter data best fits the 5-24 and 25-49 year age groups, consistent with these likely being the most active age groups on Twitter. Therefore, Twitter data can act as a supplementary indicator for gauging influenza within a population and helps discover flu trends ahead of the CDC.
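The ARX idea above (predict this week's ILI rate from earlier weeks' ILI rates plus this week's Twitter signal) can be sketched as a linear regression: ILI(t) = c + a_1·ILI(t-1) + ... + a_p·ILI(t-p) + b·Tw(t). This is a minimal illustration, not the authors' implementation: the intercept c, the lag order p = 2, the single current-week Twitter term Tw(t), and ordinary least-squares estimation via the normal equations are all assumptions, and the function names `build_design`, `solve`, `fit_arx` and `predict_current` are hypothetical.

```python
from typing import List, Sequence, Tuple


def build_design(ili: Sequence[float], tweets: Sequence[float],
                 p: int) -> Tuple[List[List[float]], List[float]]:
    """Build regression rows [1, y(t-1), ..., y(t-p), u(t)] with target y(t).

    ili    -- weekly ILI rates in chronological order (y)
    tweets -- weekly flu-related tweet signal aligned with ili (u)
    """
    rows, targets = [], []
    for t in range(p, len(ili)):
        lags = [ili[t - i] for i in range(1, p + 1)]
        rows.append([1.0] + lags + [tweets[t]])
        targets.append(ili[t])
    return rows, targets


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: regressors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_arx(ili: Sequence[float], tweets: Sequence[float],
            p: int = 2) -> List[float]:
    """Least-squares ARX fit; returns coefficients [c, a_1, ..., a_p, b]."""
    x, y = build_design(ili, tweets, p)
    k = len(x[0])
    # Normal equations: (X^T X) theta = X^T y
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


def predict_current(coef: Sequence[float], ili_history: Sequence[float],
                    tweet_now: float) -> float:
    """Predict this week's ILI from past ILI (chronological) and this week's tweets."""
    p = len(coef) - 2
    value = coef[0] + coef[-1] * tweet_now
    for i in range(1, p + 1):
        value += coef[i] * ili_history[-i]  # ili_history[-1] is last week
    return value
```

In a real deployment the Twitter input would be a weekly count of flu-related tweets after text-mining filters, and the ILI history would come from CDC reports that lag by a week or more; here both are simply numeric series. Solving the normal equations directly keeps the sketch dependency-free, but a library least-squares solver would be numerically safer for poorly conditioned data.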


Paper Citation

in Harvard Style

Achrekar H., Gandhe A., Lazarus R., Yu S. and Liu B. (2012). Twitter Improves Seasonal Influenza Prediction. In Proceedings of the International Conference on Health Informatics - Volume 1: HEALTHINF, (BIOSTEC 2012) ISBN 978-989-8425-88-1, pages 61-70. DOI: 10.5220/0003780600610070

in Bibtex Style

author={Harshavardhan Achrekar and Avinash Gandhe and Ross Lazarus and Ssu-Hsin Yu and Benyuan Liu},
booktitle={Proceedings of the International Conference on Health Informatics - Volume 1: HEALTHINF, (BIOSTEC 2012)},

in EndNote Style

JO - Proceedings of the International Conference on Health Informatics - Volume 1: HEALTHINF, (BIOSTEC 2012)
SN - 978-989-8425-88-1
AU - Achrekar H.
AU - Gandhe A.
AU - Lazarus R.
AU - Yu S.
AU - Liu B.
PY - 2012
SP - 61
EP - 70
DO - 10.5220/0003780600610070