Detecting Topics Popular in the Recent Past from a Closed Caption TV Corpus as a Categorized Chronicle Data

Hajime Mochizuki, Kohji Shibano

Abstract

In this paper, we propose a method for extracting topics that attracted popular interest over the past 28 months from a closed-caption TV corpus. Each TV program in the corpus is assigned one of the following genres: drama, informational or tabloid-style program, music, movie, culture, news, variety, welfare, or sport. We focus here on informational/tabloid-style programs, dramas, and news. Using our method, we extracted bigrams that formed part of a heroine's signature phrase and the name of a hero in a popular drama, as well as recent world, domestic, showbiz, and other news. Experimental evaluations show that our simple method is as useful as the LDA model for topic detection, and that our closed-caption TV corpus has potential value as a rich, categorized chronicle of our culture and social life.


Paper Citation


in Harvard Style

Mochizuki H. and Shibano K. (2015). Detecting Topics Popular in the Recent Past from a Closed Caption TV Corpus as a Categorized Chronicle Data. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: RDBPM, (IC3K 2015), ISBN 978-989-758-158-8, pages 342-349. DOI: 10.5220/0005612103420349


in Bibtex Style

@conference{rdbpm15,
author={Hajime Mochizuki and Kohji Shibano},
title={Detecting Topics Popular in the Recent Past from a Closed Caption TV Corpus as a Categorized Chronicle Data},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: RDBPM, (IC3K 2015)},
year={2015},
pages={342-349},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005612103420349},
isbn={978-989-758-158-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: RDBPM, (IC3K 2015)
TI - Detecting Topics Popular in the Recent Past from a Closed Caption TV Corpus as a Categorized Chronicle Data
SN - 978-989-758-158-8
AU - Mochizuki H.
AU - Shibano K.
PY - 2015
SP - 342
EP - 349
DO - 10.5220/0005612103420349