Authors:
Hajime Mochizuki and Kohji Shibano
Affiliation:
Tokyo University of Foreign Studies, Japan
Keyword(s):
Topic detection, Closed Caption TV Corpus.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Communication, Collaboration and Information Sharing; Intelligent Information Systems; Knowledge Management and Information Sharing; Knowledge-Based Systems; Metadata and Structured Documents; Organizational Memories; Symbolic Systems; Tools and Technology for Knowledge Management
Abstract:
In this paper, we propose a method for extracting topics of interest over the past 28 months from a closed-caption TV corpus. Each TV program is assigned one of the following genres: drama, informational or tabloid-style program, music, movie, culture, news, variety, welfare, or sport. Here we focus on informational/tabloid-style programs, dramas, and news. Using our method, we extracted bigrams that formed part of a heroine's signature phrase and the name of a hero in a popular drama, as well as bigrams related to recent world, domestic, showbiz, and other news. Experimental evaluations show that our simple method is as useful as the LDA model for topic detection, and that our closed-caption TV corpus has potential value as a rich, categorized chronicle of our culture and social life.