Topic and Subject Detection in News Streams for Multi-document Summarization

Fumiyo Fukumoto, Yoshimi Suzuki, Atsuhiro Takasu

2012

Abstract

This paper focuses on continuous news streams and presents a method for detecting salient key sentences in stories that discuss the same topic. Our hypothesis is that key sentences in multiple stories include words related to both the target topic and the subject of a story. To identify a subject, we used the TF-IDF term weighting method together with the result of assigning domain-specific senses to each word in the story. A topic, on the other hand, is identified by using a model of "topic dynamics". We defined a burst as a time interval of maximal length over which the rate of change shows positive acceleration. To measure topic dynamics, we adapted a stock market trend analysis technique, Moving Average Convergence Divergence (MACD), which shows the relationship between two moving averages of prices and is a popular indicator of trends in dynamic marketplaces. The method was tested on the TDT corpora, and the results showed the effectiveness of the method.
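
As a rough illustration of how MACD can be applied to a term's frequency over time, the following Python sketch computes the MACD histogram and extracts candidate burst intervals. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`ema`, `macd_histogram`, `bursts`), the spans 12, 26 and 9 (the conventional MACD defaults from market analysis), and the rule that a burst is a maximal run of positive histogram values are all assumptions made for illustration.

```python
from typing import List, Tuple


def ema(series: List[float], span: int) -> List[float]:
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out: List[float] = []
    for x in series:
        # The first value seeds the average; later values are blended in.
        out.append(x if not out else alpha * x + (1 - alpha) * out[-1])
    return out


def macd_histogram(series: List[float], fast_span: int = 12,
                   slow_span: int = 26, signal_span: int = 9) -> List[float]:
    """MACD line (fast EMA minus slow EMA) minus its own EMA (signal line).

    The spans are the conventional market defaults, assumed here; the
    paper's actual settings are not given in the abstract.
    """
    fast = ema(series, fast_span)
    slow = ema(series, slow_span)
    macd = [f - s for f, s in zip(fast, slow)]
    signal = ema(macd, signal_span)
    return [m - s for m, s in zip(macd, signal)]


def bursts(hist: List[float]) -> List[Tuple[int, int]]:
    """Maximal runs of consecutive indices with a positive histogram value.

    Treating such runs as bursts is an assumption of this sketch.
    Returned intervals are inclusive (start, end) index pairs.
    """
    runs: List[Tuple[int, int]] = []
    start = None
    for i, h in enumerate(hist):
        if h > 0 and start is None:
            start = i
        elif h <= 0 and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(hist) - 1))
    return runs


if __name__ == "__main__":
    # Hypothetical daily counts of one term: quiet, then a sudden rise.
    counts = [0.0] * 10 + [5.0] * 10
    print(bursts(macd_histogram(counts)))
```

In this toy series the term is absent for ten days and then appears at a steady rate, so the fast average pulls ahead of the slow one and a burst opens on the day the rise starts.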

References

  1. Allan, J., editor (2003). Topic Detection and Tracking. Kluwer Academic Publishers.
  2. Celikyilmaz, A. and Hakkani-Tur, D. (2011). Discovery of Topically Coherent Sentences for Extractive Summarization. In Proc. of the 49th ACL, pages 491-499.
  3. He, D. and Parker, D. S. (2010). Topic Dynamics: An Alternative Model of Bursts in Streams of Topics. In Proc. of the 16th ACM SIGKDD, pages 443-452.
  4. Lin, C.-Y. and Hovy, E. H. (2002). From Single to Multi-document Summarization: A Prototype System and its Evaluation. In Proc. of the 40th ACL, pages 457-464.
  5. Magnini, B. and Cavaglia, G. (2000). Integrating Subject Field Codes into WordNet. In Proc. of the 2nd LREC.
  6. Marcu, D. and Echihabi, A. (2002). An Unsupervised Approach to Recognizing Discourse Relations. In Proc. of the 40th ACL, pages 368-375.
  7. Mimno, D., Li, W., and McCallum, A. (2007). Mixtures of Hierarchical Topics with Pachinko Allocation. In Proc. of the 24th ICML, pages 633-640.
  8. Schmid, H. (1995). Improvements in Part-of-Speech Tagging with an Application to German. In Proc. of the EACL.
  9. Wan, X. and Yang, J. (2008). Multi-Document Summarization using Cluster-based Link Analysis. In Proc. of the 31st ACM SIGIR, pages 299-306.


Paper Citation


in Harvard Style

Fukumoto F., Suzuki Y. and Takasu A. (2012). Topic and Subject Detection in News Streams for Multi-document Summarization. In Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2012) ISBN 978-989-8565-30-3, pages 166-171. DOI: 10.5220/0004109901660171


in Bibtex Style

@conference{keod12,
author={Fumiyo Fukumoto and Yoshimi Suzuki and Atsuhiro Takasu},
title={Topic and Subject Detection in News Streams for Multi-document Summarization},
booktitle={Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2012)},
year={2012},
pages={166-171},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004109901660171},
isbn={978-989-8565-30-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2012)
TI - Topic and Subject Detection in News Streams for Multi-document Summarization
SN - 978-989-8565-30-3
AU - Fukumoto F.
AU - Suzuki Y.
AU - Takasu A.
PY - 2012
SP - 166
EP - 171
DO - 10.5220/0004109901660171