complicates both the choice of which texts to remove and the handling of their relationships to
other texts, which in turn complicates the merging of multiple clauses. Solving this
problem will allow the extracted text to be fully integrated into the
Mediation Information System. The integrated version is then transformed into other
media, such as speech, using XML style sheets. This allows the extracted web text to be
delivered in a variety of media, achieving the goal of system extensibility.
References
1. Yan T. and Garcia-Molina H., 1995. Duplicate Removal in Information Dissemination.
In Proc. of VLDB-95, pp. 66-77, September 1995.
2. Sekine S. and Nobata C., 1997. Sentence Extraction and Information Extraction technique.
Document Understanding Conference 2003.
3. Seki Y., 2003. Sentence Extraction by tf/idf and Position Weighting from Newspaper
Articles. Document Understanding Conference 2003.
4. Zechner K., 1997. A Literature Survey on Information Extraction and Text Summarization.
Carnegie Mellon University, April 1997.
5. DUC 2003, Document Understanding Conference 2003. http://www-nlpir.nist.gov/projects/duc/
6. Barzilay R., Elhadad N., and McKeown K., 2002. Inferring Strategies for Sentence
Ordering in Multidocument News Summarization. JAIR 17, pp. 35-55.
7. Nenkova A., Schiffman B., Schlaiker A., Blair-Goldensohn S., Barzilay R., Sigelman S.,
Hatzivassiloglou V., and McKeown K., 2003. Columbia University at the Document
Understanding Conference 2003.
8. Blair-Goldensohn S., Evans D., Hatzivassiloglou V., McKeown K., Nenkova A., Passonneau R.,
Schiffman B., Schlaikjar A., Siddharthan A., and Siegelman S., 2004. Columbia University at
DUC 2004. Document Understanding Conference 2004.
9. Barzilay R., McKeown K., and Elhadad N., 1999. Information Fusion in the Context of
Multi-Document Summarization. ACL 1999, pp. 703-733.
10. Lyons S. and Smith D., 2002. Domain-Specific Information Extraction Structures. DEXA
Workshops 2002, pp. 80-84.
11. Tjong Kim Sang E.F. and Déjean H., 2001. Introduction to the CoNLL-2001 Shared Task: Clause
Identification. CoNLL-2001. http://cnts.uia.ac.be/conll2001/clauses/
12. Carreras X. and Màrquez L., 2001. Boosting Trees for Clause Splitting. In CoNLL'01, 5th
International Conference on Computational Natural Language Learning, Toulouse, France.
13. Carreras X., Màrquez L., Punyakanok V., and Roth D., 2002. Learning and Inference for Clause
Identification. ECML 2002, Finland.
14. Mitkov R., Evans R., Orasan C., Barbu C., Jones L., and Sotirova V., 2000. Coreference and
anaphor: developing annotating resources and annotation strategies. DAARC2000, pp. 49-58.
15. Grinker M., 1994. Clauses: Restrictive and Nonrestrictive.
http://www.kentlaw.edu/academics/lrw/grinker/LwtaClauses__Restrictive_and_Nonrest.htm