NLU METHODOLOGIES FOR CAPTURING NON-REDUNDANT INFORMATION FROM MULTI-DOCUMENTS - A Survey

Michael T. Mills, Nikolaos G. Bourbakis

Abstract

This paper provides a comparative survey of natural language understanding (NLU) methodologies for capturing non-redundant information from multiple documents. The goal of these methodologies is to generate text output with reduced information redundancy and increased information coverage. The purpose of this paper is to inform the reader which methodologies exist and to describe their features in terms of evaluation criteria selected by users. Comparison tables at the end of the survey give an at-a-glance view of these technical attributes, as extracted from the information available in the publications.
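The shared goal named above, less redundancy and more coverage across several documents, can be illustrated with a minimal sketch. It is not one of the surveyed methodologies and not the authors' method: it simply pools sentences from all documents and greedily selects them. Each step takes the sentence that adds the most not-yet-covered content words (coverage) and skips any sentence whose Jaccard word overlap with an already selected sentence exceeds a threshold (redundancy). The stopword list, the function names (`content_words`, `jaccard`, `select_non_redundant`) and the 0.5 threshold are hypothetical choices made for illustration.

```python
import re

# Tiny illustrative stopword list (assumption; real systems use larger lists).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "to", "is", "was", "for", "with", "by"}


def content_words(sentence: str) -> set[str]:
    """Lowercased alphanumeric tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap between two word sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_non_redundant(documents: list[list[str]], budget: int,
                         max_overlap: float = 0.5) -> list[str]:
    """Greedily pick up to `budget` sentences pooled from all documents.

    Each step takes the sentence adding the most not-yet-covered content
    words, skipping sentences whose overlap with an already selected
    sentence exceeds `max_overlap`.
    """
    pool = [s for doc in documents for s in doc]
    bags = [content_words(s) for s in pool]
    covered: set[str] = set()
    chosen: list[int] = []
    while len(chosen) < budget:
        best, best_gain = None, 0
        for i, bag in enumerate(bags):
            if i in chosen:
                continue
            if any(jaccard(bag, bags[j]) > max_overlap for j in chosen):
                continue  # too similar to a sentence already selected
            gain = len(bag - covered)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining sentence adds new information
            break
        chosen.append(best)
        covered |= bags[best]
    return [pool[i] for i in chosen]


docs = [
    ["The storm hit the coast on Monday.", "Thousands lost power."],
    ["A storm hit the coast Monday.", "Schools were closed on Tuesday."],
]
print(select_non_redundant(docs, budget=3))
# → ['The storm hit the coast on Monday.', 'Schools were closed on Tuesday.', 'Thousands lost power.']
```

In the example, the second document's storm sentence is dropped as redundant with the first, while the two sentences that contribute new facts are kept; even with a larger budget the selection stops at three sentences. Word-set overlap is the simplest possible similarity measure and is used here only to keep the sketch short.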



Paper Citation


in Harvard Style

Mills M. T. and Bourbakis N. G. (2010). NLU METHODOLOGIES FOR CAPTURING NON-REDUNDANT INFORMATION FROM MULTI-DOCUMENTS - A Survey. In Proceedings of the 5th International Conference on Software and Data Technologies - Volume 2: ICSOFT, ISBN 978-989-8425-23-2, pages 384-393. DOI: 10.5220/0002998603840393


in Bibtex Style

@conference{icsoft10,
author={Michael T. Mills and Nikolaos G. Bourbakis},
title={NLU METHODOLOGIES FOR CAPTURING NON-REDUNDANT INFORMATION FROM MULTI-DOCUMENTS - A Survey},
booktitle={Proceedings of the 5th International Conference on Software and Data Technologies - Volume 2: ICSOFT},
year={2010},
pages={384-393},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002998603840393},
isbn={978-989-8425-23-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Software and Data Technologies - Volume 2: ICSOFT
TI - NLU METHODOLOGIES FOR CAPTURING NON-REDUNDANT INFORMATION FROM MULTI-DOCUMENTS - A Survey
SN - 978-989-8425-23-2
AU - Mills, M. T.
AU - Bourbakis, N. G.
PY - 2010
SP - 384
EP - 393
DO - 10.5220/0002998603840393