3 COMPARATIVE TABLE OF METHODOLOGIES AND APPROACHES
The following table compares the main features and
approaches of the papers covered in this survey. Its
intent is to give a collective picture of the main
capabilities that those papers provide.
The above table shows the capabilities of the various
approaches. Intuitively, the more pertinent
information an approach captures, the higher the
quality of the result should be, where quality means
minimal redundancy and maximum information
coverage. However, most of the performance
qualities are not addressed in the papers. This may
reflect the overall maturity of the technical area, which
is currently striving for accuracy as measured in the
Document Understanding Conferences (DUC) that
some of the authors reference. Timing
characteristics, other than computational complexity,
appear to be left to future effort.
4 CONCLUSIONS
This survey revealed very little commonality among
the methodologies found, although they could be
grouped under some general headings. The papers
covered in the survey did not include enough
maturity information to support a comparison. This
suggests that this area of natural language
processing has not yet matured enough to provide
this kind of product information.
Methodologies that were tested reported
precision and recall results, and some also reported
complexity. Most were theoretical. According to a
definition found on the Oracle web site, precision
measures how well non-relevant information is
screened out (not returned), and recall measures how
well the information sought is found.
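As an illustration of these two measures, the following minimal sketch computes them with the standard set-based formulation: precision is the fraction of returned items that are relevant, and recall is the fraction of relevant items that are returned. The function name, the variable names, and the sample sentence identifiers are hypothetical and are not taken from any of the surveyed papers.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved items.

    retrieved: items the system returned (e.g. sentences chosen for a summary)
    relevant:  items a reference judgment marks as relevant
    Both arguments are illustrative assumptions, not an API from the survey.
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    true_positives = len(retrieved & relevant)

    # Precision: share of what was returned that is actually relevant.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: share of what is relevant that was actually returned.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four sentences returned, three judged relevant.
retrieved = {"s1", "s2", "s3", "s4"}
relevant = {"s2", "s4", "s5"}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this hypothetical case two of the four returned sentences are relevant, and two of the three relevant sentences are returned, so a system can score well on one measure while scoring poorly on the other.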
A few of the most capable methodologies show
promise in providing an approximately optimized
result: minimum redundancy with maximum
information coverage. However, more research in
natural language understanding is needed before
these methodologies can mature into high-volume
commercial products. Normally, greater capability
to produce accurate text comes at a price in
computational (time and space) complexity,
especially when heuristics are involved. Some of the
concept-graph, chain, meta-chain, and hierarchical
approaches offered impressive opportunities to
compress and optimize the resulting text. Finding an
efficient methodology that accomplishes all this
would be a significant step toward eventual
technical maturity.
ICSOFT 2010 - 5th International Conference on Software and Data Technologies