Text Simplification for Enhanced Readability

Siddhartha Banerjee, Nitin Kumar, C. E. Veni Madhavan

Abstract

Our goal is to automatically simplify a piece of text to enhance its readability. We combine two processes, summarization and simplification, on the input text to effect this improvement, mimicking the human acts of incremental learning and progressive refinement. The steps are based on the following: (i) phrasal units in the parse tree yield clues (handles) for paraphrasing at a local word or phrase level for simplification; (ii) phrasal units also provide the means for extracting segments of a prototype summary; (iii) dependency lists provide the coherence structures for refining a prototype summary. A paraphrased text can be validated and evaluated by two methodologies: (a) standardized systems of readability, precision, and recall measures; (b) human assessments. Our position is that combined paraphrasing as above, at both the lexical (word or phrase) level and the linguistic-semantic (parse tree, dependencies) level, would lead to better readability scores than either approach performed separately.
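
As a rough illustration of lexical-level paraphrasing checked against a standardized readability measure, the sketch below swaps words using a small table and compares scores before and after. It is not the authors' system: the Flesch Reading Ease formula is one common readability measure chosen here as an assumption, and the `SIMPLER` table, the vowel-run syllable heuristic, and all function names are hypothetical.

```python
import re

# Hypothetical substitution table (assumption); the abstract does not
# specify which lexical resources are used.
SIMPLER = {"utilize": "use", "commence": "begin", "terminate": "end"}


def count_syllables(word):
    """Rough heuristic: count runs of vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Flesch Reading Ease; higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def simplify_lexically(text):
    """Replace words found in SIMPLER, keeping initial capitalization."""
    def swap(match):
        word = match.group(0)
        replacement = SIMPLER.get(word.lower())
        if replacement is None:
            return word
        return replacement.capitalize() if word[0].isupper() else replacement
    return re.sub(r"[A-Za-z]+", swap, text)


original = "We utilize the procedure. They commence the operation."
simplified = simplify_lexically(original)
print(simplified)
print(flesch_reading_ease(original), flesch_reading_ease(simplified))
```

A word-for-word swap of this kind covers only the lexical level; the parse-tree and dependency-based steps described in the abstract would need a syntactic parser and are not sketched here.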

Paper Citation


in Harvard Style

Banerjee S., Kumar N. and Veni Madhavan C. E. (2013). Text Simplification for Enhanced Readability. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013), ISBN 978-989-8565-75-4, pages 202-207. DOI: 10.5220/0004626102020207


in Bibtex Style

@conference{kdir13,
author={Siddhartha Banerjee and Nitin Kumar and C. E. Veni Madhavan},
title={Text Simplification for Enhanced Readability},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013)},
year={2013},
pages={202-207},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004626102020207},
isbn={978-989-8565-75-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013)
TI - Text Simplification for Enhanced Readability
SN - 978-989-8565-75-4
AU - Banerjee S.
AU - Kumar N.
AU - Veni Madhavan C. E.
PY - 2013
SP - 202
EP - 207
DO - 10.5220/0004626102020207