Posgram Driven Word Prediction

Carmelo Spiccia, Agnese Augello, Giovanni Pilato

Abstract

Several word prediction algorithms have been described in the literature for automatic sentence completion from a finite set of candidate words. However, to the best of our knowledge, little or no work has been done on reducing the cardinality of this set. To address this issue, we first use posgrams (n-grams of part-of-speech tags) to predict the part of speech of the missing word. Candidate words are then restricted to those compatible with the predicted part of speech. We show how this additional step can improve both the processing speed and the accuracy of word predictors. Experimental results are provided for the Italian language.
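The two-step pipeline summarized in the abstract (predict the missing word's part of speech from posgrams, then restrict the candidate set before running a word predictor) can be illustrated with the minimal Python sketch below. It is not the authors' implementation. Several details are assumptions made for illustration only: the toy three-sentence Italian corpus and its simplified `DET`/`NOUN`/`VERB` tags (not the tagset used in the paper), the rule of scoring each candidate tag by summing the counts of all posgram windows that cover the gap, and the bigram scorer, which is a hypothetical stand-in for whatever word predictor the filtered set is then fed to. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

TaggedSentence = List[Tuple[str, str]]  # (word, POS tag) pairs

def count_posgrams(corpus: List[TaggedSentence], n: int = 3) -> Counter:
    """Count POS-tag n-grams (posgrams) over a POS-tagged corpus."""
    counts: Counter = Counter()
    for sentence in corpus:
        tags = [tag for _, tag in sentence]
        for i in range(len(tags) - n + 1):
            counts[tuple(tags[i:i + n])] += 1
    return counts

def predict_pos(tags: Sequence[str], gap: int, posgrams: Counter,
                tagset: Set[str], n: int = 3) -> str:
    """Assumed scoring rule: pick the tag for position `gap` that maximizes the
    summed counts of every posgram window lying in the sentence and covering the gap."""
    scores: Dict[str, int] = {}
    for candidate in sorted(tagset):
        filled = list(tags)
        filled[gap] = candidate
        first, last = max(0, gap - n + 1), min(gap, len(filled) - n)
        scores[candidate] = sum(posgrams[tuple(filled[s:s + n])]
                                for s in range(first, last + 1))
    return max(scores, key=scores.get)

def build_lexicon(corpus: List[TaggedSentence]) -> Dict[str, Set[str]]:
    """Map each word to the set of POS tags it was observed with."""
    lexicon: Dict[str, Set[str]] = {}
    for sentence in corpus:
        for word, tag in sentence:
            lexicon.setdefault(word, set()).add(tag)
    return lexicon

def restrict_candidates(candidates: List[str], lexicon: Dict[str, Set[str]],
                        pos: str) -> List[str]:
    """Keep only candidates that can take the predicted part of speech."""
    return [w for w in candidates if pos in lexicon.get(w, set())]

def count_bigrams(corpus: List[TaggedSentence]) -> Counter:
    """Word bigram counts, used by the stand-in predictor below."""
    counts: Counter = Counter()
    for sentence in corpus:
        words = [w for w, _ in sentence]
        counts.update(zip(words, words[1:]))
    return counts

def complete(words: Sequence[str], gap: int, candidates: List[str],
             bigrams: Counter) -> str:
    """Hypothetical stand-in predictor: score each candidate by the bigram
    counts it forms with its left and right neighbours."""
    def score(w: str) -> int:
        s = bigrams[(words[gap - 1], w)] if gap > 0 else 0
        if gap + 1 < len(words):
            s += bigrams[(w, words[gap + 1])]
        return s
    return max(candidates, key=score)

CORPUS: List[TaggedSentence] = [
    [("il", "DET"), ("gatto", "NOUN"), ("mangia", "VERB"), ("il", "DET"), ("pesce", "NOUN")],
    [("la", "DET"), ("ragazza", "NOUN"), ("legge", "VERB"), ("un", "DET"), ("libro", "NOUN")],
    [("il", "DET"), ("cane", "NOUN"), ("beve", "VERB"), ("il", "DET"), ("latte", "NOUN")],
]

if __name__ == "__main__":
    posgrams, lexicon, bigrams = count_posgrams(CORPUS), build_lexicon(CORPUS), count_bigrams(CORPUS)
    tagset = {tag for tags in lexicon.values() for tag in tags}
    # "il ___ mangia il pesce": the gap is at index 1; the other tags are assumed given by a tagger.
    words = ["il", "_", "mangia", "il", "pesce"]
    tags = ["DET", "_", "VERB", "DET", "NOUN"]
    pos = predict_pos(tags, 1, posgrams, tagset)
    all_candidates = sorted(lexicon)
    restricted = restrict_candidates(all_candidates, lexicon, pos)
    print(pos, len(all_candidates), len(restricted))  # NOUN 12 6
    print(complete(words, 1, restricted, bigrams))    # gatto
```

In this toy run the predicted tag halves the candidate set (12 words down to 6) before the predictor scores anything, which is where the speed benefit described in the abstract would come from. Whether accuracy also improves depends on the predictor and the data; the sketch makes no claim about that.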


Paper Citation


in Harvard Style

Spiccia C., Augello A. and Pilato G. (2015). Posgram Driven Word Prediction. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: DART, (IC3K 2015) ISBN 978-989-758-158-8, pages 589-596. DOI: 10.5220/0005613305890596


in Bibtex Style

@conference{dart15,
author={Carmelo Spiccia and Agnese Augello and Giovanni Pilato},
title={Posgram Driven Word Prediction},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: DART, (IC3K 2015)},
year={2015},
pages={589-596},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005613305890596},
isbn={978-989-758-158-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: DART, (IC3K 2015)
TI - Posgram Driven Word Prediction
SN - 978-989-758-158-8
AU - Spiccia C.
AU - Augello A.
AU - Pilato G.
PY - 2015
SP - 589
EP - 596
DO - 10.5220/0005613305890596