Latent Ambiguity in Latent Semantic Analysis?

Martin Emms, Alfredo Maldonado-Guerra

2013

Abstract

Latent Semantic Analysis (LSA) uses SVD-based dimensionality reduction to reduce the high dimensionality of vector representations of documents, where the dimensions of the vectors correspond simply to word counts in the documents. We show that there are two contending, inequivalent formulations of LSA. The distinction between the two is not generally noted: some work adheres to one formulation, while other work adheres to the other. We show, on both a tiny contrived data-set and a more substantial word-sense discovery data-set, that the empirical outcomes achieved with LSA vary according to which formulation is chosen.
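
The abstract does not spell out the two formulations, so the following is only an illustrative sketch under stated assumptions. It assumes that one variant represents each document by its projection onto the top-k left singular vectors of the term-by-document count matrix, and that the other additionally rescales that projection by the inverse singular values. The function names, the toy matrix, and the choice k=2 are hypothetical and are not taken from the paper.

```python
import numpy as np

def lsa_document_vectors(counts, k):
    """Return two candidate k-dimensional document representations.

    counts: terms x documents matrix of word counts.
    Both variants are assumptions made for illustration only.
    """
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    # Assumed variant A: project each document onto the top-k left
    # singular vectors, giving U_k^T A = Sigma_k V_k^T.
    projected = U_k.T @ counts
    # Assumed variant B: also rescale by the inverse singular values,
    # giving Sigma_k^{-1} U_k^T A = V_k^T.
    rescaled = projected / s_k[:, None]
    # Transpose so that each row is one document.
    return projected.T, rescaled.T

def cosine_matrix(X):
    """Pairwise cosine similarities between the rows of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

# Hypothetical toy data: 5 terms (rows) by 4 documents (columns).
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 1, 1, 2],
    [1, 0, 0, 1],
], dtype=float)

docs_a, docs_b = lsa_document_vectors(counts, k=2)
print(np.round(cosine_matrix(docs_a), 3))
print(np.round(cosine_matrix(docs_b), 3))
```

Under these assumptions the two representations differ only by a per-dimension scaling, yet the pairwise cosine similarities between documents are not the same, which is one way the choice of formulation could affect downstream results.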



Paper Citation


in Harvard Style

Emms M. and Maldonado-Guerra A. (2013). Latent Ambiguity in Latent Semantic Analysis? In Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-8565-41-9, pages 115-120. DOI: 10.5220/0004178301150120


in Bibtex Style

@conference{icpram13,
author={Martin Emms and Alfredo Maldonado-Guerra},
title={Latent Ambiguity in Latent Semantic Analysis?},
booktitle={Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2013},
pages={115-120},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004178301150120},
isbn={978-989-8565-41-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Latent Ambiguity in Latent Semantic Analysis?
SN - 978-989-8565-41-9
AU - Emms M.
AU - Maldonado-Guerra A.
PY - 2013
SP - 115
EP - 120
DO - 10.5220/0004178301150120