LATENT SEMANTIC INDEXING USING MULTIRESOLUTION ANALYSIS

Tareq Jaber, Abbes Amira, Peter Milligan

Abstract

Latent semantic indexing (LSI) is commonly used to match queries to documents in information retrieval (IR) applications. It has been shown to improve retrieval performance because it can deal with synonymy and polysemy. This paper proposes a hybrid approach intended to improve retrieval accuracy: the Haar wavelet transform (HWT), combined with Donoho's thresholding, is applied as a preprocessing step for the singular value decomposition (SVD) in the LSI system. An evaluation of this approach is presented, and the effect of different levels of decomposition in the HWT process is investigated. The experimental results confirm a significant improvement in retrieval performance when the HWT with Donoho's thresholding is applied as a preprocessing step.
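The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not specify: the transform is applied along the term axis of a term-document matrix, the detail coefficients are soft-thresholded with the universal threshold sigma * sqrt(2 ln n), sigma is estimated from the finest-level details, and the transform is inverted before the SVD. All function names are hypothetical.

```python
import numpy as np


def haar_step(x):
    """One level of the orthonormal Haar transform along axis 0 (even row count)."""
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail


def inverse_haar_step(approx, detail):
    """Invert haar_step by interleaving the reconstructed even and odd rows."""
    x = np.empty((2 * approx.shape[0],) + approx.shape[1:], dtype=float)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x


def soft_threshold(coeffs, t):
    """Soft thresholding: shrink every coefficient toward zero by t."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)


def haar_denoise(A, levels=1):
    """Assumed preprocessing: HWT over the term axis, thresholding, inverse HWT."""
    A = np.asarray(A, dtype=float)
    if A.shape[0] % (2 ** levels) != 0:
        raise ValueError("row count must be divisible by 2**levels")
    approx, details = A, []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    # Assumption: noise scale from the median absolute finest-level detail.
    sigma = np.median(np.abs(details[0])) / 0.6745
    t = sigma * np.sqrt(2.0 * np.log(A.shape[0]))
    for d in reversed(details):
        approx = inverse_haar_step(approx, soft_threshold(d, t))
    return approx


def lsi_scores(A, q, k):
    """Rank-k LSI: fold the query into the latent space, return cosine scores."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk, sk, docs = U[:, :k], s[:k], Vt[:k].T   # rows of docs are documents
    q_hat = (Uk.T @ q) / sk                    # assumes the top k singular values are nonzero
    denom = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat)
    return (docs @ q_hat) / np.where(denom == 0, 1.0, denom)


# Toy term-document matrix (4 terms x 4 documents), for illustration only.
A = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 1]], dtype=float)
scores = lsi_scores(haar_denoise(A, levels=2), A[:, 0], k=2)
```

Here `levels` plays the role of the decomposition level studied in the paper. For very sparse matrices the median-based noise estimate can be zero, in which case this sketch applies no shrinkage at all.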


Paper Citation


in Harvard Style

Jaber T., Amira A. and Milligan P. (2011). LATENT SEMANTIC INDEXING USING MULTIRESOLUTION ANALYSIS. In Proceedings of the 1st International Conference on Pervasive and Embedded Computing and Communication Systems - Volume 1: PECCS, ISBN 978-989-8425-48-5, pages 327-332. DOI: 10.5220/0003313203270332


in Bibtex Style

@conference{peccs11,
author={Tareq Jaber and Abbes Amira and Peter Milligan},
title={LATENT SEMANTIC INDEXING USING MULTIRESOLUTION ANALYSIS},
booktitle={Proceedings of the 1st International Conference on Pervasive and Embedded Computing and Communication Systems - Volume 1: PECCS},
year={2011},
pages={327-332},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003313203270332},
isbn={978-989-8425-48-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Pervasive and Embedded Computing and Communication Systems - Volume 1: PECCS
TI - LATENT SEMANTIC INDEXING USING MULTIRESOLUTION ANALYSIS
SN - 978-989-8425-48-5
AU - Jaber T.
AU - Amira A.
AU - Milligan P.
PY - 2011
SP - 327
EP - 332
DO - 10.5220/0003313203270332