CONTEXT VECTOR CLASSIFICATION - Term Classification with Context Evaluation
Hendrik Schöneberg
2010
Abstract
Automated Deep Tagging relies heavily on the correct recognition of each term. If a term's surface form is obscured by spelling mistakes, OCR errors or typing variants, regular string matching or pattern matching algorithms may fail to classify it. Context Vector Tagging is an approach that analyzes term co-occurrence data and represents it in a vector space model, taking the source's language into account. Using the cosine of the angle between two context vectors as a similarity measure, we propose that terms with similar context vectors share a similar word class, which allows even unknown terms to be classified. This approach is especially well suited to the syntactic problems mentioned above and can support classic string-based or pattern-based classification algorithms in syntactically challenging environments.
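The idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the window size, the raw co-occurrence counts, the toy corpus, the nearest-neighbour decision rule and all function names are assumptions made for illustration, and the language-specific processing the abstract mentions is left out.

```python
import math
from collections import Counter, defaultdict


def build_context_vectors(sentences, window=2):
    """For every term, count the terms that occur within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, term in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[term][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse count vectors (Counters)."""
    dot = sum(count * v[t] for t, count in u.items())  # Counter yields 0 for missing keys
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(term, vectors, known_classes):
    """Return (word_class, similarity) of the labelled term with the most similar context."""
    best_class, best_sim = None, 0.0
    for known, word_class in known_classes.items():
        if known == term or known not in vectors:
            continue
        sim = cosine(vectors[term], vectors[known])
        if sim > best_sim:
            best_class, best_sim = word_class, sim
    return best_class, best_sim


# Hypothetical toy corpus: "wurzburg" is a spelling variant that exact
# string matching against the known terms would miss.
sentences = [
    "the bishop of wuerzburg signed the charter".split(),
    "the bishop of bamberg signed the charter".split(),
    "the bishop of wurzburg signed the letter".split(),
    "the monk wrote a letter".split(),
]
known = {"wuerzburg": "place", "bamberg": "place", "monk": "person"}

vectors = build_context_vectors(sentences)
label, score = classify("wurzburg", vectors, known)
print(label, round(score, 3))
```

In this sketch the unknown variant receives the class of the labelled term whose surrounding words match most closely, even though the strings themselves differ. A realistic setup would need a much larger corpus and probably some weighting of the co-occurrence counts.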
Paper Citation
in Harvard Style
Schöneberg H. (2010). CONTEXT VECTOR CLASSIFICATION - Term Classification with Context Evaluation. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010) ISBN 978-989-8425-28-7, pages 387-391. DOI: 10.5220/0003067403870391
in Bibtex Style
@conference{kdir10,
author={Hendrik Schöneberg},
title={CONTEXT VECTOR CLASSIFICATION - Term Classification with Context Evaluation},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)},
year={2010},
pages={387-391},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003067403870391},
isbn={978-989-8425-28-7},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)
TI - CONTEXT VECTOR CLASSIFICATION - Term Classification with Context Evaluation
SN - 978-989-8425-28-7
AU - Schöneberg H.
PY - 2010
SP - 387
EP - 391
DO - 10.5220/0003067403870391