Author:
Hendrik Schöneberg
Affiliation:
University of Würzburg, Germany
Keyword(s):
Text mining, Classification, Deep tagging, Information retrieval.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Clustering and Classification Methods; Information Extraction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Mining Text and Semi-Structured Data; Symbolic Systems
Abstract:
Automated Deep Tagging relies heavily on the correct recognition of terms. If a term's surface form is obscured by spelling mistakes, OCR errors, or spelling variants, regular string matching or pattern matching algorithms may fail to classify it. Context Vector Tagging is an approach that analyzes term co-occurrence data and represents it in a vector space model, taking the specific characteristics of the source's language into account. Using the cosine of the angle between two context vectors as a similarity measure, we propose that terms with similar context vectors share a similar word class, which allows even unknown terms to be classified. This approach is especially suited to the syntactic problems mentioned above and can support classic string-based or pattern-based classification algorithms in syntactically challenging environments.
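The core idea can be illustrated with a minimal sketch, which is not the author's implementation. It assumes whitespace-tokenized sentences, a symmetric co-occurrence window, and raw counts as vector components. It leaves out the language-specific treatment the abstract mentions. The function names (`build_context_vectors`, `cosine_similarity`, `classify`), the `window` parameter, and the toy corpus are all hypothetical. Similarity is the cosine of the angle between two context vectors a and b, that is, their dot product divided by the product of their lengths.

```python
import math
from collections import Counter, defaultdict


def build_context_vectors(sentences, window=2):
    """Map each term to a sparse vector counting the terms seen within
    `window` positions of it (assumed symmetric window, raw counts)."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, term in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[term][tokens[j]] += 1
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(term, vectors, labels):
    """Give an unknown term the word class of the labelled term whose
    context vector is most similar (simple nearest-neighbour assumption)."""
    unknown = vectors.get(term, Counter())
    best = max(labels, key=lambda known: cosine_similarity(
        unknown, vectors.get(known, Counter())))
    return labels[best]


# Toy corpus: "wurzburg" is a spelling variant that a string matcher
# looking for "wuerzburg" would miss.
sentences = [
    "the bishop of wuerzburg signed the charter".split(),
    "the bishop of wurzburg signed the deed".split(),
    "the farmer sold the grain".split(),
]
vectors = build_context_vectors(sentences, window=2)
labels = {"wuerzburg": "place", "grain": "goods"}  # hypothetical known terms
print(classify("wurzburg", vectors, labels))
```

In this toy corpus the variant spelling occurs in the same context as the known place name, so the two context vectors coincide and the unknown term inherits that word class, even though the strings differ. A realistic setup would need a much larger corpus and probably some weighting of the raw counts.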