Authors:
Katrin Braunschweig; Maik Thiele; Elvis Koci and Wolfgang Lehner
Affiliation:
Technische Universität Dresden, Germany
Keyword(s):
Information Extraction, Web Tables, Text Tiling, Similarity Measures.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Concept Mining; Context Discovery; Information Extraction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Symbolic Systems
Abstract:
Web tables are a valuable source of information used in many application areas. However, to exploit Web
tables, it is necessary to understand their content and intention, which is impeded by their ambiguous semantics
and inconsistencies. Therefore, additional context information, e.g., the text in which the tables are embedded,
is needed to support the table understanding process. In this paper, we propose a novel contextualization
approach that 1) splits the table context into topically coherent paragraphs, 2) provides a similarity measure
that is able to match each paragraph to the table in question, and 3) ranks these paragraphs according to their
relevance. Each step is accompanied by an experimental evaluation on real-world data showing that our
approach is feasible and effectively identifies the most relevant context for a given Web table.
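
To make the three steps concrete, the following is a minimal sketch of a split, score, and rank pipeline. It is not the authors' method: splitting on blank lines stands in for topical segmentation, and cosine similarity over term frequencies stands in for the similarity measure proposed in the paper. All function names (`tokenize`, `cosine`, `split_context`, `rank_paragraphs`) and the sample data are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_context(context):
    """Step 1 stand-in (assumption): split on blank lines, not by topic."""
    return [p.strip() for p in re.split(r"\n\s*\n", context) if p.strip()]


def rank_paragraphs(table_cells, context):
    """Steps 2 and 3: score each paragraph against the table, then sort."""
    table_vec = Counter(tokenize(" ".join(table_cells)))
    scored = [
        (cosine(table_vec, Counter(tokenize(p))), p)
        for p in split_context(context)
    ]
    # Highest similarity first; ties keep their original document order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical table cells and surrounding page text.
table_cells = ["Country", "Population", "Germany", "France"]
context = (
    "Our office moved last year.\n\n"
    "Population figures for Germany and France are listed by country."
)
for score, paragraph in rank_paragraphs(table_cells, context):
    print(f"{score:.2f}  {paragraph}")
```

In this sketch the paragraph that shares terms with the table cells is ranked first, while the unrelated paragraph receives a score of zero. A real system would presumably replace each stand-in with the segmentation and similarity components evaluated in the paper.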