loading
Papers

Research.Publish.Connect.

Paper

Paper Unlock

Authors: Barkol Omer ; Bergman Ruth and Golan Shahar

Affiliation: HP Labs, Israel

ISBN: 978-989-8425-79-9

Keyword(s): Frequent trees, Tree edit distance, RTDM, DOM, Web mining, Web data records.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Information Extraction ; Knowledge Discovery and Information Retrieval ; Knowledge-Based Systems ; Soft Computing ; Symbolic Systems ; Web Mining

Abstract: The importance of recognizing repeating structures in web applications has generated a large body of work on algorithms for mining the HTML Document Object Model (DOM). A restricted tree edit distance metric, called the Restricted Top Down Metric (RTDM), is most suitable for DOM mining as well as less computationally expensive than the general tree edit distance. Given two trees with input size n1 and n2, the current methods take time O(n1 · n2) to compute RTDM. Consider, however, looking for patterns that form subtrees within a web page with n elements. The RTDM must be computed for all subtrees, and the running time becomes O(n4). This paper proposes a new algorithm which computes the distance between all the subtrees in a tree in time O(n2), which enables us to obtain better quality as well as better performance, on a DOM mining task. In addition, we propose a new tree edit-distance—SiSTeR (Similar Sibling Trees aware RTDM). This variant of RTDMallows considering the case were repe titious (very similar) subtrees of different quantity appear in two trees which are supposed to be considered as similar. (More)

PDF ImageFull Text

Download
CC BY-NC-ND 4.0

Sign In Guest: Register as new SciTePress user now for free.

Sign In SciTePress user: please login.

PDF ImageMy Papers

You are not signed in, therefore limits apply to your IP address 18.208.159.25

In the current month:
Recent papers: 100 available of 100 total
2+ years older papers: 200 available of 200 total

Paper citation in several formats:
Omer, B.; Ruth, B. and Shahar, G. (2011). A NEW FREQUENT SIMILAR TREE ALGORITHM MOTIVATED BY DOM MINING - Using RTDM and its New Variant — SiSTeR.In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011) ISBN 978-989-8425-79-9, pages 230-235. DOI: 10.5220/0003658102380243

@conference{kdir11,
author={Barkol Omer. and Bergman Ruth. and Golan Shahar.},
title={A NEW FREQUENT SIMILAR TREE ALGORITHM MOTIVATED BY DOM MINING - Using RTDM and its New Variant — SiSTeR},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)},
year={2011},
pages={230-235},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003658102380243},
isbn={978-989-8425-79-9},
}

TY - CONF

JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)
TI - A NEW FREQUENT SIMILAR TREE ALGORITHM MOTIVATED BY DOM MINING - Using RTDM and its New Variant — SiSTeR
SN - 978-989-8425-79-9
AU - Omer, B.
AU - Ruth, B.
AU - Shahar, G.
PY - 2011
SP - 230
EP - 235
DO - 10.5220/0003658102380243

Login or register to post comments.

Comments on this Paper: Be the first to review this paper.