Authors: Yuxin Wang and Keizo Oyama

Affiliation: National Institute of Informatics, Japan

Keyword(s): Surrounding page group, three-way classification, recall and precision, quality assurance.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Data Engineering ; Digital Libraries ; Knowledge Discovery and Information Retrieval ; Knowledge Management and Information Sharing ; Knowledge-Based Systems ; Ontologies and the Semantic Web ; Soft Computing ; Symbolic Systems ; Web Information Systems and Technologies ; Web Interfaces and Applications ; Web Mining

Abstract: We propose a web page classification method for building a high-quality homepage collection that takes page group structure into account. We use a support vector machine (SVM) with textual features obtained from each page and its surrounding pages. The surrounding pages are grouped by connection type (in-link, out-link, or directory entry) and by relative URL hierarchy (same, upper, or lower), and an independent feature subset is then generated from each group. These feature subsets are concatenated to form the feature set of a classifier. Experimental results on the ResJ-01 data set, manually created by the authors, and on the WebKB data set show the effectiveness of the proposed features compared with a baseline and several prior works. By tuning the classifiers, we then build a three-way classifier that combines a recall-assured and a precision-assured classifier to accurately select the pages that need manual assessment to assure the required quality. The three-way classifier is also shown to be effective in reducing the amount of manual assessment.
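The feature construction and the three-way decision described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `group_features` and `three_way`, the whitespace tokenization, the group-prefixed feature keys, and the modeling of the recall-assured and precision-assured classifiers as two thresholds on a single SVM decision score are all hypothetical simplifications. A real system would pass these sparse features to an SVM and tune each classifier separately.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Connection types and relative URL hierarchy levels named in the abstract.
CONNECTION_TYPES = ("in-link", "out-link", "directory")
HIERARCHY_LEVELS = ("same", "upper", "lower")


def group_features(page_text: str,
                   surrounding: Iterable[Tuple[str, str, str]]) -> Dict[str, int]:
    """Build one concatenated sparse feature vector for a page.

    `surrounding` holds hypothetical (connection_type, hierarchy, text)
    triples. Lowercased whitespace tokenization is an assumption.
    """
    features: Counter = Counter()
    # Feature subset taken from the page itself.
    for term in page_text.lower().split():
        features[f"self:{term}"] += 1
    # One independent subset per (connection type, hierarchy) group.
    # Prefixing keys with the group name keeps the subsets disjoint, so
    # merging them into one dict is equivalent to concatenating them.
    for conn, level, text in surrounding:
        if conn not in CONNECTION_TYPES or level not in HIERARCHY_LEVELS:
            raise ValueError(f"unknown group: {conn}/{level}")
        for term in text.lower().split():
            features[f"{conn}/{level}:{term}"] += 1
    return dict(features)


def three_way(score: float, recall_threshold: float,
              precision_threshold: float) -> str:
    """Combine a recall-assured and a precision-assured decision.

    Both classifiers are modeled here as thresholds on one decision
    score (an assumption): a low threshold favors recall, a high one
    favors precision.
    """
    if recall_threshold > precision_threshold:
        raise ValueError("recall threshold must not exceed precision threshold")
    if score >= precision_threshold:  # precision-assured classifier accepts
        return "positive"
    if score < recall_threshold:      # recall-assured classifier rejects
        return "negative"
    return "manual"                   # uncertain band: needs manual assessment


if __name__ == "__main__":
    feats = group_features("Yuxin Wang homepage",
                           [("in-link", "upper", "lab members"),
                            ("out-link", "lower", "publications list")])
    print(feats["out-link/lower:publications"])  # 1
    print([three_way(s, -0.5, 0.5) for s in (0.9, 0.0, -0.9)])
    # ['positive', 'manual', 'negative']
```

Only pages that fall between the two thresholds are routed to human review. Widening or narrowing that band trades manual effort against the assured recall and precision.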

CC BY-NC-ND 4.0

Paper citation in several formats:
Wang, Y. and Oyama, K. (2007). WEB PAGE CLASSIFICATION CONSIDERING PAGE GROUP STRUCTURE FOR BUILDING A HIGH-QUALITY HOMEPAGE COLLECTION. In Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 1: WEBIST; ISBN 978-972-8865-78-8; ISSN 2184-3252, SciTePress, pages 170-175. DOI: 10.5220/0001271701700175

@conference{webist07,
author={Yuxin Wang and Keizo Oyama},
title={WEB PAGE CLASSIFICATION CONSIDERING PAGE GROUP STRUCTURE FOR BUILDING A HIGH-QUALITY HOMEPAGE COLLECTION},
booktitle={Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2007},
pages={170-175},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001271701700175},
isbn={978-972-8865-78-8},
issn={2184-3252},
}

TY - CONF

JO - Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - WEB PAGE CLASSIFICATION CONSIDERING PAGE GROUP STRUCTURE FOR BUILDING A HIGH-QUALITY HOMEPAGE COLLECTION
SN - 978-972-8865-78-8
IS - 2184-3252
AU - Wang, Y.
AU - Oyama, K.
PY - 2007
SP - 170
EP - 175
DO - 10.5220/0001271701700175
PB - SciTePress