A Comparison of Methods for Web Document Classification

Authors: Julia Hodges, Yong Wang and Bo Tang

Affiliation: Mississippi State University, United States

Abstract: WebDoc is an automated classification system that assigns Web documents to appropriate Library of Congress subject headings based on the text of the documents. We have used a different classification method in each version of WebDoc. The first method is statistical: it counts the occurrences of a given noun phrase in the documents assigned to a particular subject heading and uses these counts as the basis for the weights assigned to the candidate indexes. The second method uses a naïve Bayes approach; here we experimented with smoothing to dampen the effect of the large number of zeros in our feature vectors. The third method is a k-nearest neighbors approach, with which we tested two different ways of measuring the similarity of feature vectors. In this paper, we report the performance of each version of WebDoc in terms of recall, precision, and F-measure.
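The naïve Bayes variant and the recall/precision/F-measure evaluation mentioned in the abstract can be illustrated with a small Python sketch. This is not WebDoc's implementation: the abstract does not name the smoothing scheme, so the sketch assumes additive (Laplace) smoothing controlled by a hypothetical parameter `alpha`. It also assumes each document carries a single subject heading and is represented as a list of noun-phrase tokens. The function names (`train_nb`, `predict_nb`, `precision_recall_f`) and the toy headings are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Multinomial naive Bayes with additive smoothing (Laplace when alpha=1).

    docs: list of token lists (e.g. noun phrases); labels: one heading per doc.
    alpha must be > 0: it replaces zero counts so no log-probability is -inf.
    """
    vocab = {t for d in docs for t in d}
    doc_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, heading in zip(docs, labels):
        term_counts[heading].update(doc)
    model = {}
    for heading, n_docs in doc_counts.items():
        # Smoothed denominator: total term count plus alpha per vocabulary entry.
        denom = sum(term_counts[heading].values()) + alpha * len(vocab)
        log_prior = math.log(n_docs / len(docs))
        log_lik = {t: math.log((term_counts[heading][t] + alpha) / denom) for t in vocab}
        model[heading] = (log_prior, log_lik)
    return model


def predict_nb(model, doc):
    """Return the heading with the highest log-posterior; unseen terms are skipped."""
    best, best_score = None, -math.inf
    for heading, (log_prior, log_lik) in model.items():
        score = log_prior + sum(log_lik[t] for t in doc if t in log_lik)
        if score > best_score:
            best, best_score = heading, score
    return best


def precision_recall_f(true_labels, pred_labels, heading, beta=1.0):
    """Precision, recall and F-measure for one heading (F1 when beta=1)."""
    pairs = list(zip(true_labels, pred_labels))
    tp = sum(1 for t, p in pairs if p == heading and t == heading)
    fp = sum(1 for t, p in pairs if p == heading and t != heading)
    fn = sum(1 for t, p in pairs if p != heading and t == heading)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    return precision, recall, (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical toy data: two subject headings, noun-phrase tokens per document.
    docs = [["router", "packet", "protocol"], ["packet", "switch"],
            ["leaf", "plant", "root"], ["plant", "seed"]]
    labels = ["Computer networks", "Computer networks", "Botany", "Botany"]
    model = train_nb(docs, labels)
    print(predict_nb(model, ["packet", "router"]))  # Computer networks
    print(precision_recall_f(["Computer networks", "Botany", "Botany"],
                             ["Computer networks", "Computer networks", "Botany"],
                             "Botany"))  # (1.0, 0.5, 0.6666666666666666)
```

In the toy data, "router" never occurs in a Botany document. Without smoothing (`alpha=0`), its Botany likelihood would be zero and `math.log` would raise an error. This is the sparse-feature problem that smoothing is meant to dampen, which is why the sketch requires `alpha > 0`.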

CC BY-NC-ND 4.0

Paper citation in several formats:
Hodges, J.; Wang, Y. and Tang, B. (2005). A Comparison of Methods for Web Document Classification. In Proceedings of the 5th International Workshop on Pattern Recognition in Information Systems (ICEIS 2005) - PRIS; ISBN 972-8865-28-7, SciTePress, pages 154-163. DOI: 10.5220/0002557601540163

@conference{pris05,
author={Julia Hodges and Yong Wang and Bo Tang},
title={A Comparison of Methods for Web Document Classification},
booktitle={Proceedings of the 5th International Workshop on Pattern Recognition in Information Systems (ICEIS 2005) - PRIS},
year={2005},
pages={154-163},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002557601540163},
isbn={972-8865-28-7},
}

TY - CONF
JO - Proceedings of the 5th International Workshop on Pattern Recognition in Information Systems (ICEIS 2005) - PRIS
TI - A Comparison of Methods for Web Document Classification
SN - 972-8865-28-7
AU - Hodges, J.
AU - Wang, Y.
AU - Tang, B.
PY - 2005
SP - 154
EP - 163
DO - 10.5220/0002557601540163
PB - SciTePress