Authors: Shuhua Liu and Thomas Forss

Affiliation: Arcada University of Applied Sciences, Finland

Keyword(s): Web Content Classification, Text Summarization, Topic Similarity, Sentiment Analysis, Online Safety Solutions.

Abstract: This research concerns the development of web content detection systems that will be able to automatically classify any web page into pre-defined content categories. Our work is motivated by practical experience and observations that certain categories of web pages, such as those that contain hatred and violence, are much harder to classify with good accuracy even when both content and structural features are already taken into account. To further improve the performance of detection systems, we bring web sentiment features into classification models. In addition, we incorporate n-gram representation into our classification approach, based on the assumption that n-grams can capture more local context information in text, and thus could help to enhance topic similarity analysis. Unlike most studies, which only consider the presence or frequency count of n-grams in their applications, we make use of tf-idf weighted n-grams in building the content classification models. Our results show that unigram based models, even though a much simpler approach, show their unique value and effectiveness in web content classification. Higher order n-gram based approaches, especially 5-gram based models that combine topic similarity features with sentiment features, bring significant improvement in precision levels for the Violence and two Racism related web categories.
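
The idea of tf-idf weighted n-grams used for topic similarity can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokenizer, the smoothed idf formula, the cosine measure, and the function names `word_ngrams`, `tfidf_vectors`, and `cosine` are all assumptions made for illustration, and the sentiment features mentioned in the abstract are not modeled here.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams (as tuples) of a lower-cased, whitespace-split text."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n):
    """Build one sparse tf-idf weighted n-gram vector (a dict) per document.

    Assumption: relative term frequency and a smoothed idf,
    log((1 + N) / (1 + df)) + 1. The paper may use a different weighting.
    """
    grams_per_doc = [Counter(word_ngrams(doc, n)) for doc in docs]
    # Document frequency: number of documents in which each n-gram occurs.
    doc_freq = Counter(g for grams in grams_per_doc for g in grams)
    num_docs = len(docs)
    vectors = []
    for grams in grams_per_doc:
        total = sum(grams.values()) or 1  # avoid division by zero for short texts
        vectors.append({
            g: (count / total) * (math.log((1 + num_docs) / (1 + doc_freq[g])) + 1)
            for g, count in grams.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(g, 0.0) for g, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Under these assumptions, a page could be compared with a category profile by calling `tfidf_vectors([page_text, profile_text], n)` and passing the two resulting vectors to `cosine`. The resulting score is the kind of topic similarity feature that a classifier might combine with separately computed sentiment features.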

CC BY-NC-ND 4.0

Paper citation in several formats:
Liu, S. and Forss, T. (2014). Combining N-gram based Similarity Analysis with Sentiment Analysis in Web Content Classification. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2014) - SSTM; ISBN 978-989-758-048-2; ISSN 2184-3228, SciTePress, pages 530-537. DOI: 10.5220/0005170305300537

@conference{sstm14,
author={Shuhua Liu and Thomas Forss},
title={Combining N-gram based Similarity Analysis with Sentiment Analysis in Web Content Classification},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2014) - SSTM},
year={2014},
pages={530-537},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005170305300537},
isbn={978-989-758-048-2},
issn={2184-3228},
}

TY - CONF

JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2014) - SSTM
TI - Combining N-gram based Similarity Analysis with Sentiment Analysis in Web Content Classification
SN - 978-989-758-048-2
IS - 2184-3228
AU - Liu, S.
AU - Forss, T.
PY - 2014
SP - 530
EP - 537
DO - 10.5220/0005170305300537
PB - SciTePress