A Supervised Multi-class Multi-label Word Embeddings Approach for Toxic Comment Classification

Salvatore Carta; Andrea Corriga; Riccardo Mulas; Diego Reforgiato Recupero; Roberto Saia

Topics: Concept Mining; Context Discovery; Data Analytics; Information Extraction; Mining Text and Semi-Structured Data

In Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2019) - KDIR, pages 105-112, 2019, Vienna, Austria

Authors: Salvatore Carta; Andrea Corriga; Riccardo Mulas; Diego Reforgiato Recupero; Roberto Saia

Affiliation: Department of Mathematics and Computer Science, University of Cagliari, Via Ospedale 72, 09124 Cagliari, Italy

Keyword(s): Apache Spark, Word Embeddings, Sentiment Analysis, Supervised Approach.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Business Analytics; Concept Mining; Context Discovery; Data Analytics; Data Engineering; Information Extraction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Mining Text and Semi-Structured Data; Symbolic Systems

Abstract: Internet-based communication has revolutionized the way people exchange information, allowing real-time discussions among a huge number of users. However, the advantages offered by such powerful instruments of communication are sometimes jeopardized by personal attacks, which lead many people to leave the discussions in which they were participating. This problem is related to so-called toxic comments, i.e., personal attacks, verbal bullying and, more generally, the aggressive way in which many people take part in a discussion, which drives some participants to abandon it. By exploiting the Apache Spark big data framework and several word embeddings, this paper presents an approach that performs a multi-class multi-label classification of a discussion over six classes of toxicity. We evaluate the approach by classifying a dataset of comments taken from Wikipedia's talk pages, as defined in a Kaggle challenge. The experimental results show that, through the adoption of different sets of word embeddings, our supervised approach outperforms state-of-the-art approaches that rely on the canonical bag-of-words model. In addition, adopting word embeddings defined in a similar scenario (i.e., discussions related to e-learning videos) shows that it is possible to improve performance with respect to solutions that employ state-of-the-art word embeddings.
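
The general idea in the abstract, representing each comment through word embeddings and assigning it any subset of six toxicity labels, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the abstract does not name the classifier, the embedding sets, or the Spark components used, so everything below is an assumption made for illustration. The three-dimensional embeddings and the training comments are invented toy values, the six label names are taken from the column names of the Kaggle toxic comment dataset, and the per-label logistic regression ("binary relevance") stands in for whatever supervised model the paper actually employs. Apache Spark is omitted to keep the example self-contained.

```python
import math

# Hypothetical toy embeddings; a real system would load pretrained vectors.
EMBEDDINGS = {
    "idiot":  [0.9, 0.1, 0.0],
    "stupid": [0.8, 0.2, 0.0],
    "kill":   [0.1, 0.9, 0.0],
    "hurt":   [0.2, 0.8, 0.0],
    "thanks": [0.0, 0.0, 0.9],
    "great":  [0.0, 0.1, 0.8],
    "edit":   [0.0, 0.0, 0.7],
}

# Label names assumed from the Kaggle dataset; the abstract only says "six classes".
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def embed(comment):
    """Average the vectors of the known tokens (zero vector if none is known)."""
    vectors = [EMBEDDINGS[t] for t in comment.lower().split() if t in EMBEDDINGS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, x):
    """Probability that one label applies to the embedded comment x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_binary(X, y, epochs=500, lr=0.5):
    """Fit a logistic regression for a single label with stochastic gradient descent."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            error = score(weights, bias, x) - target
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def train_multilabel(comments, label_rows):
    """Binary relevance: one independent classifier per label."""
    X = [embed(c) for c in comments]
    return {
        label: train_binary(X, [row[i] for row in label_rows])
        for i, label in enumerate(LABELS)
    }


def predict(models, comment, threshold=0.5):
    """Return the set of labels whose probability exceeds the threshold."""
    x = embed(comment)
    return {label for label, (w, b) in models.items() if score(w, b, x) > threshold}


# Invented training data; each row follows the order of LABELS.
COMMENTS = ["you idiot", "stupid idiot", "i will kill you",
            "kill and hurt", "thanks great edit", "great edit"]
LABEL_ROWS = [
    [1, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

models = train_multilabel(COMMENTS, LABEL_ROWS)
print(sorted(predict(models, "idiot")))  # ['insult', 'toxic']
```

Because each label has its own classifier, a comment can receive several labels at once or none at all, which is what distinguishes the multi-label setting from ordinary multi-class classification.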

CC BY-NC-ND 4.0


Paper citation in several formats:

Carta, S.; Corriga, A.; Mulas, R.; Recupero, D. and Saia, R. (2019). A Supervised Multi-class Multi-label Word Embeddings Approach for Toxic Comment Classification. In Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2019) - KDIR; ISBN 978-989-758-382-7; ISSN 2184-3228, SciTePress, pages 105-112. DOI: 10.5220/0008110901050112

@conference{kdir19,
author={Salvatore Carta and Andrea Corriga and Riccardo Mulas and Diego Reforgiato Recupero and Roberto Saia},
title={A Supervised Multi-class Multi-label Word Embeddings Approach for Toxic Comment Classification},
booktitle={Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2019) - KDIR},
year={2019},
pages={105--112},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0008110901050112},
isbn={978-989-758-382-7},
issn={2184-3228},
}

TY - CONF

JO - Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2019) - KDIR
TI - A Supervised Multi-class Multi-label Word Embeddings Approach for Toxic Comment Classification
SN - 978-989-758-382-7
IS - 2184-3228
AU - Carta, S.
AU - Corriga, A.
AU - Mulas, R.
AU - Recupero, D.
AU - Saia, R.
PY - 2019
SP - 105
EP - 112
DO - 10.5220/0008110901050112
PB - SciTePress