Feature Transformation and Reduction for Text Classification

Authors: Artur J. Ferreira (1) and Mario Figueiredo (2)

Affiliations: (1) Instituto Superior de Engenharia de Lisboa; Instituto de Telecomunicações, Portugal; (2) Instituto Superior Técnico; Instituto de Telecomunicações, Portugal

ISBN: 978-989-8425-14-0

Abstract: Text classification is an important tool for many applications, in supervised, semi-supervised, and unsupervised scenarios. To be processed by machine learning methods, a text (document) is usually represented as a bag-of-words (BoW). A BoW is a large vector of features (usually stored as floating-point values), each of which represents the relative frequency of occurrence of a given word/term in the document. Typically, there is a large number of features, many of which may be non-informative for classification tasks; hence the need for feature transformation, reduction, and selection. In this paper, we propose two efficient algorithms for feature transformation and reduction for BoW-like representations. The proposed algorithms rely on simple statistical analysis of the input patterns, exploiting both the BoW and its binary version. The algorithms are evaluated with support vector machine (SVM) and AdaBoost classifiers on standard benchmark datasets. The experimental results show that the reduced/transformed binary features are adequate for text classification problems and that the proposed methods improve the test set error rate.
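The BoW representation and its binary version mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes simple lowercasing and whitespace tokenization. The filter shown (`select_features` with threshold `min_var`) is a hypothetical, generic unsupervised criterion that drops terms whose binary feature has (near-)zero variance across documents; it is included only to show what a "simple statistical" reduction step can look like, and it is not the transformation/reduction algorithms proposed in the paper.

```python
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, matrix): matrix[d][j] is the relative frequency
    of vocabulary[j] in document d (assumes whitespace tokenization)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    index = {term: j for j, term in enumerate(vocabulary)}
    matrix = []
    for toks in tokenized:
        row = [0.0] * len(vocabulary)
        # Counter is empty for an empty document, so no division by zero occurs.
        for term, count in Counter(toks).items():
            row[index[term]] = count / len(toks)
        matrix.append(row)
    return vocabulary, matrix


def binarize(matrix):
    """Binary BoW: 1 if the term occurs in the document, else 0."""
    return [[1 if x > 0 else 0 for x in row] for row in matrix]


def binary_variance(binary):
    """Variance p * (1 - p) of each binary feature, where p is the
    fraction of documents containing the term."""
    n_docs = len(binary)
    n_terms = len(binary[0]) if binary else 0
    variances = []
    for j in range(n_terms):
        p = sum(row[j] for row in binary) / n_docs
        variances.append(p * (1 - p))
    return variances


def select_features(binary, min_var):
    """Hypothetical filter: keep indices of features with variance > min_var."""
    return [j for j, v in enumerate(binary_variance(binary)) if v > min_var]


docs = ["the cat sat", "the dog sat", "the cat ran"]
vocabulary, bow = bag_of_words(docs)
binary = binarize(bow)
kept = select_features(binary, min_var=0.0)
print([vocabulary[j] for j in kept])  # ['cat', 'dog', 'ran', 'sat']
```

In this toy example the term "the" occurs in every document, so its binary feature is constant (variance 0) and carries no information for discriminating between documents; it is the one feature removed.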

License: CC BY-NC-ND 4.0

Paper citation in several formats:
Ferreira, A. J. and Figueiredo, M. (2010). Feature Transformation and Reduction for Text Classification. In Proceedings of the 10th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS (ICEIS 2010), ISBN 978-989-8425-14-0, pages 72-81. DOI: 10.5220/0003028100720081

@conference{pris10,
author={Ferreira, Artur J. and Figueiredo, Mario},
title={Feature Transformation and Reduction for Text Classification},
booktitle={Proceedings of the 10th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2010)},
year={2010},
pages={72-81},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003028100720081},
isbn={978-989-8425-14-0},
}

TY  - CONF
JO  - Proceedings of the 10th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2010)
TI  - Feature Transformation and Reduction for Text Classification
SN  - 978-989-8425-14-0
AU  - Ferreira, A. J.
AU  - Figueiredo, M.
PY  - 2010
SP  - 72
EP  - 81
DO  - 10.5220/0003028100720081
ER  -
