Authors: Carlos Adriano Gonçalves 1; Célia Talma Gonçalves 2; Rui Camacho 1 and Eugénio Oliveira 1

Affiliations: 1 Faculdade de Engenharia da Universidade do Porto, Portugal; 2 Faculdade de Engenharia da Universidade do Porto and Instituto Superior de Contabilidade e Administração, Portugal

Abstract: The amount of information available in the MEDLINE database makes it very hard for a researcher to retrieve a manageable number of relevant documents through a simple query-language interface. Automatic document classification may be a valuable technology for reducing the number of documents retrieved for each query. To accomplish this, it is essential to apply appropriate pre-processing techniques to the data. The main goal of this study is to analyse the impact of pre-processing techniques on the text classification of MEDLINE documents. We have assessed the effect of combining different pre-processing techniques with several classification algorithms available in the WEKA tool. Our experiments show that applying pruning, stemming and WordNet significantly reduces the number of attributes and improves the accuracy of the results.
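The abstract does not describe the pre-processing pipeline in detail, so the following is only a minimal sketch, under our own assumptions, of why stemming, synonym merging and pruning shrink a bag-of-words attribute set. The suffix stripper `crude_stem` is a hypothetical, much cruder stand-in for a real stemmer such as Porter's; the `synonyms` dictionary is a hypothetical stand-in for WordNet lookups; and pruning is modelled as dropping terms whose document frequency is below an assumed threshold `min_df`. None of these names or settings come from the paper or from WEKA's API.

```python
import re
from collections import Counter

# Hypothetical suffix list: a deliberately crude stand-in for a real stemmer (e.g. Porter).
SUFFIXES = ("ing", "ed", "s")


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(docs, stem=False, synonyms=None, min_df=1):
    """Return the sorted attribute (term) list after optional pre-processing.

    stem     -- apply crude_stem to every token
    synonyms -- hypothetical dict mapping a term to a canonical term (WordNet stand-in)
    min_df   -- prune terms that occur in fewer than min_df documents
    """
    df = Counter()  # document frequency of each term
    for doc in docs:
        terms = tokenize(doc)
        if stem:
            terms = [crude_stem(t) for t in terms]
        if synonyms:
            terms = [synonyms.get(t, t) for t in terms]
        df.update(set(terms))  # count each term at most once per document
    return sorted(term for term, n in df.items() if n >= min_df)


if __name__ == "__main__":
    docs = [
        "Mutations in the gene cause tumours",
        "Gene mutation causes tumor growth",
        "Tumour growth is linked to gene mutations",
    ]
    synonyms = {"tumor": "tumour"}  # hypothetical WordNet-style mapping
    print(len(build_vocabulary(docs)))                                # 14 attributes
    print(len(build_vocabulary(docs, stem=True)))                     # 11 attributes
    print(len(build_vocabulary(docs, stem=True, synonyms=synonyms)))  # 10 attributes
    print(build_vocabulary(docs, stem=True, synonyms=synonyms, min_df=2))
    # ['cause', 'gene', 'growth', 'mutation', 'tumour']
```

On this three-sentence toy corpus each step merges or discards attributes (14, then 11, then 10, then 5). The numbers only illustrate the direction of the effect; the attribute reductions and accuracy gains reported in the abstract come from the study's own MEDLINE experiments in WEKA, not from this sketch.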

CC BY-NC-ND 4.0

Paper citation in several formats:
Adriano Gonçalves, C.; Talma Gonçalves, C.; Camacho, R. and Oliveira, E. (2010). The Impact of Pre-processing on the Classification of MEDLINE Documents. In Proceedings of the 10th International Workshop on Pattern Recognition in Information Systems (ICEIS 2010) - PRIS; ISBN 978-989-8425-14-0, SciTePress, pages 53-61. DOI: 10.5220/0003028700530061

@conference{pris10,
author={Carlos {Adriano Gon\c{c}alves} and Célia {Talma Gon\c{c}alves} and Rui Camacho and Eugénio Oliveira},
title={The Impact of Pre-processing on the Classification of MEDLINE Documents},
booktitle={Proceedings of the 10th International Workshop on Pattern Recognition in Information Systems (ICEIS 2010) - PRIS},
year={2010},
pages={53-61},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003028700530061},
isbn={978-989-8425-14-0},
}

TY  - CONF
JO  - Proceedings of the 10th International Workshop on Pattern Recognition in Information Systems (ICEIS 2010) - PRIS
TI  - The Impact of Pre-processing on the Classification of MEDLINE Documents
SN  - 978-989-8425-14-0
AU  - Adriano Gonçalves, C.
AU  - Talma Gonçalves, C.
AU  - Camacho, R.
AU  - Oliveira, E.
PY  - 2010
SP  - 53
EP  - 61
DO  - 10.5220/0003028700530061
PB  - SciTePress
ER  -