Authors: João Cordeiro 1 and Pavel Brazdil 2

Affiliations: 1 University of Beira Interior; Artificial Intelligence and Computer Science Laboratory (LIACC), U. Porto, Portugal; 2 Faculty of Economics of University of Porto; Artificial Intelligence and Computer Science Laboratory (LIACC), U. Porto, Portugal

Abstract: Information Extraction (IE) from text/web documents has become an important application area of AI. As the number of web sites and documents has grown dramatically, users need easy, fast and flexible ways of generating systems that can carry out specific IE tasks. This can be achieved with the help of Machine Learning (ML) techniques. We have developed a system that exploits this strategy. After training, the system is capable of identifying certain relevant elements in the text and extracting the corresponding information. As input, the system takes a collection of text documents (in a certain domain) that have been previously annotated by a user. This collection is used to generate extraction rules. We describe a set of experiments oriented towards the domain of announcements (in Portuguese) concerning house/flat sales. We show that quite good results overall can be achieved using this methodology. In previous work, some authors argue that stop words should be eliminated before training. We have decided to re-examine this assumption and present evidence that stop words can be quite useful in some sub-tasks.

CC BY-NC-ND 4.0

Paper citation in several formats:
Cordeiro, J. and Brazdil, P. (2004). Learning Text Extraction Rules, without Ignoring Stop Words. In Proceedings of the 4th International Workshop on Pattern Recognition in Information Systems (ICEIS 2004) - PRIS; ISBN 972-8865-01-5, SciTePress, pages 128-138. DOI: 10.5220/0002681601280138

@conference{pris04,
author={João Cordeiro and Pavel Brazdil},
title={Learning Text Extraction Rules, without Ignoring Stop Words},
booktitle={Proceedings of the 4th International Workshop on Pattern Recognition in Information Systems (ICEIS 2004) - PRIS},
year={2004},
pages={128-138},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002681601280138},
isbn={972-8865-01-5},
}

TY - CONF
JO - Proceedings of the 4th International Workshop on Pattern Recognition in Information Systems (ICEIS 2004) - PRIS
TI - Learning Text Extraction Rules, without Ignoring Stop Words
SN - 972-8865-01-5
AU - Cordeiro, J.
AU - Brazdil, P.
PY - 2004
SP - 128
EP - 138
DO - 10.5220/0002681601280138
PB - SciTePress