overall results. As for the latter, we could consider simpler extensions that
help, for instance, to determine the correct word sense. These could replace the
word attributes used here.
Alternatively, we could try to assign the given words (or groups of words) to
semantic categories. We could extend this further by trying to establish
relationships between different categories (e.g. relating the object of an action to
its agent and to the action itself). This could give rise to new attributes (or
predicates) that could be exploited in learning.
From a practical standpoint it would be useful, first of all, to exploit
semantic networks such as WordNet for the Portuguese language. It would also be useful
to make use of existing ontologies developed by others. These could be exploited to
obtain generalizations not only for the focus element, but also for the right and left
contexts.
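To illustrate the kind of generalization meant here, the following is a minimal sketch and not part of the system described in this paper. It assumes a small hand-made hypernym table (HYPERNYMS, purely hypothetical) in place of a real semantic network or ontology. The helper names generalize and generalize_example are likewise illustrative assumptions.

```python
# Toy hypernym table standing in for a semantic network such as WordNet.
# All entries are hypothetical and serve only as an illustration.
HYPERNYMS = {
    "apartment": "dwelling",
    "house": "dwelling",
    "dwelling": "building",
    "lisbon": "city",
    "porto": "city",
    "city": "location",
}


def generalize(word, levels=1):
    """Climb up to `levels` steps in the toy hierarchy.

    Words are lowercased first; words missing from the table are
    returned unchanged (apart from lowercasing).
    """
    current = word.lower()
    for _ in range(levels):
        parent = HYPERNYMS.get(current)
        if parent is None:
            break
        current = parent
    return current


def generalize_example(left, focus, right, levels=1):
    """Generalize the left context, the focus element and the right context."""
    return {
        "left": [generalize(w, levels) for w in left],
        "focus": generalize(focus, levels),
        "right": [generalize(w, levels) for w in right],
    }


example = generalize_example(["house", "in"], "Lisbon", ["near", "Porto"])
print(example)
```

Under these assumptions, the generalized values could serve as additional attributes alongside, or instead of, the original words. How many levels to climb is a design choice that would have to be tuned on the data.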
Despite the fact that many improvements could be made, our work shows that even
a relatively simple system can already be useful for carrying out rather complex
extraction tasks.
Acknowledgements
The authors wish to acknowledge the support provided by FCT (Fundação para a
Ciência e Tecnologia) under the so-called Pluriannual Programme awarded to LIACC,
and the funding received from POSI/POCTI.
References
1. Shadbolt N., Caught up in the web, Invited talk at the 13th Int. Conf. on Knowledge
Engineering and Knowledge Management (EKAW02) (2002)
2. Kushmerick N., Wrapper induction: efficiency and expressiveness, Elsevier (2000), 15-68
3. Sitter A., W. Daelemans, Information Extraction via Double Classification, in Proceedings
of the Int. Workshop on Adaptive Text Extraction and Mining (C. Ciravegna and
N. Kushmerick, eds.), associated with ECML/PKDD-2003 Conf., Dubrovnik, Croatia,
(2003)
4. Mladenic D., M. Grobelnik, Feature selection for unbalanced class distribution and Naive
Bayes, in Machine Learning: Proceedings of the Sixteenth International Conference
(ICML'99), Morgan Kaufmann (1999)
5. Mitchell T. M., Machine Learning, McGraw-Hill (1997)
6. Witten Ian, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with
Java Implementations, Morgan Kaufmann (2000)
7. Craven M., D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery,
Learning to construct knowledge bases from the World Wide Web, Elsevier (2000), 69-113
8. Winston P. H., Artificial Intelligence, Addison-Wesley (1992)
9. Quinlan J. R., C5.0 Data Mining Tool,
www.rulequest.com (1997)
10. Meadow Charles T., B. R. Boyce, D. H. Kraft, Text Information Retrieval Systems, 2nd ed.,
Academic Press (2000)