IDENTIFYING DOMAIN-SPECIFIC SENSES AND ITS APPLICATION TO TEXT CLASSIFICATION

Fumiyo Fukumoto, Yoshimi Suzuki

2010

Abstract

This paper focuses on domain-specific senses and presents a method for identifying the predominant sense of a word in each domain. The method consists of two steps: selecting senses by text classification and scoring senses by link analysis. Sense selection assigns each sense of a word to its corresponding domain using a text classification technique. The selected senses are then scored by computing rank scores with the Markov Random Walk (MRW) model. The method was tested on WordNet 3.0 and the Reuters corpus. For evaluation, we compared the results with the Subject Field Codes resources, which annotate WordNet 2.0 synsets with domain labels. We also applied the results to text classification. The results demonstrated the effectiveness of the method.
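
As an illustration of the scoring step only, the following is a minimal sketch of a damped Markov random walk over a small weighted graph, in the style of PageRank. The abstract does not specify how the sense graph is built or how edges are weighted, so the function name `mrw_scores`, the damping value, and the toy affinity matrix are all assumptions made for this example and are not taken from the paper.

```python
def mrw_scores(weights, damping=0.85, tol=1e-8, max_iter=200):
    """Rank graph nodes with a damped Markov random walk (PageRank-style).

    weights[i][j] is a non-negative affinity between node i and node j,
    e.g. a similarity between two candidate senses (hypothetical input).
    """
    n = len(weights)

    # Row-normalise affinities into transition probabilities.
    transition = []
    for row in weights:
        total = sum(row)
        if total > 0:
            transition.append([w / total for w in row])
        else:
            # A node with no outgoing weight jumps uniformly at random.
            transition.append([1.0 / n] * n)

    # Start from the uniform distribution and iterate to a fixed point.
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new_scores = [
            (1.0 - damping) / n
            + damping * sum(scores[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


# Toy example: three candidate senses with made-up pairwise affinities.
toy_weights = [
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.3],
    [0.1, 0.3, 0.0],
]
toy_scores = mrw_scores(toy_weights)
ranking = sorted(range(len(toy_scores)), key=lambda i: toy_scores[i], reverse=True)
```

In this toy graph the second sense has the largest total affinity to the others, so it ends up ranked first. In a setting like the one the abstract describes, the highest-ranked sense within a domain would be the candidate for that domain's predominant sense.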



Paper Citation


in Harvard Style

Fukumoto F. and Suzuki Y. (2010). IDENTIFYING DOMAIN-SPECIFIC SENSES AND ITS APPLICATION TO TEXT CLASSIFICATION. In Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2010) ISBN 978-989-8425-29-4, pages 263-268. DOI: 10.5220/0003094102630268


in Bibtex Style

@conference{keod10,
author={Fumiyo Fukumoto and Yoshimi Suzuki},
title={IDENTIFYING DOMAIN-SPECIFIC SENSES AND ITS APPLICATION TO TEXT CLASSIFICATION},
booktitle={Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2010)},
year={2010},
pages={263-268},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003094102630268},
isbn={978-989-8425-29-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2010)
TI - IDENTIFYING DOMAIN-SPECIFIC SENSES AND ITS APPLICATION TO TEXT CLASSIFICATION
SN - 978-989-8425-29-4
AU - Fukumoto F.
AU - Suzuki Y.
PY - 2010
SP - 263
EP - 268
DO - 10.5220/0003094102630268