A PATENT RETRIEVAL METHOD USING SEMANTIC ANNOTATIONS

Youngho Kim, Jihee Ryu, Sung-Hyon Myaeng

Abstract

Automatic annotation of key phrases with their semantic categories can help improve the effectiveness of a variety of text-based systems, including information retrieval, summarization, and question answering. In this paper, we exploit semantic annotations for patent retrieval (i.e., patent invalidity search). We first annotate key phrases in a patent document with two semantic categories, PROBLEM (e.g., “pattern matching”) and SOLUTION (e.g., “dynamic programming”), which together characterize a particular technology. Semantic clusters are then formed by grouping patent documents that share the same PROBLEM or SOLUTION tag. A language modelling approach to information retrieval is extended to consider these semantically oriented clusters in addition to document models. Our retrieval evaluation on a collection of United States patent documents shows a 22% improvement over the baseline, a smoothed language modelling approach that does not use the semantic annotations.
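
The cluster-extended scoring described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formula, so everything here is an assumption: a query-likelihood score in which each term probability is a linear mixture of the document model, the model of the document's PROBLEM cluster, the model of its SOLUTION cluster, and a collection model. The function names (`unigram`, `cluster_models`, `rank`), the mixture weights, the one-tag-per-category simplification, and the toy patents are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def unigram(tokens):
    """Maximum-likelihood unigram model: term -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def cluster_models(docs, tags):
    """Pool the tokens of all documents sharing a tag; one model per tag."""
    pooled = {}
    for doc_id, tag in tags.items():
        pooled.setdefault(tag, []).extend(docs[doc_id])
    return {tag: unigram(tokens) for tag, tokens in pooled.items()}


def rank(query, docs, problem_tags, solution_tags,
         weights=(0.5, 0.2, 0.2, 0.1)):
    """Rank document ids by query log-likelihood under the mixed model."""
    # Hypothetical mixture weights (document, PROBLEM, SOLUTION, collection).
    w_doc, w_prob, w_sol, w_coll = weights
    collection = unigram([t for tokens in docs.values() for t in tokens])
    problem = cluster_models(docs, problem_tags)
    solution = cluster_models(docs, solution_tags)
    scores = {}
    for doc_id, tokens in docs.items():
        doc = unigram(tokens)
        prob = problem[problem_tags[doc_id]]
        sol = solution[solution_tags[doc_id]]
        total = 0.0
        for term in query:
            if term not in collection:
                continue  # assumption: ignore terms unseen in the collection
            p = (w_doc * doc.get(term, 0.0)
                 + w_prob * prob.get(term, 0.0)
                 + w_sol * sol.get(term, 0.0)
                 + w_coll * collection[term])
            total += math.log(p)
        scores[doc_id] = total
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): one PROBLEM and one SOLUTION tag per patent.
docs = {
    "p1": ["pattern", "matching", "dynamic", "programming"],
    "p2": ["pattern", "matching", "suffix", "tree"],
    "p3": ["image", "compression", "wavelet", "transform"],
}
problem_tags = {"p1": "pattern matching", "p2": "pattern matching",
                "p3": "image compression"}
solution_tags = {"p1": "dynamic programming", "p2": "suffix tree",
                 "p3": "wavelet transform"}

ranking = rank(["pattern", "matching", "dynamic", "programming"],
               docs, problem_tags, solution_tags)
```

In this toy setting, `p2` never mentions "dynamic" or "programming", yet it still receives probability mass for those terms through the PROBLEM cluster it shares with `p1`. That is the intuition behind letting semantically oriented clusters smooth the document model.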


Paper Citation


in Harvard Style

Kim Y., Ryu J. and Myaeng S. (2009). A PATENT RETRIEVAL METHOD USING SEMANTIC ANNOTATIONS. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009) ISBN 978-989-674-011-5, pages 211-218. DOI: 10.5220/0002310302110218


in Bibtex Style

@conference{kdir09,
author={Youngho Kim and Jihee Ryu and Sung-Hyon Myaeng},
title={A PATENT RETRIEVAL METHOD USING SEMANTIC ANNOTATIONS},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009)},
year={2009},
pages={211-218},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002310302110218},
isbn={978-989-674-011-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009)
TI - A PATENT RETRIEVAL METHOD USING SEMANTIC ANNOTATIONS
SN - 978-989-674-011-5
AU - Kim Y.
AU - Ryu J.
AU - Myaeng S.
PY - 2009
SP - 211
EP - 218
DO - 10.5220/0002310302110218