Chinese-keyword Fuzzy Search and Extraction over Encrypted Patent Documents

Wei Ding, Yongji Liu, Jianfeng Zhang

Abstract

Cloud storage for information sharing is likely to become indispensable to China's future national defence library, for example for searching national defence patent documents, but the associated security risks must be minimized through data encryption. Patent keywords are a high-level summary of a patent document, so extracting and searching them efficiently is of practical importance. Because of the particular characteristics of Chinese keywords, most existing algorithms designed for English-language environments become ineffective in Chinese scenarios. Manual keyword extraction is impractical when the number of files is large, so an improved method based on term frequency–inverse document frequency (TF-IDF) is proposed to extract keywords from patent literature automatically. The extracted keyword sets also help accelerate keyword search by linking a finite set of keywords to a large number of documents. Fuzzy keyword search is introduced to further increase search efficiency in cloud computing scenarios compared with exact keyword search methods. Based on Chinese Pinyin similarity, a Pinyin-Gram-based algorithm is proposed for fuzzy search in an encrypted Chinese environment, and a keyword trapdoor search index structure based on an n-ary tree is designed. Both the search efficiency and the accuracy of the proposed scheme are verified through computer experiments.
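
For orientation, the standard TF-IDF baseline that the abstract builds on can be sketched as follows. This is a minimal illustration of plain TF-IDF only, not the improved variant proposed in the paper, whose details the abstract does not give. The function names `tf_idf` and `top_keywords`, the sample documents, and the assumption that each document has already been segmented into Chinese words are all hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document using plain TF-IDF.

    Assumption: each document is already segmented into a list of words.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            # tf = relative frequency in the document, idf = ln(N / df).
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(docs, k=3):
    """Pick the k highest-scoring terms per document (ties broken by term)."""
    return [
        [term for term, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for s in tf_idf(docs)
    ]


# Hypothetical, pre-segmented toy corpus.
docs = [
    ["加密", "检索", "专利", "加密"],
    ["专利", "关键词", "提取"],
    ["专利", "云", "存储", "加密"],
]
keywords = top_keywords(docs, k=2)
```

In this toy corpus, a term that occurs in every document (here "专利") receives a score of zero, which is why it is never selected as a keyword; terms concentrated in a single document rank highest.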

Paper Citation


in Harvard Style

Ding W., Liu Y. and Zhang J. (2015). Chinese-keyword Fuzzy Search and Extraction over Encrypted Patent Documents. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR (IC3K 2015), ISBN 978-989-758-158-8, pages 168-176. DOI: 10.5220/0005581001680176


in Bibtex Style

@conference{kdir15,
author={Wei Ding and Yongji Liu and Jianfeng Zhang},
title={Chinese-keyword Fuzzy Search and Extraction over Encrypted Patent Documents},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)},
year={2015},
pages={168-176},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005581001680176},
isbn={978-989-758-158-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)
TI - Chinese-keyword Fuzzy Search and Extraction over Encrypted Patent Documents
SN - 978-989-758-158-8
AU - Ding W.
AU - Liu Y.
AU - Zhang J.
PY - 2015
SP - 168
EP - 176
DO - 10.5220/0005581001680176