Figure 1: Adopted architecture of the fuzzy keyword search. 
manual keyword extraction are obvious, including 
inconsistent keyword combinations and low 
efficiency, especially in cloud computing scenarios. 
Much research has addressed automatic keyword 
extraction: (Witten, 2009) adopts statistical 
methods based on an English dictionary to build the 
KEA system for automatic keyword extraction, 
(Yang, 2002) uses a PAT-tree structure to collect 
keywords automatically, and (Du, 2011) reports an 
improved scheme based on the co-occurrence 
frequency of Chinese phrases. However, few studies 
specifically address automatic keyword extraction 
over Chinese patent documents.  
In this paper, to increase the accuracy of 
keyword extraction, we jointly consider the 
influence of word frequency in special regions, a 
penalty function for parallel structures, and 
weighted lexical morphemes on the subjects of 
Chinese patent literature. After common words are 
removed, Chinese keywords are extracted 
automatically by an improved term 
frequency-inverse document frequency (TF-IDF) 
algorithm. To search Chinese keywords efficiently, a 
Pinyin-Gram-based algorithm is proposed to build 
the fuzzy keyword set, since Chinese Pinyin offers a 
unique way to measure the similarity of Chinese 
words, which differs substantially from the 
English case. 
Encrypted files and keyword sets are transferred to 
the private cloud server. For authorized 
users, a keyword trapdoor search index structure 
based on an n-ary tree is designed, and the matching 
encrypted files are returned by the public server, 
which usually has much more memory than the 
private server. Computer experiments verify that the 
efficiency of the proposed scheme is significantly 
higher than that of traditional methods. 
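To make the idea of a Pinyin-based fuzzy keyword set concrete, the following is a minimal Python sketch, not the algorithm of this paper. It assumes a gram-style construction in which the fuzzy set of a keyword contains every string obtained by deleting up to d letters from its Pinyin spelling. The table TOY_PINYIN and the functions to_pinyin, gram_fuzzy_set and fuzzy_match are hypothetical names introduced only for illustration; a real system would need a complete character-to-Pinyin dictionary.

```python
from itertools import combinations

# Hypothetical toy table (assumption); a real system would use a full
# character-to-Pinyin dictionary.
TOY_PINYIN = {"专": "zhuan", "利": "li", "检": "jian", "索": "suo"}


def to_pinyin(word):
    """Concatenate the toy Pinyin syllables of each character in word."""
    return "".join(TOY_PINYIN[ch] for ch in word)


def gram_fuzzy_set(text, max_deletions=1):
    """Return text plus every string obtained by deleting up to
    max_deletions characters from it (remaining order is preserved)."""
    variants = {text}
    n = len(text)
    for d in range(1, min(max_deletions, n) + 1):
        for drop in combinations(range(n), d):
            skip = set(drop)
            variants.add("".join(c for i, c in enumerate(text) if i not in skip))
    return variants


def fuzzy_match(query_pinyin, keyword, max_deletions=1):
    """Return True if the two fuzzy sets share at least one variant.

    Strings within edit distance max_deletions always share a variant;
    a shared variant can also occur for slightly more distant strings.
    """
    stored = gram_fuzzy_set(to_pinyin(keyword), max_deletions)
    asked = gram_fuzzy_set(query_pinyin, max_deletions)
    return not stored.isdisjoint(asked)


print(to_pinyin("专利"))                  # zhuanli
print(fuzzy_match("zhuangli", "专利"))    # True: one extra letter typed
print(fuzzy_match("jiansuo", "专利"))     # False: unrelated keyword
```

Working on the Pinyin string rather than on the characters lets a mistyped Romanization still reach the intended keyword, which is the property that distinguishes the Chinese case from English fuzzy search.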
2 SYSTEM DESIGN AND TASK 
2.1 System Description 
In this paper, the adopted system architecture 
consists of four components: the owner, the 
private cloud server, the public cloud server, and the 
authorized users, as indicated in Fig. 1. The 
difference from the general system architecture, 
e.g., in (Li, 2010), is that a private cloud server 
is introduced. The advantage of this arrangement is 
that it further enhances the security of sensitive files, 
since information may leak through index 
analysis if all the data are stored together on 
the public server. 
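The separation of roles described above can be illustrated with a short Python sketch. It is only an assumption-laden illustration: the classes PrivateServer and PublicServer, the function trapdoor, and the use of HMAC-SHA256 as the keyed trapdoor function are hypothetical choices, and file encryption itself is not shown (ciphertexts are placeholder byte strings). The point is that the search index and the encrypted files never reside on the same server.

```python
import hashlib
import hmac


def trapdoor(key, keyword):
    """Keyed hash of a keyword (HMAC-SHA256 is an assumption here)."""
    return hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).hexdigest()


class PrivateServer:
    """Holds only the index from trapdoors to file identifiers."""

    def __init__(self):
        self.index = {}

    def add(self, td, file_id):
        self.index.setdefault(td, set()).add(file_id)

    def search(self, td):
        return sorted(self.index.get(td, set()))


class PublicServer:
    """Holds only opaque ciphertexts, addressed by file identifier."""

    def __init__(self):
        self.blobs = {}

    def store(self, file_id, ciphertext):
        self.blobs[file_id] = ciphertext

    def fetch(self, file_ids):
        return [self.blobs[i] for i in file_ids]


# Owner side: register one (placeholder) encrypted file and its keyword.
key = b"demo-key-shared-with-authorized-users"
private, public = PrivateServer(), PublicServer()
public.store("patent-001", b"<ciphertext placeholder>")
private.add(trapdoor(key, "专利"), "patent-001")

# Authorized user side: the trapdoor goes to the private server,
# and the matching ciphertexts come from the public server.
ids = private.search(trapdoor(key, "专利"))
print(ids)  # ['patent-001']
print(public.fetch(ids))
```

In this sketch the public server sees identifiers and ciphertexts but no keyword information, so analyzing its stored data reveals nothing about the index.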
The flow of the fuzzy keyword search is 
as follows. The keywords are extracted 
automatically from the patent files, and then the 
fuzzy keyword sets and the search index are constructed. 
The owner encrypts the patent files and transfers 
them to the private server. These encrypted files 
are then uploaded to the public cloud server with 
any necessary remarks or extra encryption. The 
authorized users submit search requests, and the 
corresponding trapdoor functions are processed at the 
private server. The file indexes and the matching 
encrypted files are returned to the user. Besides 
these features, the encryption of certain patent 
literature, e.g., national defense patents, is 
desirable in cloud computing. As implied by (Li, 
2010), the cloud server cannot be fully trusted. On 
the one hand, it does not delete the encrypted files or 
the index, and it responds only to query requests 
from authorized users with unaltered search 
results. On the other hand, it may analyze the data 
stored on the server for certain purposes and sell the 
analysis results as additional information to