In addition, most of the crawled web pages contain a large number of impurity marker words, such as "answer", "advertisement" and "consultation". Sentences containing these impurity marker words should be excluded from the information extraction process.
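The filtering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sentence splitter and the function names `split_sentences` and `drop_impure` are hypothetical, and the impurity list holds only the three example words named above. Substring matching is used because Chinese text has no spaces between words; on English text it will also drop sentences containing words such as "answered".

```python
import re

# Hypothetical impurity lexicon: only the three words named in the text,
# stored in lower case for case-insensitive matching.
IMPURITY_WORDS = {"answer", "advertisement", "consultation"}

def split_sentences(text: str) -> list[str]:
    """Split after sentence-final punctuation (ASCII and full-width Chinese)."""
    parts = re.split(r"(?<=[.!?。！？])\s*", text)
    return [p.strip() for p in parts if p.strip()]

def drop_impure(sentences: list[str], impurity_words=IMPURITY_WORDS) -> list[str]:
    """Keep only the sentences that contain none of the impurity marker words."""
    kept = []
    for s in sentences:
        lowered = s.lower()
        # Substring test, since Chinese words are not space-delimited.
        if not any(w in lowered for w in impurity_words):
            kept.append(s)
    return kept
```

For example, applying `drop_impure(split_sentences(text))` to a page fragment such as "Polyuria is caused by diabetes. Click here for a free consultation." keeps only the first sentence.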
3 SENTENCES OF SYMPTOMS
3.1 Semantic Element Sets
Sentences that describe the causes of symptoms follow fixed patterns, and each pattern is built from a fixed set of semantic elements.
To summarize these sentences, we identify their sentence patterns and extract the semantic elements within each pattern. Entity words fall roughly into two types of semantic elements, namely symptoms and causes, and the relationship between them is expressed by relation words. As shown in Figure 3, we construct a (symptom, diagnosis, cause) triplet.
 
Figure 3: The (symptom, diagnosis, cause) triplet.
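A minimal way to hold the triplets of Figure 3 in code is a named tuple. The class `Triplet`, its field names and the helper `causes_of` are hypothetical; in particular, we assume that the middle slot, labeled "diagnosis" in Figure 3, stores the relation word that links a symptom to its cause, as in the example that follows.

```python
from typing import NamedTuple

class Triplet(NamedTuple):
    """One extracted record (illustrative field names, not the paper's)."""
    symptom: str   # entity word of type "symptom"
    relation: str  # relation word filling the middle slot (assumption)
    cause: str     # entity word of type "cause"

def causes_of(triplets: list[Triplet], symptom: str) -> list[str]:
    """Return the distinct causes recorded for a symptom, in first-seen order."""
    causes: list[str] = []
    for t in triplets:
        if t.symptom == symptom and t.cause not in causes:
            causes.append(t.cause)
    return causes
```

Keeping the relation word in the record, rather than only the (symptom, cause) pair, preserves the evidence for why a pair was extracted.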
For example, in the sentence "Weakness in both legs is caused by osteoporosis", "weakness in both legs" and "osteoporosis" are entity words, and "is caused by" is a relation word. After studying a large number of sentences that describe symptoms and their causes, we induced a set of semantic elements; some examples are given below.
  Concrete causes of symptoms = {osteoporosis, amoebic dysentery, ...}.
  Upper concepts of concrete causes = {causes, factors, reasons, ...}.
  Preposition words = {because, by, since, due to, with, ...}.
  Relation words = {cause, induce, bring out, form, ...}.
  Patients = {patients, invalids, sick, ...}.
  List items = {one, two, three, 1, 2, (1), (2), 1), ①, ②, as follows, ...}.
  Punctuation marks or words that embody peer or parallel meaning = {comma, or, and, in addition, also, ...}.
  Adverbs = {will, often, generally, more, can, very, also can, possibly, ...}.
  Impurity words = {question, choice, multiple choice, single choice, answer, advertisement, consultation, ...}.
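To use the sets above, a system must first recognize which element each word or phrase belongs to. The sketch below does this with greedy longest-match lookup against small hypothetical lexicons drawn from the sets above (plus "polyuria" and "diabetes" from Section 3.2). The single-letter labels follow an assumed mapping onto the pattern symbols of Section 3.2 (A symptom, C concrete cause, S upper concept, B1 preposition, B2 relation word, P patient word, X copula), and the input is assumed to be already word-segmented into tokens.

```python
# Hypothetical lexicons: small subsets of the element sets, with an
# assumed mapping from each set to a pattern symbol.
LEXICON = {
    "A":   {"polyuria", "weakness in both legs"},                 # symptoms (illustrative)
    "C":   {"osteoporosis", "amoebic dysentery", "diabetes"},     # concrete causes
    "S":   {"causes", "factors", "reasons", "factor", "reason"},  # upper concepts
    "B1":  {"because", "by", "since", "due to", "with"},          # preposition words
    "B2":  {"cause", "induce", "bring out", "form"},              # relation words
    "P":   {"patients", "invalids", "sick", "patient"},           # patient words
    "X":   {"is"},                                                 # copula (assumed)
    "ADV": {"will", "often", "generally", "more", "can", "very", "also can", "possibly"},
}

# Longest phrase (in tokens) across all lexicons bounds the match window.
MAX_LEN = max(len(p.split()) for words in LEXICON.values() for p in words)

def tag(tokens: list[str]) -> list[tuple[str, str]]:
    """Greedy longest-match tagging; tokens in no lexicon are labeled "O"."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            label = next((lab for lab, words in LEXICON.items() if phrase in words), None)
            if label is not None:
                out.append((label, phrase))
                i += n
                break
        else:  # no lexicon phrase starts at token i
            out.append(("O", tokens[i]))
            i += 1
    return out
```

Trying the longest window first ensures that multi-word entries such as "bring out" or "due to" win over any shorter entry starting at the same token.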
3.2 Sentence Structure
We summarized a number of sentence patterns from a large corpus of web texts; some examples are given below. Each pattern is followed by an example in which every semantic element is labeled with its symbol.
  A+B1+C+B2: A(polyuria) B1(by) C(diabetes) B2(caused).
  C+B2+A: C(diabetes) B2(bring out) A(polyuria).
  C+X+B2+A+S: C(diabetes) X(is) B2(bring out) A(polyuria) S(factor).
  (B2+)A+S+X+C: B2(bring out) A(polyuria) S(reason) X(is) C(diabetes).
  C+P+B2+A: C(diabetes) P(patient) B2(feel) A(thirsty).
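Once a clause has been tagged with element labels, matching it against the patterns above reduces to matching its label sequence. The sketch below encodes each pattern as a regular expression over space-separated labels. The label meanings are our inference from the examples (the text does not spell them out), adverbs and untagged tokens are assumed to be ignored, and `match_pattern` is a hypothetical name.

```python
import re
from typing import Optional

# Assumed label meanings: A symptom, C concrete cause, B1 preposition,
# B2 relation word, S upper concept ("factor"/"reason"), X copula, P patient.
PATTERNS = {
    "A+B1+C+B2":    r"A B1 C B2",
    "C+B2+A":       r"C B2 A",
    "C+X+B2+A+S":   r"C X B2 A S",
    "(B2+)A+S+X+C": r"(?:B2 )?A S X C",  # the leading B2 is optional
    "C+P+B2+A":     r"C P B2 A",
}
IGNORED = {"ADV", "O"}  # adverbs and untagged tokens do not affect the pattern

def match_pattern(tagged: list[tuple[str, str]]) -> Optional[tuple[str, str, str]]:
    """Return (pattern, symptom, cause) for the first pattern whose label
    sequence matches the whole clause, or None if no pattern applies."""
    kept = [(lab, w) for lab, w in tagged if lab not in IGNORED]
    labels = " ".join(lab for lab, _ in kept)
    for name, regex in PATTERNS.items():
        if re.fullmatch(regex, labels):
            slots = dict(kept)  # assumes each label occurs at most once per clause
            return name, slots["A"], slots["C"]
    return None
```

Because `re.fullmatch` requires the whole label sequence to match, "C B2 A" and "C P B2 A" are never confused, and the optional "(B2+)" prefix becomes the group `(?:B2 )?`.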
More than one cause often follows the upper concept S (e.g., "factor"), so a single sentence pattern can capture only some of them. Therefore, when constructing sentence pattern rules after clause segmentation, the text following S cannot be treated as a single entity word; instead, the individual causes must be separated using the punctuation marks or words that express peer or parallel meaning listed above.
  A+S:c1+c2+c3: A(polyuria) S(factor): c1(diabetes), c2(prostatitis), c3(bladder tumor).
  c1+c2+c3+B2+A: c1(innutrition), c2(habits and customs), c3(poor working environment) B2(cause) A(swallowing pain).
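The separation of multiple causes after S can be sketched with a single regular-expression split. The separator set is taken from the parallel-meaning element set in Section 3.1, with full-width Chinese commas and semicolons added as an assumption; `split_causes` is a hypothetical name.

```python
import re

# Separators: punctuation and parallel-meaning words from Section 3.1,
# plus full-width Chinese punctuation (our assumption).
PARALLEL_SEP = re.compile(r"\s*(?:[,，、;；]|\b(?:or|and|in addition|also)\b)\s*")

def split_causes(span: str) -> list[str]:
    """Split the text that follows S (e.g. after "factor:") into individual causes."""
    span = span.lstrip(":： ")  # drop the colon that introduces the list
    return [p.strip() for p in PARALLEL_SEP.split(span) if p.strip()]
```

Splitting on conjunctions is crude: a multi-word cause such as "habits and customs" in the example above would be split in two, so a practical system would likely check candidate pieces against the cause lexicon before accepting a split.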
These sentence patterns cover most of the syntax used to describe symptom-cause relations. The more complete and comprehensive the set of sentence patterns, the higher the recall of information extraction.
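The link between pattern coverage and recall follows from the standard definition of recall, written here with our own symbols: $N_{\text{correct}}$ is the number of symptom-cause relations correctly extracted and $N_{\text{gold}}$ is the number of such relations actually stated in the corpus.

```latex
\text{Recall} = \frac{N_{\text{correct}}}{N_{\text{gold}}}
```

$N_{\text{gold}}$ is fixed by the corpus, whereas each added pattern lets the extractor match sentences that no existing pattern covered, which can raise $N_{\text{correct}}$ and hence recall.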
Our statistics show that the nine sentence patterns above occur most frequently; the frequency of each is shown in Table 2 below.
Table 2: Sentence rule frequency table. 
Sentence structure  Frequency 
A+B1+C+B2  385 
C+B2+A  40