Problem Description: Facts: area (one feddan), rice
variety (sakha102), weeds type (ogizza), age of plant
(ten days), and cultivation method (baddar).
Symptoms: mild ogizza weeds.
Questions: What are the appropriate chemicals,
concentration and the rate? How and when to use?
The following answer to the above problem was
given by a domain expert:
"Satron with concentration 50% and 2 Litre per
Feddan should be used. It should be mixed with fine
sand, after 15 days of cultivation".
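The "Facts" line of the example is already semi-structured: each fact is written as an attribute followed by its value in parentheses. The sketch below shows how such a line might be turned into attribute-value pairs, which is the kind of structured record that feature-based mining needs. It assumes the English rendering shown above and the "attribute (value)" pattern; the helper name `parse_facts` is hypothetical, and real Arabic problem texts would need their own pattern.

```python
import re

# Hypothetical pattern: assumes each fact is "attribute (value)", separated by
# commas, with an optional "and" before the last fact, as in the example above.
FACT_PATTERN = re.compile(r"\s*(?:and\s+)?([^,()]+?)\s*\(([^()]*)\)")


def parse_facts(facts_text: str) -> dict[str, str]:
    """Turn the text after 'Facts:' into a dict of attribute -> value."""
    return {attr.strip().lower(): value.strip()
            for attr, value in FACT_PATTERN.findall(facts_text)}


facts = ("area (one feddan), rice variety (sakha102), weeds type (ogizza), "
         "age of plant (ten days), and cultivation method (baddar).")
record = parse_facts(facts)
print(record)
```

Each problem then becomes a row whose columns are attributes such as `rice variety` or `weeds type`, which is easier to group and compare than free text.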
Mining these problems has several objectives.
First, patterns and relations can be discovered and
used to enhance the utilization of this valuable
resource. The discovered patterns and relations may
point to certain types of widespread problems and
pressing needs of people living in rural areas.
Consequently, decision makers would be able to take
the necessary actions to tackle these pressing problems
and needs of poor communities. Second, solutions
given for similar problems by different experts, or
by the same expert at different times, can be analysed
in terms of their similarities and differences, and
inconsistencies among them can then be resolved. Third, patterns
of problems and their solutions can be created and
used to classify new problems and provide solutions.
Fourth, outdated recommendations can be identified
and removed from the database. Fifth, users of
the problems database can locate problems that are
similar to their own.
Section 2 reviews related work. Section
3 presents a methodology for mining the problem parts.
Three parts can be extracted from the
problem's text: the topic, description and
questions parts. Similar problems are clustered, and the
solutions associated with each cluster are retrieved
and analysed.
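The paper does not fix a particular clustering algorithm at this point, so the following is only a sketch of one simple choice: TF-IDF term weighting, cosine similarity, and single-pass "leader" clustering, where each problem joins the most similar existing cluster if the similarity reaches a threshold and otherwise starts a new one. The token lists, the threshold of 0.25 and the placeholder solution strings are all hypothetical; the tokens stand for problem text after stemming and stop-word removal.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Weight each term by raw count * log(N / df), one common TF-IDF variant."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
            for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def leader_cluster(vectors, threshold=0.25):
    """Single pass: join the cluster whose leader (first member) is most
    similar, if that similarity reaches `threshold`; else start a new cluster."""
    leaders, labels = [], []
    for i, vec in enumerate(vectors):
        sims = [cosine(vec, vectors[j]) for j in leaders]
        if sims and max(sims) >= threshold:
            labels.append(sims.index(max(sims)))
        else:
            leaders.append(i)
            labels.append(len(leaders) - 1)
    return labels


# Hypothetical problem texts, already stemmed with stop words removed.
docs = [
    ["rice", "weed", "ogizza", "herbicide"],
    ["rice", "weed", "ogizza", "dose"],
    ["wheat", "rust", "fungicide"],
    ["wheat", "rust", "spray"],
]
labels = leader_cluster(tfidf_vectors(docs))
print(labels)  # [0, 0, 1, 1]

# Retrieve the expert solutions attached to each cluster (placeholders here).
solutions = ["sol-1", "sol-2", "sol-3", "sol-4"]
by_cluster = {}
for label, sol in zip(labels, solutions):
    by_cluster.setdefault(label, []).append(sol)
```

Leader clustering is order-dependent and sensitive to the threshold; it is used here only because it fits in a few lines, not because the methodology prescribes it.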
Section 4 illustrates the difficulties encountered
when clustering techniques were used as a means
of identifying similar problems. An alternative,
more structured approach is then presented: before
data mining is applied, the problems database is
transformed into a structured database using a set
of features extracted for each set of problems.
Results of experiments with weed control problems
are discussed. Section 5 presents the conclusions
and future work.
2 RELATED WORK
Mining problems and their solutions accumulated in
the textual databases of help and support services is a
novel application of web mining. Previous mining
work focused on a single type of
document. For example, opinion mining systems
consider documents or reviews written by customers;
all opinion holders are of one type, the
customer (Nauskawa, Yi, Bunescu, R., 2003;
Popescu, A., and Etzioni, O., 2005; Bo Pang and
Lillian Lee, 2008). In our work, mining covers
two different types of documents: farmers' problem
documents and domain experts' solution
documents. Furthermore, the two types are associated,
since each expert solution answers a particular problem.
Data mining and text mining techniques can be
used together in this application. In the
problem part, feature extraction, text clustering and
text analysis techniques (Salton, G., 1989; Ayed, H.,
and K. M, 2002) are used to cluster similar problems
and to analyse the problems in terms of their
dominant features and the questions asked. Data
mining techniques (Margaret, H., 2003) are used to
discover patterns and relations among these
problems. In the solution part, feature extraction and
text analysis are used to analyse the solutions, and
data mining techniques are used to discover patterns
and relations among them. In clusters of
problem-solution pairs, data mining techniques are
used to discover association rules (Jean Marc
Adamo, 2000), and text analysis techniques are used
to find the similarities and differences among
solutions of similar problems.
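To make the association-rule step concrete, the sketch below treats each problem-solution pair as a set of attribute=value items and keeps rules A -> B whose support (fraction of pairs containing both items) and confidence (support of A and B divided by support of A) pass given thresholds. It is a brute-force search over single-item rules, not the Apriori algorithm or the specific procedures in the cited work; the transactions, item names such as `weed=W2` and `chemical=X`, and the thresholds are invented for illustration only.

```python
from itertools import permutations


def mine_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Brute-force single-item rules A -> B with their support and confidence."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(itemset):
        # Fraction of transactions that contain every item in `itemset`.
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for a, b in permutations(items, 2):
        s_ab = support({a, b})
        if s_ab >= min_support:
            confidence = s_ab / support({a})
            if confidence >= min_confidence:
                rules.append((a, b, s_ab, confidence))
    return rules


# Toy problem-solution pairs, invented for illustration (not real data).
transactions = [
    {"crop=rice", "weed=ogizza", "chemical=Satron"},
    {"crop=rice", "weed=ogizza", "chemical=Satron"},
    {"crop=rice", "weed=W2", "chemical=X"},
    {"crop=wheat", "weed=W2", "chemical=X"},
]
rules = mine_rules(transactions)
for a, b, s, c in rules:
    print(f"{a} -> {b} (support={s:.2f}, confidence={c:.2f})")
```

In this toy data `weed=ogizza -> chemical=Satron` is kept with confidence 1.0, while `crop=rice -> chemical=Satron` is rejected because only two of the three rice pairs contain it.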
3 METHODOLOGY
Two modes of operation are considered: training
mode and test mode. In training mode, the system groups
similar problems, extracts patterns and relations,
forms exemplars of similar problems, retrieves the
solutions associated with each cluster of problems,
summarizes these solutions and forms problem-solution
pairs. In test mode, the discovered
problem-solution exemplars are used to classify new
problems.
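One way to picture the two modes is a nearest-exemplar classifier. In the sketch below, training mode is reduced to building one exemplar per cluster (the average term-count vector of its problems, paired with a summarized solution), and test mode assigns a new problem to the exemplar with the highest cosine similarity. The centroid representation, the cluster names and the token lists are assumptions made for illustration; the second cluster's solution is a placeholder, and the first paraphrases the expert answer quoted earlier.

```python
import math
from collections import Counter


def centroid(docs):
    """Average term-count vector of the problems in one cluster."""
    total = Counter()
    for doc in docs:
        total.update(doc)
    return {t: c / len(docs) for t, c in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(tokens, exemplars):
    """Test mode: pick the most similar exemplar and return its solution."""
    vec = Counter(tokens)
    best = max(exemplars, key=lambda name: cosine(vec, exemplars[name]["centroid"]))
    return best, exemplars[best]["solution"]


# Training mode (hypothetical clusters): one exemplar per cluster of problems.
exemplars = {
    "rice_weeds": {
        "centroid": centroid([["rice", "weed", "ogizza"],
                              ["rice", "weed", "ogizza", "dose"]]),
        "solution": "Satron 50%, 2 litres per feddan, mixed with fine sand, "
                    "15 days after cultivation",
    },
    "wheat_rust": {
        "centroid": centroid([["wheat", "rust", "spray"]]),
        "solution": "<summarized expert solution for this cluster>",
    },
}

# Test mode: classify a new, already pre-processed problem.
label, solution = classify(["rice", "ogizza", "weed", "when"], exemplars)
print(label)  # rice_weeds
```

A centroid keeps every exemplar to a single vector, so classifying a new problem costs one similarity computation per cluster rather than one per stored problem.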
3.1 Problem Analysis
Figure 1 summarises the main steps of the
methodology, which are as follows:
1. Pre-processing: an Arabic language stemmer is used
to remove affixes and stop words from the problem text.
2. Feature Extraction: two approaches are
considered, namely a simple approach that uses the terms of the text
as features and a more sophisticated one that identifies
specific features to be extracted using compiled lists
MINING FARMERS PROBLEMS IN WEB-BASED TEXUAL DATABASE APPLICATION
415