2 RELATED WORK
Our research aims to extract useful product
information from user-generated content in order to
create a product model. This work is closely related
to feature-based opinion mining, which has drawn many
researchers’ attention in recent years. In particular,
identifying the features that users mention is
considered the most significant step in opinion mining
(Hai et al., 2013). Hu and Liu (2004) first proposed
a feature-based opinion mining method to extract
features and sentiments from customer reviews. They
used pattern mining to find frequent itemsets (nouns),
which were then pruned and treated as frequent
product features. Sentiment words (adjectives) that
appear near frequent features in reviews were then
extracted and used to identify product features that
pattern mining alone could not find.
Scaffidi et al. (2007) improved the performance of
feature extraction in their system, Red Opal.
Specifically, they used a language model to find
features by comparing how often each noun occurs in
the reviews with how often it occurs in general
English. Nouns that are frequent both in the reviews
and in general usage are rejected as features.
Hu et al. (2010)
made use of SentiWordNet to identify all sentences
that may express users’ sentiment polarity. Pattern
mining is then applied to these opinionated sentences
to generate explicit features. In addition, a mapping
database was constructed to find implicit features
that are signalled by sentiment words (e.g.,
“expensive” indicates price). To improve the accuracy
of feature extraction from free-text reviews, Hai
et al. (2013) proposed a novel method which evaluates
the domain relevance of a feature by exploiting fea-
tures’ distribution disparities across different corpora
(domain-dependent review corpus such as cellphone
reviews and domain-irrelevant corpus such as culture
article collection). Specifically, intrinsic-domain
relevance (IDR) and extrinsic-domain relevance (EDR)
scores are used to judge whether an examined feature
is related to a given domain. A candidate feature
with a low IDR score and a high EDR score is pruned.
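The corpus-comparison idea behind Red Opal and the domain-relevance method can be illustrated with a minimal sketch. It is a deliberate simplification and not the scoring used in either paper: the function name `domain_specific_nouns`, the add-one smoothed frequency ratio, the `min_ratio` threshold and the toy noun lists are all assumptions introduced here for illustration.

```python
from collections import Counter


def domain_specific_nouns(review_nouns, generic_nouns, min_ratio=2.0):
    """Keep nouns that are relatively more frequent in reviews than in generic text.

    Hypothetical helper: the smoothed ratio and the threshold are illustrative
    assumptions, not the published Red Opal or IDR/EDR scores.
    """
    review_counts = Counter(review_nouns)
    generic_counts = Counter(generic_nouns)
    review_total = sum(review_counts.values())
    generic_total = sum(generic_counts.values())
    vocab_size = len(set(review_counts) | set(generic_counts))

    kept = {}
    for noun, count in review_counts.items():
        # Add-one smoothing avoids division by zero for nouns
        # that never occur in the generic corpus.
        p_review = (count + 1) / (review_total + vocab_size)
        p_generic = (generic_counts[noun] + 1) / (generic_total + vocab_size)
        ratio = p_review / p_generic
        if ratio >= min_ratio:
            kept[noun] = ratio
    return kept


# Toy data (invented): nouns from phone reviews vs. nouns from general text.
review_nouns = ["battery", "battery", "screen", "time", "battery", "screen", "thing"]
generic_nouns = ["time", "thing", "time", "people", "thing", "year", "time"]
features = domain_specific_nouns(review_nouns, generic_nouns)
print(sorted(features))
```

In this toy run, nouns such as "time" and "thing" are common in both corpora and are filtered out, while "battery" and "screen" survive as candidate features.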
Lau et al. (2009) presented an ontology-based
approach to profiling a product. They constructed a
number of ontology levels, such as a feature level
that contains the features identified for a product
and a sentiment level that stores the sentiment words
describing each feature. This method provides a
simple product profile rather than only extracting
product features.
The statistical topic modeling technique has been
used in various fields such as text mining (Blei et al.,
2003; Hofmann, 2001) in recent years. Latent Semantic
Analysis (LSA) was first proposed to capture the most
significant features of a document collection based
on the semantic structure of relevant documents
(Lewis, 1992). Probabilistic LSA (pLSA) (Hofmann,
2001) and Latent Dirichlet Allocation (LDA) (Blei et
al., 2003) were later proposed to improve the
interpretability of the results from LSA. These
techniques have proven more effective for document
modeling and topic extraction, which are represented
by topic-document and word-topic distributions,
respectively. In particular, each topic in a given
text collection can be represented by a multinomial
distribution over words that is derived from word
frequencies.
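As a point of reference, the two kinds of distribution mentioned above combine in the standard mixture decomposition shared by pLSA and LDA. The symbols $d$ (document), $w$ (word), $z$ (topic) and $K$ (number of topics) are notation introduced here and do not come from the original text.

```latex
\[
  p(w \mid d) \;=\; \sum_{k=1}^{K} p(w \mid z = k)\, p(z = k \mid d),
  \qquad
  \sum_{w} p(w \mid z = k) = 1 \quad \text{for each topic } k .
\]
```

Here $p(z \mid d)$ is the distribution over topics for a document and $p(w \mid z)$ is the multinomial distribution over words for a topic. In LDA the per-document topic distribution is additionally drawn from a Dirichlet prior.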
None of the aforementioned feature identification
approaches is able to identify the relationships
between the extracted product features. The
structural relationships that exist between features
can be used to describe the reviewed product in more
depth. However, evaluating and determining the
relations between features remains challenging.
The remainder of the paper is organized as follows.
The next section describes the construction of our
proposed feature taxonomy. The evaluation of our
approach is then reported. Finally, we conclude and
describe future directions for our research.
3 THE PROPOSED APPROACH
Our proposed approach consists of two main steps:
product taxonomy construction using association
rules and taxonomy expansion based on reference fea-
tures. The input of our system is a collection of user
reviews for a certain product. The output is a product
feature taxonomy which contains not only all gener-
ated features but also the relationships between them.
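To make the association-rule step more concrete, the sketch below computes the two standard rule measures, support and confidence, for pairs of candidate features. It is an illustration under stated assumptions and not the paper's algorithm: it assumes each transaction holds the candidate feature nouns found in one review sentence, and the function name `pairwise_rules`, both thresholds and the toy transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pairwise_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    Hypothetical helper: thresholds are illustrative, and only rules with
    a single item on each side are considered.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for transaction in transactions:
        items = set(transaction)  # ignore duplicates within one transaction
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        # Check both directions: a -> b and b -> a.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return rules


# Toy transactions (invented): feature nouns per review sentence.
transactions = [
    ["camera", "flash"],
    ["camera", "lens"],
    ["camera", "flash"],
    ["camera"],
    ["battery"],
]
rules = pairwise_rules(transactions)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data, every sentence that mentions "flash" also mentions "camera", so the rule from "flash" to "camera" passes both thresholds while the reverse direction does not. Asymmetric co-occurrence of this kind is the sort of evidence a feature taxonomy could draw on.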
3.1 Pre-processing and Transaction File Generation
First, we construct a single document, called the
aggregated review document, which combines all the
reviews in a collection while keeping each sentence
of the original reviews as one sentence in the
aggregated document. Three steps are then undertaken
to process the review text and extract useful
information. Firstly, we generate a part-of-speech
(POS) tag for each word in the aggregated review
document to indicate whether the word is a noun,
adjective, adverb, etc. For instance, after POS
tagging, “The flash is very weak.” would