plications, the amount of labeled data for a particular domain can be limited, and it is therefore interesting to consider cross-domain classifiers, that is, classifiers that leverage training data from a source domain to learn a classifier for a target domain with limited labeled data. For example, we can use books as the source domain, while the target domain can be music, DVDs, movies, electronics, clothing, toys, etc.
Generally, a classifier built on one domain (i.e., the source domain) does not perform well when used to classify the sentiment in another domain (i.e., the target domain). One reason for this is that certain words express the overall polarity of a sentence in one domain, while the same words can have a different meaning or polarity in another domain. If we consider kitchen appliances and cameras as our domains, then words such as good and excellent express positive sentiments in both the kitchen appliance domain and the camera domain, and words such as bad and worse express a negative sentiment in both domains; these are known as domain independent words. On the other hand, words such as safe, stainless, sturdy, and efficient express sentiments in the kitchen domain, but may or may not express any sentiment in the camera domain; these are known as domain dependent (or domain specific) words.
In cross-domain classification, the general goal is to use labeled data in the source domain and, possibly, some labeled data in the target domain, together with unlabeled data from the target domain, to learn cross-domain classifiers for predicting the sentiment of future target instances. The cross-domain sentiment classification problem presents additional challenges compared to the corresponding problem in a single domain. Using both source and target data to construct the classifier requires substantial insight and effort, specifically with respect to choosing source features that are predictive for the target domain, and to combining data or classifiers from the source and target domains.
To address the first problem, most recent approaches (Blitzer et al., 2006; Blitzer et al., 2007; Tan et al., 2009) identify domain independent features (a.k.a. generalized or pivot features) to represent the source, and domain specific features to represent the target. Domain independent features serve as a bridge between source and target, thus reducing the gap between them. The performance of the final classifier depends heavily on the domain independent features; therefore, care must be taken when selecting them. In this work, we use NLP syntactic parse trees to generate features. Domain independent features are selected based on the Frequently Co-occurring Entropy (FCE) method proposed by Tan et al. (2009): features with high entropy values are assumed to be domain independent and are used to represent the source domain. Furthermore, to combine source and target data, we use an Expectation Maximization (EM) based Naïve Bayes classifier, also proposed by Tan et al. (2009). Originally, the approach in (Tan et al., 2009) assumes labeled source data and unlabeled target data. In our implementation, we can also incorporate labeled target domain data, if available. As the number of iterations increases, we reduce the weight of the source domain instances while increasing the weight of the target domain instances, so that the resulting classifier can ultimately be used for predicting target domain instances.
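To make these two steps concrete, the sketch below illustrates the general idea in Python: an FCE-style score that favors features occurring frequently, and with similar relative frequency, in both domains, and a weight schedule that gradually shifts emphasis from source to target instances across EM iterations. The scoring formula, the smoothing constant alpha, the linear schedule, and all function and variable names are illustrative assumptions for this sketch; they are not the exact formulation of Tan et al. (2009) or of our implementation.

import math
from collections import Counter

def fce_scores(source_docs, target_docs, alpha=1e-4):
    # Score features with a frequently-co-occurring-entropy-style measure:
    # a feature scores high when it occurs often in BOTH domains with
    # similar relative frequency, and is then treated as domain independent
    # (pivot).  The exact formula is an illustrative assumption.
    src_df, tgt_df = Counter(), Counter()
    for doc in source_docs:
        src_df.update(set(doc))            # document frequency in the source
    for doc in target_docs:
        tgt_df.update(set(doc))            # document frequency in the target
    n_src, n_tgt = len(source_docs), len(target_docs)
    scores = {}
    for w in set(src_df) | set(tgt_df):
        p_src = (src_df[w] + alpha) / (n_src + 2 * alpha)
        p_tgt = (tgt_df[w] + alpha) / (n_tgt + 2 * alpha)
        scores[w] = math.log(p_src * p_tgt / (abs(p_src - p_tgt) + alpha))
    return scores

def select_pivots(source_docs, target_docs, k=500):
    # Keep the top-k features as domain independent (pivot) features.
    scores = fce_scores(source_docs, target_docs)
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def source_target_weights(num_iterations=10):
    # Per-iteration interpolation weights: early iterations rely mostly on
    # the labeled source data, later ones on the (pseudo-labeled) target
    # data.  The linear schedule is an assumed, illustrative choice.
    for t in range(num_iterations + 1):
        lam = t / num_iterations
        yield 1.0 - lam, lam               # (source weight, target weight)

In an EM-style loop, each iteration would use the current Naïve Bayes model to assign probabilistic labels to the unlabeled target documents (E-step) and then re-estimate the model parameters from source and target counts weighted by the pair of values returned above (M-step).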
2 RELATED WORK
Sentiment classification across domains is a very challenging problem. Classifiers trained on one domain cannot always accurately predict the instances from a different domain, because domain-specific features can have different meanings in different domains. The main challenges when performing sentiment classification experiments consist of selecting appropriate features and the right machine learning algorithm for a particular dataset.
Relevant to our work, in the context of single domain sentiment classification, Harb et al. (2008) introduced the AMOD (Automatic Mining of Opinion Dictionaries) approach, consisting of three phases. The first phase, the Corpora Acquisition Learning Phase, addresses the major challenge of obtaining data by automatically extracting it from the web using a predefined set of seed words (positive and negative terms). The second phase, the Adjective Extraction Phase, extracts lists of adjectives carrying positive and negative opinions. The third phase, the Classification Phase, classifies the given documents using the adjectives extracted in the second phase. The authors used unigrams as AMOD features and then used the adjective lists to classify the given documents. Using a movie review dataset and a car dataset, their results show that the AMOD approach was able to classify documents by using a list of adjectives within a single domain.
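The Classification Phase described above amounts to lexicon-based classification. Purely as an illustration of that idea (not Harb et al.'s actual implementation), a document could be labeled by counting how many positive versus negative adjectives from the learned lists it contains; the function name, the tie-breaking rule, and the example lists below are assumptions.

def classify_with_lexicon(tokens, positive_adjs, negative_adjs):
    # Label a tokenized document by counting opinion adjectives: positive if
    # it contains at least as many positive as negative adjectives from the
    # learned lists, negative otherwise (illustrative sketch only).
    pos_hits = sum(1 for t in tokens if t.lower() in positive_adjs)
    neg_hits = sum(1 for t in tokens if t.lower() in negative_adjs)
    return "positive" if pos_hits >= neg_hits else "negative"

# Example with tiny hand-made lists (assumed, for illustration only):
# classify_with_lexicon("a great and reliable camera".split(),
#                       {"great", "reliable"}, {"poor", "noisy"})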
Zhang et al. (2010) proposed to use several types of syntax subtrees as features, where the subtrees are obtained from complete syntax trees by using both adjective and sentiment word pruning strategies. The syntax trees are derived using the Stanford parser. These features were found to be very effective for classification in a single domain scenario.
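As a purely illustrative sketch of what subtree features can look like (not Zhang et al.'s specific pruning strategies), the following code extracts subtrees rooted at adjective (JJ*) nodes from a bracketed parse and serializes them as string features; it uses NLTK's Tree type for convenience, and the input format and function name are assumptions.

from nltk import Tree

def adjective_subtree_features(parse_str):
    # Extract subtrees rooted at adjective (JJ, JJR, JJS) nodes from a
    # bracketed parse and serialize each one as a string feature
    # (illustrative sketch only).
    tree = Tree.fromstring(parse_str)
    feats = []
    for sub in tree.subtrees(lambda t: t.label().startswith("JJ")):
        feats.append(" ".join(str(sub).split()))   # normalize whitespace
    return feats

# Example with a hand-written parse (assumed input format):
# adjective_subtree_features("(NP (JJ great) (NN camera))")  ->  ['(JJ great)']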
Blitzer et al. (2007) introduced a domain adap-