Propagation-Based Domain-Transferable Gradual Sentiment Analysis

Célia da Costa Pereira¹ (https://orcid.org/0000-0001-6278-7740), Claude Pasquier¹ (https://orcid.org/0000-0001-7498-395X) and Andrea G. B. Tettamanzi² (https://orcid.org/0000-0002-8877-4654)

¹Université Côte d'Azur, I3S, CNRS, Sophia Antipolis, France
²Université Côte d'Azur, I3S, Inria, Sophia Antipolis, France
{celia.da-costa-pereira, claude.pasquier, andrea.tettamanzi}@univ-cotedazur.fr
Keywords: Sentiment Analysis, Fuzzy Polarity Propagation.

Abstract:
We propose a novel refinement of a gradual polarity propagation method to learn the polarities of concepts
and their uncertainties with respect to various domains from a labeled corpus. Our contribution consists of
introducing a positive correction term in the polarity propagation equation to counterbalance negative psycho-
logical bias in reviews. The proposed approach is evaluated using a standard benchmark, showing an improved
performance relative to the state of the art, good cross-domain transfer and excellent coverage.
1 INTRODUCTION
Sentiment analysis aims at determining the global po-
larity (positive, negative or neutral) of a document
based on the polarities of the words in the document.
A supervised method trains a model (a classifier) on datasets of labeled reviews or texts and uses this model to classify user opinions. However,
the polarity of some words in a review might depend
on the domain knowledge considered (Rexha et al.,
2018; Pirnau, 2018). For example (Yoshida et al.,
2011), the word ‘long’, which has a positive polarity
in the Camera domain, has a negative polarity if we
are characterizing the execution time of a computer
program.
Several solutions have been proposed. Concept-
based approaches include the one proposed by
Schouten et al. (Schouten and Frasincar, 2015), who
show that considering concept-based features instead of term-based features helps improve the performance of multi-domain sentiment analysis methods.
The good quality of the results obtained with this rel-
atively straightforward setup encourages the use of
more advanced ways of handling semantic informa-
tion.
Yoshida et al. (Yoshida et al., 2011) propose a solution to improve transfer learning methods. However, as Abdullah et al. (Abdullah et al., 2019) have rightly pointed out, the transfer learning approach requires building a new transfer model, which limits its generalization capability.
A series of works (Dragoni et al., 2014; Dragoni,
2015; Dragoni et al., 2015; Dragoni et al., 2016)
use fuzzy logic to model the relationships between
the polarity of concepts and the domain. They use
a two-level graph, where the first level represents
the relations between concepts, whereas the second
level represents the relations between the concepts
and their polarities in the various targeted domains,
the idea being to capture the fact that the same con-
cept can be positive in one domain, but negative in
another. This is accomplished thanks to a polarity
propagation algorithm and without the necessity of
starting the learning process for each different do-
main. The main advantage of that approach, named
MDFSA (Multi-Domain Fuzzy Sentiment Analyzer),
which won the ESWC 2014 Concept-Level Sentiment
Analysis Challenge (Dragoni et al., 2014), is that it
both accounts for the conceptual representation of the
terms in the documents by using WordNet and SenticNet, and avoids building a new model each time a new domain needs to be analyzed.
However, MDFSA had several issues, including
the following:
1. Some word ambiguities cannot be discarded, because a synset corresponds to a group of interchangeable words (nouns, adjectives, verbs and adverbs), and the meaning of a word can change depending on the type of the term used.
2. The same stopping criterion for the propagation algorithm is used for all the independent domains, which is questionable: simultaneously stopping the propagation could be premature for some domains and delayed for others.
3. The propagation of polarities takes place without
taking into account the similarity of related con-
cepts in the graph. Indeed, the more similar the
concepts are, the higher the weights associated
with their relationship in the graph should be.
To address these issues, we recently proposed Sental (Pasquier et al., 2020), an extension of MDFSA, which we take as a starting point. In particular, concerning (1), to
decide whether a term occurring in a document is as-
sociated to a synset v or not, Sental looks at its part
of speech (POS) tag and considers it an instance of
v only if its POS tag matches the POS of v. Con-
cerning (2), Sental tests for convergence for each do-
main separately. Finally, concerning (3), Sental uses
a pre-trained word embedding model to complete the
semantic graph with a graded relation of semantic
similarity, in addition to the crisp relations defined in
WordNet.
Nevertheless, Sental still has an important issue: after a few iterations, its propagation process tends to drive polarities towards −1. This happens because, on average, negative sentiments have more impact and are more resistant to disconfirmation than positive ones.
The aim of this paper is to solve this problem.
We propose a new formula with a term to correct the
negative bias by exerting a positive pressure which is
strongest for neutral polarities and whose effect di-
minishes for extreme polarities. To the best of our
knowledge it is the first time that this aspect, which
is very well known in psychology (Baumeister et al.,
2001), is taken into account in a sentiment analysis
framework.
The rest of the paper is structured as follows. Section 2 presents the method we propose. Section 3 presents the experiments and discusses the results. Finally, Section 4 concludes the paper.
2 METHOD
The algorithm used to learn concept polarities for var-
ious domains consists of three phases, namely:
1. Semantic graph construction from background
knowledge;
2. Concept polarity initialization, based on a training
set of documents, associated with a domain and
labeled with a rating;
Figure 1: An illustration of the semantic graph constructed by the proposed method. Circles represent the vertices of the graph and solid lines its edges. The documents of the training set are shown around the semantic graph. Dashed lines represent the occurrence, in a document, of a term associated with a vertex of the graph (i.e., a lemma or a WordNet synset). Documents are rated (e.g., on a scale from −− to ++, which may then be mapped to the $[-1, 1]$ interval).
3. Propagation of polarity information over the se-
mantic graph.
Its result is an estimation of polarities, represented as convex fuzzy sets over the $[-1, +1]$ interval, for each
concept and for each domain. Figure 1 illustrates the
idea of a semantic graph, whose construction and use
is detailed below.
2.1 Semantic Graph Construction
The backbone of the semantic graph is based on
WordNet (Miller, 1995) and SenticNet (Cambria
et al., 2010), combined as in (Dragoni et al., 2015).
Both are publicly available lexical databases in which
nouns, verbs, adjectives, and adverbs are organized
into sets of synonyms (synsets), each representing a
lexicalized concept. Synsets are linked by semantic
relationships, including synonymy, antonymy and hy-
pernymy. We distinguish concepts, which are abstract
notions representing meaning, from terms, which are
tangible ways of expressing concepts (in written lan-
guage, they are words or phrases). Now, words, and
typically the most frequent words (Casas et al., 2019),
can be polysemous; a preprocessing phase is thus
needed to link the terms found in the texts as accu-
rately as possible to their corresponding synsets. Un-
like WordNet, SenticNet is specifically built for opin-
ion mining, but the polarities it associates with terms
are not used, because the very assumption on which
our approach is based is that polarity is not an in-
trinsic property of a term, but an extrinsic, domain-
dependent property. Besides covering terms that are not indexed in WordNet, SenticNet makes it possible to resolve a number of cases of ambiguity.
To further reduce ambiguity, we propose to parse
the text and use the POS of a term to associate it with the
correct synset. As an example, the same term ‘light’
can be, according to WordNet 3.1, a noun (‘do you
have a light?’), a verb (‘light a cigarette’), an adjec-
tive (‘a light diet’) or an adverb (‘experienced trav-
elers travel light’), which are represented by distinct
synsets.
In addition to the WordNet and SenticNet relations, already used in the literature, we use a word embedding model, pre-trained by applying Word2vec (Mikolov et al., 2013) to roughly 100 billion words from a Google News dataset (https://frama.link/google\ word2vec), to complete the semantic graph with relationships of semantic similarity between terms.
The semantic graph is constructed as a weighted graph $(V, E, w)$. Each element of $V$ is either a concept (i.e., a synset) or the canonical form (lemma) of a term used in a review; $w : E \to [-1, 1]$ is a weight function and the edges in $E$ are created:
• between synsets linked by a hypernym relationship, between a synset and its lemmas, and between lemmas linked by a synonym relationship, with weight $+1$;
• between lemmas linked by an antonym relationship, with weight $-1$;
• between each lemma and the five closest lemmas according to the pre-trained word2vec model, with their cosine similarity as weight.
Each vertex $v \in V$ is labeled by a vector $p(v)$ of polarities, so that $p_i(v)$ is the polarity of $v$ in the $i$th domain.
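As a concrete illustration only, the following sketch shows how such a graph could be assembled with NLTK's WordNet interface, gensim, and networkx; the SenticNet merge is omitted, and the model path and seed lemmas are assumptions.

```python
# Minimal sketch of the weighted semantic graph (V, E, w), assuming
# NLTK's WordNet and a pre-trained gensim word2vec model; the
# SenticNet merge described in the text is omitted.
import networkx as nx
from nltk.corpus import wordnet as wn
from gensim.models import KeyedVectors

def build_semantic_graph(lemmas, w2v):
    """lemmas: canonical forms of the terms found in the reviews (assumed)."""
    g = nx.Graph()
    for lemma in lemmas:
        for synset in wn.synsets(lemma):
            g.add_edge(synset.name(), lemma, weight=1.0)        # synset-lemma
            for hyper in synset.hypernyms():                    # hypernymy
                g.add_edge(synset.name(), hyper.name(), weight=1.0)
            for wn_lemma in synset.lemmas():
                if wn_lemma.name() != lemma:                    # synonymy
                    g.add_edge(lemma, wn_lemma.name(), weight=1.0)
                for ant in wn_lemma.antonyms():                 # antonymy
                    g.add_edge(lemma, ant.name(), weight=-1.0)
        if lemma in w2v:   # five nearest lemmas, cosine similarity as weight
            for other, sim in w2v.most_similar(lemma, topn=5):
                g.add_edge(lemma, other, weight=float(sim))
    return g

# The path to the pre-trained Google News vectors is an assumption.
w2v = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin",
                                        binary=True)
graph = build_semantic_graph(["light", "long", "battery"], w2v)
```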
2.2 Concept Polarity Initialization
The initial polarities $p^{(0)}_i(v) \in [-1, 1]$ of all the vertices $v$ of the semantic graph are computed, for each domain $i$, as the average polarity of the documents of domain $i$ in the training set in which at least one term of $v$ occurs. If no term associated with $v$ occurs in a document of domain $i$, $p^{(0)}_i(v) = 0$.
As explained above, to decide whether a term occurring in a document is associated with a synset $v$ or not, we look at its POS tag and consider it an instance of $v$ only if its POS tag matches the POS of $v$.
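A minimal sketch of this initialization, assuming documents are given as (domain, rating, tokens) triples with ratings already mapped to $[-1, 1]$ and tokens carrying a POS tag:

```python
# p_i^(0)(v): mean rating of the domain's training documents in which
# at least one POS-matching term of synset v occurs; 0 otherwise.
# The data layout used here is an assumption.
from collections import defaultdict

def init_polarities(docs, synset_terms):
    """docs: (domain, rating, tokens) triples, tokens as (lemma, pos) pairs;
    synset_terms: synset id -> set of (lemma, pos) pairs."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for domain, rating, tokens in docs:
        token_set = set(tokens)
        for synset, terms in synset_terms.items():
            if terms & token_set:   # the POS tag must match the synset's POS
                sums[(synset, domain)] += rating
                counts[(synset, domain)] += 1
    # pairs never seen default to 0 when looked up with dict.get(key, 0.0)
    return {k: sums[k] / counts[k] for k in sums}
```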
2.3 Polarity Propagation
In this phase, information about the polarity of ver-
tices is propagated through the edges of the graph, so
that concepts for which no polarity information could be directly extracted from the training set (i.e., those $v$ such that $p^{(0)}_i(v) = 0$ for some $i$) can "assimilate", as it were, the polarity of their close relatives. In addition, this propagation process may contribute to correcting or fine-tuning the polarity of incorrectly initialized concepts and, thus, reduce noise.
Polarity propagation through the graph is carried out iteratively. At each iteration $t = 1, 2, \ldots$, the polarity $p^{(t)}_i(v)$ of each vertex $v$ for domain $i$ is updated taking into account both the values of its neighbors $N(v) = \{v' \mid (v, v') \in E\}$ and its distinctiveness from the other terms in the domain:

$$p^{(t+1)}_i(v) = (1 - \lambda)\, p^{(t)}_i(v) + \lambda\, \frac{1}{|N(v)|} \sum_{v' \in N(v)} p^{(t)}_i(v')\, w(v, v') + \varphi \left( 1 - \left| p^{(t)}_i(v) \right| \right), \qquad (1)$$

where $w(v, v')$ denotes the weight of the edge between vertices $v$ and $v'$, and the propagation rate $0 < \lambda < 1$ and the positive correction strength $0 < \varphi < 1$ are parameters of the algorithm.
Notice that the propagation of polarity for one do-
main does not interact with the same process for the
other domains and we can thus consider that polar-
ity propagation is carried out in parallel and indepen-
dently for each domain.
Inspired by simulated annealing (Kirkpatrick
et al., 1983), the propagation rate is decreased at each
iteration, according to a parameter A called annealing
rate. Thus, the value of $\lambda$ at iteration $t$ is calculated from its value at iteration $t - 1$ as follows:

$$\lambda_t = A\, \lambda_{t-1}. \qquad (2)$$
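As an illustration only, the following sketch implements one sweep of Equation (1) over the networkx graph of the earlier sketch, together with the annealing step of Equation (2); the dictionary-based polarity store is an assumption.

```python
# One sweep of Equation (1) for a single domain, plus the annealing
# step of Equation (2); `graph` is the networkx graph sketched above
# and `polarity` maps each vertex to its current p_i(v).
def propagate_step(graph, polarity, lam, phi):
    updated = {}
    for v in graph.nodes:
        neighbors = list(graph.neighbors(v))
        if neighbors:   # weighted average of the neighbors' polarities
            avg = sum(polarity[u] * graph[v][u]["weight"]
                      for u in neighbors) / len(neighbors)
        else:
            avg = 0.0
        updated[v] = ((1 - lam) * polarity[v]            # inertia
                      + lam * avg                        # propagation
                      + phi * (1 - abs(polarity[v])))    # positive correction
    return updated

lam, phi, A = 0.3, 0.3, 0.5   # best setting reported in Section 3.1
polarity = propagate_step(graph, polarity, lam, phi)
lam *= A                      # Equation (2)
```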
In the method proposed by Dragoni et al. (Dragoni
et al., 2015), the iterative process stops as soon as the
sum of the variations of the polarity for each concept
and domain falls below a fixed threshold. The draw-
back of using a fixed convergence limit is that it de-
pends on the dataset used. Indeed, a dataset composed
of many domains using lots of different terms will
logically generate a greater variation than a smaller
dataset. In our proposed method, we specify a thresh-
old that applies to each domain separately and that is
relative to the number of different nodes composing
the semantic graph of the domain. Thus, for the ith
domain, the polarity propagation stops when the average polarity variation,

$$\Delta^{(t)}_i = \frac{1}{|V|} \sum_{v} \left| p^{(t)}_i(v) - p^{(t-1)}_i(v) \right|, \qquad (3)$$

falls below a threshold $L$, which is the convergence limit. We denote by $t^{\mathrm{stop}}_i$ the total number of iterations carried out for the $i$th domain until $\Delta^{(t)}_i < L$.
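Under the same assumptions as the previous sketches, the per-domain stopping rule can be written as a loop around the update step; the returned trajectory is reused below to build the fuzzy polarities.

```python
# Iterate Equation (1) with annealing until the mean absolute change
# of Equation (3) drops below the convergence limit L (one domain).
def propagate_domain(graph, polarity, lam=0.3, phi=0.3, A=0.5, L=0.05):
    history = [dict(polarity)]              # history[t][v] = p_i^(t)(v)
    while True:
        new = propagate_step(graph, polarity, lam, phi)
        delta = sum(abs(new[v] - polarity[v]) for v in new) / len(new)
        polarity, lam = new, lam * A
        history.append(dict(polarity))
        if delta < L:
            return history
```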
Figure 2: A fuzzy set with a trapezoidal membership func-
tion, defined by the four parameters (a,b,c,d).
Experience shows that, after a few iterations, the propagation process described above tends to drive polarities towards −1. This happens because, on average, negative sentiments in reviews are more strongly expressed than positive sentiments. Such a difference
of intensity between negative and positive sentiments
has already been observed and is well-known in psy-
chology (Baumeister et al., 2001). That is why we
have added a term to correct this negative bias (third
line of Equation 1), by exerting a positive pressure
which is strongest for neutral polarities and whose
effect tapers off for extreme polarities. The overall
strength of this positive correction term is controlled
by parameter ϕ.
At each iteration $t = 0, 1, 2, \ldots$, the vectors $p^{(t)}(v)$ are saved in order to exploit them for the calculation of the shapes of the fuzzy membership functions describing the polarity of concept $v$ for each domain. Indeed, the final polarities are represented as trapezoidal fuzzy membership functions, whose core is the interval between the initial polarity computed from the training set, $p^{(0)}_i(v)$, and the polarity resulting from the propagation phase, $p^{(t^{\mathrm{stop}}_i)}_i(v)$, and whose support extends beyond the core on either side by half the variance $\sigma^2_{v,i}$ of the distribution of $p^{(t)}_i(v)$, $t = 0, \ldots, t^{\mathrm{stop}}_i$.
To sum up, for each domain $i$, $\mu_{v,i}$ is a trapezoid with parameters $(a, b, c, d)$, like the one depicted in Figure 2, where

$$a = \min\{p^{(0)}_i(v),\, p^{(t^{\mathrm{stop}}_i)}_i(v)\}, \qquad b = \max\{p^{(0)}_i(v),\, p^{(t^{\mathrm{stop}}_i)}_i(v)\},$$
$$c = \max\{-1,\, a - \sigma^2_{v,i}/2\}, \qquad d = \min\{1,\, b + \sigma^2_{v,i}/2\}.$$
The idea here is that the most likely values for the polarity of $v$ for a domain are those comprised between the initial and final values of the polarity propagation phase, and the more quickly the polarity values converged during that phase, the less uncertainty there is about the resulting polarity estimate. Conversely, a polarity value that converged slowly or with many fluctuations is going to yield a less reliable, and thus more uncertain, estimate.
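For illustration, a sketch of the trapezoid construction from a saved trajectory (the `history` structure of the previous sketch is an assumption):

```python
# Trapezoidal fuzzy polarity of vertex v in domain i: core [a, b]
# between the initial and final polarity, support [c, d] widened on
# each side by half the variance of the whole trajectory.
from statistics import pvariance

def trapezoid(history, v):
    traj = [h[v] for h in history]          # p_i^(t)(v), t = 0 .. t_stop
    a = min(traj[0], traj[-1])
    b = max(traj[0], traj[-1])
    half_var = pvariance(traj) / 2
    return (a, b, max(-1.0, a - half_var), min(1.0, b + half_var))
```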
2.4 Document Polarity Calculation
Once the model is trained according to the algorithm
described in the previous sections, the (fuzzy) polar-
ity of a novel document D of the ith domain is com-
puted as the average of the fuzzy polarities (repre-
sented by their trapezoidal membership functions) of
all the terms $v$ occurring in the document:

$$\mu_i = \frac{1}{|V_i|} \sum_{v \in V_i} \mu_{v,i}, \qquad (4)$$

where $V_i = \{v \in V \mid v \text{ occurs in } D\}$. This average of fuzzy sets is computed by applying the extension principle, thus yielding, for all $x \in [-1, 1]$,

$$\mu_i(x) = \sup_{x = \frac{1}{|V_i|} \sum_{v \in V_i} x_v} \; \min_{v \in V_i} \mu_{v,i}(x_v). \qquad (5)$$

However, given that all $\mu_{v,i}$ are trapezoidal with parameters $(a_v, b_v, c_v, d_v)$, as pointed out in (Dragoni et al., 2015), $\mu_i$ will always be trapezoidal as well, with parameters

$$\left( \frac{1}{|V_i|} \sum_{v \in V_i} a_v,\; \frac{1}{|V_i|} \sum_{v \in V_i} b_v,\; \frac{1}{|V_i|} \sum_{v \in V_i} c_v,\; \frac{1}{|V_i|} \sum_{v \in V_i} d_v \right). \qquad (6)$$
This fuzzy polarity reflects the uncertainty of the esti-
mate obtained by the model. A single polarity figure
can be obtained by applying a defuzzification method.
For the empirical validation of our method, we used
the centroid method for this purpose.
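The following sketch, a non-authoritative illustration, averages the term trapezoids parameter-wise as in Equation (6) and defuzzifies the result numerically with the centroid method:

```python
# Document polarity: parameter-wise average of the term trapezoids
# (Equation 6), then centroid defuzzification by numeric integration.
# Parameter ordering (a, b, c, d) follows the text: core [a, b],
# support [c, d].
import numpy as np

def document_polarity(trapezoids):
    """trapezoids: list of (a, b, c, d) tuples as built by trapezoid()."""
    a, b, c, d = (float(np.mean(col)) for col in zip(*trapezoids))
    x = np.linspace(-1.0, 1.0, 2001)
    # piecewise-linear membership: 0 outside [c, d], 1 on the core [a, b]
    mu = np.interp(x, [c, a, b, d], [0.0, 1.0, 1.0, 0.0])
    return float(np.sum(x * mu) / np.sum(mu))   # centroid

print(document_polarity([(0.1, 0.4, -0.1, 0.6), (0.2, 0.3, 0.0, 0.5)]))
```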
3 EXPERIMENTS AND RESULTS
The proposed system (that we call Sental+) has been
evaluated using the DRANZIERA evaluation proto-
col (Dragoni et al., 2016), a multi-domain sentiment
analysis benchmark, which consists of a dataset con-
taining product reviews from 20 different domains,
crawled from the Amazon web site, as well as guide-
lines allowing the fair evaluation and comparison
of opinion mining systems. In the dataset of the
DRANZIERA benchmark, each domain is composed
of 5,000 positive and 5,000 negative reviews that are
split in five folds containing 1,000 positive and 1,000
negative reviews each.
3.1 Experimental Protocol
As suggested by the guidelines of the DRANZIERA evaluation protocol (Dragoni et al., 2016), the performance of the method has been assessed by performing
a 5-fold cross validation. For each specific domain,
the method was trained on four of the five folds pro-
vided with the benchmark and tested on the remain-
ing fold. The process is repeated five times so that
each fold is in turn used for testing. Experiments were performed on a computer running Ubuntu 18.04, equipped with an Intel® Core™ i7-7700 CPU @ 3.60 GHz and 32 GB of main memory.
The algorithm depends on four different param-
eters: the propagation rate λ, which determines the
diffusion rate of the polarity values between concepts,
the positive correction strength ϕ that maintains the
diversity of polarities, the convergence limit L, which
represents the criterion for stopping the polarity prop-
agation phase for each domain, and the annealing rate
A, used to decrease, at each iteration, the propagation
rate (cf. Equation 2). In order not to bias the method
settings by selecting parameters that could be specif-
ically suited to the DRANZIERA dataset, we carried
out the method tuning using a separate dataset. For
this purpose we used the Blitzer dataset (Blitzer et al.,
2007) which is composed of product reviews belong-
ing to 25 domains. A quick scan of the dataset shows both that there is a large discrepancy in the number of reviews available for each domain and that positive reviews are much more frequent. For example, the books category contains 975,194 reviews (123,899 negative and 851,295 positive), while the tools & hardware domain contains only 14 negative reviews and 98 positive ones. In order not to bias the system, we took care to balance the positive and negative reviews, limiting the maximum number of reviews to 1,600 (the first 800 positive and the first 800 negative reviews found in the XML files). Using a small portion of the dataset, we then experimented with different configurations of the parameters $(\lambda, \varphi, L, A)$, varying the propagation rate between 0.1 and 0.9 in steps of 0.1, testing all values of the annealing rate between 0.0 and 1.0 in steps of 0.1, and using $10^{-1}$, $10^{-2}$ and $10^{-3}$ as values for the convergence limit. The setting that leads to the best results is $\lambda = 0.3$, $\varphi = 0.3$, $L = 0.05$, and $A = 0.5$.
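A sketch of this scan; the `score` helper, returning for instance the mean F1 on the tuning split, is hypothetical, and since the text does not specify the grid used for $\varphi$, it is fixed here:

```python
# Exhaustive scan over the parameter grids described above; score() is
# a hypothetical helper evaluating one configuration on the tuning set.
from itertools import product

grid = product([i / 10 for i in range(1, 10)],    # lambda: 0.1 .. 0.9
               [0.3],                             # phi (grid unspecified)
               [1e-1, 1e-2, 1e-3],                # convergence limit L
               [i / 10 for i in range(0, 11)])    # annealing rate A: 0.0 .. 1.0
best_cfg = max(grid, key=lambda cfg: score(*cfg))
```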
3.2 Results of DRANZIERA Evaluation
When applied to the DRANZIERA dataset with the
settings previously identified,
the average precision obtained over all 20 domains is 0.8191, which constitutes a significant improvement over MDFSA (Dragoni et al., 2015), MDFSA-NODK (Dragoni et al., 2015), IRSA (Dragoni, 2015) and IRSA-NODK (Dragoni et al., 2016), which obtain a precision of 0.6832, 0.7145, 0.6598 and 0.6784, respectively.

Table 1: Average precision, recall, F1 score and standard deviation of the F1 score obtained on the 20 domains of the DRANZIERA dataset.

Method               Precision  Recall  F1 score  SD
MDFSA                0.6832     0.9245  0.7857    0.0000
MDFSA-NODK           0.7145     0.9245  0.8060    0.0001
IRSA                 0.6598     0.8742  0.7520    0.0002
IRSA-NODK            0.6784     0.8742  0.7640    0.0003
Sental w/ embedding  0.7527     0.9942  0.8551    0.0446
Sental w/ POS        0.7617     0.9947  0.8612    0.0435
Sental+              0.8191     0.9941  0.8975    0.0275
It should be noted that this improvement in pre-
cision is not detrimental to the recall value, which
is higher than the other methods. As a result, the
proposed method obtains an even greater improve-
ment with respect to the other methods if perfor-
mance is measured in terms of the F1 score, in par-
ticular a 9.15% improvement with respect to the best
of them, MDFSA-NODK. Table 1 provides a sum-
mary of the comparison of the results obtained by
our method with four other methods evaluated on
the DRANZIERA dataset, whose results are provided
in (Dragoni et al., 2016). Using word embedding
(Sental w/ embedding) provides an improvement over
existing methods; incorporating POS tags (Sental w/
POS) further improves precision and recall, but the
best results are achieved by adding a positive cor-
rection term to the polarity propagation equation
(Sental+).
The breakdown of the results obtained by our
method implementing the four proposed solutions on
the 20 domains of the DRANZIERA dataset is pre-
sented in Table 2.
3.3 Cross-Domain Transfer Experiment
To test the generalization capabilities of our ap-
proach, we have strived to use our model, trained
with DRANZIERA data, on other datasets. Sentiment analysis has become extremely popular, but datasets suitable for multi-domain sentiment analysis are still scarce. Two datasets, proposed by Hutto and Gilbert (Hutto and Gilbert, 2014), correspond to domains that could benefit from being processed with our method. The first one, 'Product', contains 3,708 customer reviews of five electronics products. The second one, 'Movie', contains 10,605 sentence-level snippets from the site http://rotten.tomatoes.com, reanalyzed by 20 independent human raters.
Table 2: Breakdown of the results obtained on the 20 domains of the DRANZIERA dataset.
Domain Precision Recall F1 score SD
Amazon Video 0.7253 0.9946 0.8389 0.0310
Automotive 0.8402 0.9943 0.9108 0.0219
Baby 0.7357 0.9945 0.8457 0.0702
Beauty 0.8523 0.9947 0.9180 0.0085
Books 0.7910 0.9911 0.8798 0.0410
Clothing 0.8807 0.9973 0.9354 0.0401
Electronics 0.7998 0.9950 0.8868 0.0293
Health 0.8388 0.9947 0.9101 0.0208
Home Kitchen 0.8142 0.9951 0.8956 0.0399
Movies TV 0.8064 0.9934 0.8902 0.0273
Music 0.7764 0.9933 0.8716 0.0333
Office Products 0.8256 0.9949 0.9024 0.0168
Patio 0.8378 0.9945 0.9094 0.0201
Pet Supplies 0.7970 0.9912 0.8836 0.0265
Shoes 0.9183 0.9965 0.9558 0.0123
Software 0.8090 0.9943 0.8921 0.0216
Sports Outdoors 0.8320 0.9915 0.9048 0.0179
Tools Home Impr. 0.8430 0.9947 0.9126 0.0248
Toys Games 0.8581 0.9950 0.9215 0.0265
Video Games 0.7998 0.9916 0.8854 0.0201
Average 0.8191 0.9941 0.8975 0.0275
'Product' reviews should be efficiently analyzed with our method trained on the 'Electronics' domain of the DRANZIERA dataset. Although the DRANZIERA dataset does not contain a 'Movie' domain as such, it includes some related domains, for example 'Movies TV' or 'Amazon Instant Video', whose reviews may be used to infer the rating of movies.
Our method, trained on each of the domains of the DRANZIERA dataset, was used to predict the orientation of each 'Movie' and 'Product' review as positive or negative. Table 3 lists the precision values obtained on 'Movie' and 'Product' reviews according to the category of the DRANZIERA
dataset used for training. Precisions displayed in
bold highlight the best score obtained for each do-
main. Recall values are not displayed because the
variation is small; they range from 0.9916 to 0.9944
for ‘Movie’ and from 0.9889 to 0.9964 for ‘Product’.
Overall, there is a decrease in performance when the
method, trained on DRANZIERA domains, is applied
to reviews originating from different datasets; which
seems perfectly logical. We can notice, in Table 3,
that the domain used for training has a great influence
on the results.
For a transfer to 'Movie', the best precision was obtained when the training was performed with the DRANZIERA reviews belonging to the 'Movies TV' domain. Although 'Music' is not exactly the category
for which one would have expected to obtain the best
cross-domain transfer quality, the result is still con-
sistent. The other domains for which the transfer of
the learned model is effective are 'Books', 'Amazon
Table 3: Precision of cross-domain transfer from the 20 categories of the DRANZIERA dataset to independent 'Movie' and 'Product' reviews.
DRANZIERA precision precision
domain on ‘Movie’ on ‘Product’
Amazon Instant Video 0.6962 0.5704
Automotive 0.6008 0.6490
Baby 0.5830 0.5735
Beauty 0.5992 0.6812
Books 0.7021 0.6196
Clothing Accessories 0.5885 0.6889
Electronics 0.5728 0.7309
Health 0.5899 0.6821
Home Kitchen 0.5875 0.6656
Movies TV 0.7134 0.6000
Music 0.6770 0.6507
Office Products 0.6135 0.6813
Patio 0.6099 0.6885
Pet Supplies 0.5960 0.6720
Shoes 0.5795 0.6843
Software 0.6154 0.7098
Sports Outdoors 0.5797 0.6891
Tools Home Improvement 0.5910 0.7086
Toys Games 0.6200 0.6103
Video Games 0.6595 0.6942
Instant Video’ and ‘Music’. Overall, the reviews that
most closely mirror those of the ‘Movie’ reviews are
more oriented towards cultural goods, while the most
distant ones concern more tangible goods (‘Shoes’,
‘Sport Outdoors’ and ‘Electronics’). The result is
exactly the opposite for the transfer to 'Product' reviews. The best precision is obtained when the training was performed on 'Electronics', and the transfer works better overall when the method has been trained on reviews about concrete objects ('Software' and 'Tools Home Improvement' are both domains that allow the precision to exceed 70%).
The precision therefore varies by nearly 0.15 be-
tween a transfer done from the ‘Music’ domain and
a transfer from the ‘Electronics’ domain. However,
the difference can also be partly explained by the
intrinsic score obtained on each DRANZIERA do-
main. We can indeed notice in Table 3 that the difference in precision between the 'Music' domain and the 'Baby' domain is slightly more than 0.14. However, we can observe significant differences according to the domains. Some cross-domain transfers work very well, such as the transfer from domains like 'Books', 'Movies TV' or 'Amazon Instant Video' to movie reviews, since the precision decline is less than 0.03. Other cross-domain transfers are more problematic, such as those from 'Electronics', 'Beauty', 'Sports Outdoors', 'Clothing Accessories' or 'Shoes' to movie reviews, since the precision falls by more than 0.2. These observations also seem to make perfect sense.
Table 4: A comparison of our method (trained on the
‘Movies TV’ domain of DRANZIERA) with the best meth-
ods benchmarked by Ribeiro et al., when applied to the
‘Movie’ dataset. The highest score of each column is high-
lighted in boldface.
Method Precision Recall F1 score
AFINN 0.6593 0.7259 0.6910
LIWC15 0.6335 0.6608 0.6469
Opinion Lexicon 0.6977 0.7728 0.7333
Pattern.en 0.6784 0.6559 0.6670
Semantria 0.6964 0.6880 0.6922
SenticNet 0.9630 0.6941 0.8067
SO-CAL 0.7165 0.8910 0.7943
Stanford DM 0.8270 0.9192 0.8707
VADER 0.6519 0.8270 0.7291
Sental+ 0.7134 0.9944 0.8308
To assess the performance of our method we com-
pared it with the benchmark of 24 sentiment analysis
methods performed by Ribeiro et al. (Ribeiro et al.,
2016).
Since the setting we have chosen leads to a high
recall value at the expense of precision, we list in Tables 4 and 5 the prediction performance of all meth-
ods that achieve, in either domain, an F1 score greater
than 0.5 and a higher precision than our method.
Methods meeting these criteria are AFINN (Nielsen,
2011), LIWC15 (Tausczik and Pennebaker, 2010),
Opinion Lexicon (Hu and Liu, 2004),
Pattern.en (Smedt and Daelemans, 2012), Seman-
tria (Lexalytics, 2015), SenticNet (Cambria et al.,
2018),
SO-CAL (Taboada et al., 2011), Stanford
DM (Socher et al., 2013), and VADER (Hutto and
Gilbert, 2014).
Regarding the ‘Movie’ domain, the best method,
consisting in using only the polarities reported in Sen-
ticNet, obtains a high precision but only on a rela-
tively small part of the reviews since its coverage is
69.41%. The second and third best methods, Stanford
DM and SO-CAL, have a coverage of the same or-
der, 91.92% and 89.10% respectively, which are both
below Sental+’s coverage value, 99.44%.
All other methods are outperformed by Sental+,
by both criteria.
When recall is considered together with preci-
sion (cf. Table 4), our method obtains an F1 score of
83.08%, which is higher than the F1 score of the most
precise method, 80.67%, but short of Stanford DM,
which has the highest F1 score with 87.07%, while
SO-CAL is lower, at 79.43%.
Regarding the ‘Product’ domain,
a group of methods (AFINN, LIWC15, Opinion
Lexicon, Pattern.en and Semantria) achieves a preci-
sion close to 80% with a recall ranging from 57% to
Table 5: A comparison of our method (trained on the ‘Elec-
tronic’ domain of DRANZIERA) with the best methods
benchmarked by Ribeiro et al., when applied to the ‘Prod-
uct’ dataset. The highest score of each column is high-
lighted in boldface.
Method Precision Recall F1 score
AFINN 0.7869 0.6280 0.6985
LIWC15 0.7674 0.5645 0.6505
Opinion Lexicon 0.8082 0.6715 0.7335
Pattern.en 0.7571 0.5953 0.6665
Semantria 0.8159 0.5945 0.6878
SenticNet 0.6991 0.9748 0.8142
SO-CAL 0.7823 0.7152 0.7472
Stanford DM 0.6853 0.8028 0.7394
VADER 0.7705 0.7133 0.7408
Sental+ 0.7309 0.9956 0.8430
67%. SO-CAL and VADER score well in both preci-
sion and recall. In contrast to the ‘Movie’ domain,
SenticNet and Stanford DM achieve modest preci-
sions of less than 70%, compensated by high recall
values.
The precision of 73.09% scored by Sental+ (i.e., better than 70%, as for the 'Movie' domain) and its recall of 99.56% allow the method to obtain the best F1 score. Overall, Sental+ appears to have a more
consistent performance. It also features an excellent
coverage, which is always above 99%, meaning that
less than 1% of the reviews are not classified.
4 CONCLUSION
We have proposed Sental+, an extension of the Sen-
tal method inspired by the MDFSA approach, which
incorporates an original solution to overcome a ma-
jor issue of Sental: a positive correction term in the
polarity propagation phase effectively counteracts the
negative bias of the reviews.
The resulting method has been validated using a
standard evaluation protocol, showing a significant
improvement with respect to the state of the art. We
have also tested the cross-domain generalization ca-
pabilities of our approach with very promising results.
Injecting more linguistic background knowledge into the semantic graph, as Sental does (here, POS tagging and word embedding), appears to improve both the precision and the coverage of the method. This
suggests focusing future work in this direction, for in-
stance by exploiting large language models.
REFERENCES
Abdullah, N. A., Feizollah, A., Sulaiman, A., and Anuar,
N. B. (2019). Challenges and recommended solutions
in multi-source and multi-domain sentiment analysis.
IEEE Access, 7:144957–144971.
Baumeister, R. F., Bratslavsky, E., Finkenauer, C., and
Vohs, K. D. (2001). Bad is stronger than good. Review
of general psychology, 5(4):323–370.
Blitzer, J., Dredze, M., and Pereira, F. (2007). Biographies,
bollywood, boom-boxes and blenders: Domain adap-
tation for sentiment classification. In ACL, pages 187–
205.
Cambria, E., Poria, S., Hazarika, D., and Kwok, K. (2018).
Senticnet 5: Discovering conceptual primitives for
sentiment analysis by means of context embeddings.
In AAAI, pages 1795–1802. AAAI Press.
Cambria, E., Speer, R., Havasi, C., and Hussain, A. (2010).
Senticnet: A publicly available semantic resource for
opinion mining. In AAAI Fall Symposium: Common-
sense Knowledge, volume FS-10-02 of AAAI Techni-
cal Report, pages 14–18. AAAI Press.
Casas, B., Hernández-Fernández, A., Català, N., Ferrer-i-Cancho, R., and Baixeries, J. (2019). Polysemy and brevity versus frequency in language. Computer Speech & Language, 58:19–50.
Dragoni, M. (2015). Shellfbk: an information retrieval-
based system for multi-domain sentiment analysis. In
Proceedings of the 9th International Workshop on Se-
mantic Evaluation (SemEval 2015), pages 502–509.
Dragoni, M., Tettamanzi, A. G. B., and da Costa Pereira,
C. (2014). A fuzzy system for concept-level senti-
ment analysis. In SemWebEval@ESWC, volume 475
of Communications in Computer and Information Sci-
ence, pages 21–27. Springer.
Dragoni, M., Tettamanzi, A. G. B., and da Costa Pereira,
C. (2015). Propagating and aggregating fuzzy polar-
ities for concept-level sentiment analysis. Cognitive
Computation, 7(2):186–197.
Dragoni, M., Tettamanzi, A. G. B., and da Costa Pereira,
C. (2016). DRANZIERA: an evaluation protocol
for multi-domain opinion mining. In Proceedings of
the 10th International Conference on Language Re-
sources and Evaluation (LREC 2016), Paris, France,
pages 267–272, Paris, France. European Language
Resources Association (ELRA).
Hu, M. and Liu, B. (2004). Mining and summarizing
customer reviews. In Proceedings of the tenth ACM
SIGKDD international conference on Knowledge dis-
covery and data mining, pages 168–177.
Hutto, C. J. and Gilbert, E. (2014). Vader: A parsimonious
rule-based model for sentiment analysis of social me-
dia text. In Eighth international AAAI conference on
weblogs and social media.
Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220(4598):671–680.
Lexalytics (2015). Sentiment extraction - measuring the
emotional tone of content. Technical Report.
Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013).
Efficient estimation of word representations in vector
space. arXiv preprint arXiv:1301.3781.
Miller, G. A. (1995). WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41.
Nielsen, F. Å. (2011). A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. arXiv preprint arXiv:1103.2903.
Pasquier, C., da Costa Pereira, C., and Tettamanzi, A.
G. B. (2020). Extending a fuzzy polarity propaga-
tion method for multi-domain sentiment analysis with
word embedding and POS tagging. In ECAI, volume
325 of Frontiers in Artificial Intelligence and Applica-
tions, pages 2140–2147. IOS Press.
Pirnau, M. (2018). Sentiment analysis for the tweets that
contain the word “earthquake”. In 2018 10th Inter-
national Conference on Electronics, Computers and
Artificial Intelligence (ECAI), pages 1–6.
Rexha, A., Kröll, M., Dragoni, M., and Kern, R. (2018).
oll, M., Dragoni, M., and Kern, R. (2018).
The CLAUSY system at ESWC-2018 challenge on se-
mantic sentiment analysis. In SemWebEval@ESWC,
volume 927 of Communications in Computer and In-
formation Science, pages 186–196. Springer.
Ribeiro, F. N., Araújo, M., Gonçalves, P., Gonçalves, M. A., and Benevenuto, F. (2016). SentiBench: a benchmark
comparison of state-of-the-practice sentiment analysis
methods. EPJ Data Science, 5(1):23.
Schouten, K. and Frasincar, F. (2015). The benefit of
concept-based features for sentiment analysis. In
SemWebEval@ESWC, volume 548 of Communica-
tions in Computer and Information Science, pages
223–233. Springer.
Smedt, T. D. and Daelemans, W. (2012). Pattern for
python. Journal of Machine Learning Research,
13(Jun):2063–2067.
Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning,
C. D., Ng, A., and Potts, C. (2013). Recursive deep
models for semantic compositionality over a senti-
ment treebank. In Proceedings of the 2013 conference
on empirical methods in natural language processing,
pages 1631–1642.
Taboada, M., Brooke, J., Tofiloski, M., Voll, K. D., and
Stede, M. (2011). Lexicon-based methods for senti-
ment analysis. Computational Linguistics, 37(2):267–
307.
Tausczik, Y. R. and Pennebaker, J. W. (2010). The psy-
chological meaning of words: Liwc and computerized
text analysis methods. Journal of language and social
psychology, 29(1):24–54.
Yoshida, Y., Hirao, T., Iwata, T., Nagata, M., and Mat-
sumoto, Y. (2011). Transfer learning for multiple-
domain sentiment analysis—identifying domain de-
pendent/independent word polarity. In AAAI, pages
1286–1291.