AFFECTIVE BLOG ANALYZER
What People Feel to
Masato Tokuhisa, Jin’ichi Murakami and Satoru Ikehara
Department of Information and Electronics, Graduate School of Engineering
Tottori University, Koyama-Minami, Tottori, Japan
Keywords:
Affect, Emotion, Sentence pattern, Sentiment analysis, Web mining.
Abstract:
This paper proposes an affective blog analyzer that captures the targets of people’s emotions. Existing affective analysis has several problems. For instance, polarity analysis (positive/negative classification of documents) has been developed, but it cannot extract emotional targets. Some studies can capture the objects that customers want or need, but their knowledge is domain dependent and therefore cannot be used to analyze people’s everyday life. To address these problems, this paper uses a sentence pattern dictionary to analyze emotions. The dictionary covers 6,000 fundamental Japanese verbs and contains 14,800 patterns with emotional information for everyday life. It is applied to downloaded blog articles. Analyzing blogs yields many keywords as emotional targets. To filter and sort them in support of blog analysts, two parameters are applied: the Z-score of the frequency with which a target appears, and the probability of emotions. In the experiments, trendy and emotional targets were successfully extracted from six months of blogs, which confirms the effectiveness of the patterns and parameters.
1 INTRODUCTION
The WWW is used by many people, from children to the elderly. In particular, blogs are a popular way for people to describe their everyday life. Such documents contain people’s favorites, interests, desires and dislikes. Therefore, blogs are good resources for marketing analysts, political decision makers, and so on.
In previous studies on sentiment analysis and web mining, many ideas have been proposed and realized. Early studies of sentiment analysis focused on the positive and negative semantic orientation (polarity) of conjoined adjectives (Hatzivassiloglou and McKeown, 1997). Because the polarity of adjectives is a strong clue, techniques for classifying review documents were developed by using machine learning (Turney, 2002). Recently, the focus of these studies has shifted and more functions are required. Some studies extract the targets of the polarity, for instance customers’ needs and wants (Kanayama and Nasukawa, 2008). Pattern-based methods are effective for extracting such targets, but the patterns have been developed for several specific domains such as book reviews and computer products. Knowledge base engineers have avoided developing a knowledge base for everyday life. One reason is the belief that such a knowledge base is of little business value from an engineering standpoint; recently, however, advertising agencies have come to require web mining techniques. Another reason is that such a knowledge base requires complex expressions or meanings. For example, the OCC model in cognitive science represents the structure of emotion causality (Ortony et al., 1988). If a text parser could analyze such structure from text, emotion reasoning would succeed. A frame knowledge base was developed and worked very well for virtual agents (Elliott, 1992). However, there is no pattern knowledge base for parsing and analyzing natural language expressions. A big gap remains between the frame knowledge and natural language expressions, because semantic analysis has not been accomplished. Some studies tried to obtain such knowledge from the Web or from huge corpora by focusing on the SUBJECT-VERB-OBJECT-OBJECT relation (Liu et al., 2003), (Tokuhisa et al., 2008). This relation provides the clue of the verbs, which is as important as the clue of the adjectives. However, these studies have not yet extracted emotional targets either. Since their approach depends on the OBJECT, there is no guarantee that emotions are analyzed successfully when an unknown keyword appears as the OBJECT. To address these problems, namely emotion reasoning for everyday life and extraction of emotional targets, this paper uses a sentence pattern dictionary. The dictionary originates from A-Japanese-Lexicon, which covers 6,000 basic Japanese verbs and contains 14,800 sentence patterns (Ikehara et al., 1997). Affective information is added to the dictionary in order to infer emotions for everyday experiences.

Tokuhisa M., Murakami J. and Ikehara S. (2010). AFFECTIVE BLOG ANALYZER - What People Feel to. In Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Artificial Intelligence, pages 247-252. DOI: 10.5220/0002719602470252. Copyright © SciTePress
Next, to support blog analysts, this paper develops a blog analyzer based on the dictionary. The blog analyzer contains both a blog crawling function and an emotion reasoning function. The emotion reasoning outputs not only emotional categories but also emotional targets. The keywords of emotional targets would be a great help in analyzing people’s interests and complaints. The experiments in this paper demonstrate the extraction of emotional targets.
2 EMOTIONAL PATTERN DICTIONARY
2.1 How to Analyze Emotions from a Sentence
Our basic idea for analyzing emotions from a sentence is to extract emotional features from the text, in other words, to confirm the emotional process. If an input sentence represents a part of an emotional process, emotions are inferred from the sentence. The emotional process consists of emotion arousal, emotional state and emotional response. From the viewpoint of cognitive science, it is not easy to define the emotional state separately from emotional arousal and response, but it is clear that emotional states are described in natural language.
These are examples of the three:
Arousal: I left my wallet at the airport.
State: I disappointed myself.
Response: My tears fell down.
In the arousal process, there are some features, in other words, “emotional causes”. The causality is explained by the OCC model, but that model is too abstract for analyzing natural language expressions. Therefore, more detailed features are taken from (Tokuhisa and Okada, 1997), which contains 36 features for joy/sadness and 120 features for eight emotions in total. For example, “loss” is one of the features of sadness. In contrast, “acquisition” is one of the features of joy.
2.2 Constructing Emotional Pattern Dictionary
A pattern dictionary already exists (Ikehara et al., 1997), which was developed for machine translation from Japanese to English. It covers fundamental Japanese verbs and can resolve word sense ambiguity by means of the valency grammar, a kind of constraint on the dependency among the subject case, the verb and the object case in a sentence.
The following are examples (see footnote 1).
ex1) N1(person) lose N2(concrete object)
= feature “loss”
ex2) N1(person) lose N2(disease)
= feature “inner-pleasure”
Here, N1 and N2 are variables that match a noun or noun phrase. Since the meaning of the sentence depends on the meaning of the variables, each variable is restricted in the expressions it can match.
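As a rough illustration of how restricted variables select a pattern, the following Python sketch encodes ex1 and ex2 and looks up the emotional feature for a simple subject-verb-object triple. The class name, the field names, the toy noun table and the `match` function are assumptions made for this illustration only; the actual dictionary is written for Japanese and relies on a much richer system of semantic attributes.

```python
from dataclasses import dataclass
from typing import Optional

# Toy semantic categories for a few nouns (hypothetical stand-in for a thesaurus).
NOUN_CATEGORY = {
    "I": "person",
    "wallet": "concrete object",
    "cold": "disease",
}

@dataclass(frozen=True)
class Pattern:
    verb: str
    n1: str               # semantic restriction on variable N1
    n2: str               # semantic restriction on variable N2
    emotional_cause: str  # feature assigned to the pattern

PATTERNS = [
    Pattern("lose", "person", "concrete object", "loss"),    # ex1
    Pattern("lose", "person", "disease", "inner-pleasure"),  # ex2
]

def match(subject: str, verb: str, obj: str) -> Optional[Pattern]:
    """Return the first pattern whose verb and variable restrictions all hold."""
    for p in PATTERNS:
        if (p.verb == verb
                and NOUN_CATEGORY.get(subject) == p.n1
                and NOUN_CATEGORY.get(obj) == p.n2):
            return p
    return None  # no pattern matches, e.g. an unknown noun

wallet_case = match("I", "lose", "wallet")
cold_case = match("I", "lose", "cold")
```

The same verb yields different features depending only on the category of N2, which is the disambiguation effect described above.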
In order to extend the dictionary for emotion reasoning, the meaning of each pattern was checked and emotional information was assigned. The information slots are “emotional cause”, “emotion category”, “feeler”, and “feel-to”. Figure 1 shows real samples from the dictionary.
The emotion category is one of gladness, sadness, liking, dislike, surprise, expectancy (hope), fear, anger and non-emotion. This set is named the 9-category-set in this paper.
Since tense and aspect are ignored when analyzing emotions in this paper, the 9-category-set cannot be handled well. Therefore, a 5-category-set and a 3-category-set are prepared. The 5-category-set consists of P, N, A, S and non-emotion.
P is the union of gladness, liking and expectancy.
N is the union of sadness and fear.
A is the union of dislike and anger.
S is surprise.
The 3-category-set consists of positive, negative and non-emotion.
Positive is P.
Negative is the union of N and A.
Non-emotion in the 3-category-set includes S.
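The relation among the three category sets can be written down as two lookup tables. The sketch below is only an illustration of the definitions above; the dictionary and function names are hypothetical and are not taken from ABLANA.

```python
# 9-category-set -> 5-category-set, following the unions defined in the text.
NINE_TO_FIVE = {
    "gladness": "P", "liking": "P", "expectancy": "P",
    "sadness": "N", "fear": "N",
    "dislike": "A", "anger": "A",
    "surprise": "S",
    "non-emotion": "non-emotion",
}

# 5-category-set -> 3-category-set; surprise (S) is folded into non-emotion.
FIVE_TO_THREE = {
    "P": "positive",
    "N": "negative",
    "A": "negative",
    "S": "non-emotion",
    "non-emotion": "non-emotion",
}

def to_five(category: str) -> str:
    """Map a 9-category label to its 5-category label."""
    return NINE_TO_FIVE[category]

def to_three(category: str) -> str:
    """Map a 9-category label to its 3-category label."""
    return FIVE_TO_THREE[NINE_TO_FIVE[category]]
```

For example, `to_three("anger")` gives "negative" and `to_three("surprise")` gives "non-emotion".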
The emotional information was added by hand, which took three years. The current version is the 3rd edition. The 2nd edition was checked by three analysts, and its error ratio was 14.2%. The 3rd edition is expected to be better.
Footnote 1: While this dictionary is constructed for Japanese, these examples are written in English in order to explain the significant concept of the dictionary.
ICAART 2010 - 2nd International Conference on Agents and Artificial Intelligence
Japanese pattern: N1(agent)ga N2(*)wo N3(agt.)ni N4(numeric)de kau
English pattern:  N1 buy N2 for N3 for N4
  emotional cause:  feature “acquisition”
  emotion category: gladness
  feeler:           N1
  feel-to:          N2
  - - - - - - - - - - - - - - - - -
  emotional cause:  feature “acquisition” / “hospitality”
  emotion category: gladness / gladness
  feeler:           N3
  feel-to:          N2 / N1

Japanese pattern: N1(agt.)ga N2(obj.,abst.)wo N3(agent,place)kara ryakudatsu suru
English pattern:  N1 plunder N2 of N3
  emotional cause:  feature “loss” / “cheating”
  emotion category: sadness / anger
  feeler:           N3
  feel-to:          N2 / N1
  - - - - - - - - - - - - - - - - -
  emotional cause:  feature “acquisition”
  emotion category: gladness
  feeler:           N1
  feel-to:          N2

* The words ga, wo, ni, de, kara, kau and ryakudatsu suru are Japanese.
Figure 1: Samples of sentence patterns and their emotional information.
3 AFFECTIVE BLOG ANALYZER
3.1 Components
The affective blog analyzer (ABLANA) is newly developed. Figure 2 shows its basic components. The system monitors blog sites by referring to their RSS feeds, downloads new articles from them, and then performs emotion reasoning using the pattern dictionary. The results are statistically analyzed and summarized.
3.2 Behavior
ABLANA watches the RSS feeds of blog sites to extract new articles as a time series. Therefore, web search engines are not used.
“Downloader” accesses the blog site and downloads the article announced by the RSS feed after waiting for 24 hours or more. If the article is temporary or spam, it would have been removed by then. If the article is well written, it would have received some comments from readers. In this paper, the comments are not used yet, but they would be useful information for measuring the reliability of the articles.

[Figure 2: Components of ABLANA. Labels in the figure: The Internet, Blog Sites, Site List, RSS Watcher, URL List, Downloader, Blog Articles (HTML), Text Extractor, Blog Texts, Emotion Reasoner, Pattern Dictionary, Emotion Categories & Emotional Targets, Statistical Analyzer, Emotional Targets with Trend & Affective Parameters, input, output.]
“Text extractor” parses the HTML sources and extracts the blog body, the author’s name, the date, the title of the article and so on. Because some people do not use commas and periods in Japanese blogs, the extractor finds sentence ends and splits the text into sentences.
“Emotion reasoner” performs morphological analysis and pattern matching on the blog body texts. Then, it selects the best pattern match according to the constraints on the variables in the patterns. As a result, emotional information is obtained.
“Statistical analyzer” collects emotional targets from the results and then assigns a score to each target in order to filter and sort the targets according to some parameters.
3.3 Parameters for Emotional Targets
The requirements for the statistical analyzer are as follows:
(a) to extract what is trending among people, and
(b) to evaluate their polarity.
The requirement (a) is measured by the burst of keyword. In this paper, Z-score z_{k,i} is used to capture it. Z-score is described as the following equations.

z_{k,i} = (x_{k,i} - m_k) / \sigma_k    (1)

m_k = \sum_{j \in I} x_{k,j} / N_I    (2)

\sigma_k = \sqrt{ \sum_{j \in I} (x_{k,j} - m_k)^2 / N_I }    (3)

I is a set of intervals (in this paper, one interval is one week, and whole intervals are about six months). k is the keyword to be scored. i is one of the intervals to be analyzed. x_{k,i} is the frequency of the appearance of the keyword k as the emotional target during the interval i. m_k is the mean of the x_{k,j}, here j is each element of I. N_I is the number of elements of I. \sigma_k is the standard deviation.
The requirement (b) is calculated by the probability P(e|k,i) that the emotional category e appears, given that the keyword k appeared in the interval i under analysis. The equation is as follows:

P(e|k,i) = x_{k,i,e} / x_{k,i}    (4)
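Equations (1) to (4) can be sketched in a few lines of Python. This is a minimal illustration rather than the ABLANA implementation: the function names and the sample counts are made up, x_{k,i,e} is assumed to be the number of appearances of keyword k as an emotional target with category e in interval i, and the handling of a zero standard deviation or a zero count is an assumption, since the paper does not discuss these cases.

```python
import math

def z_scores(freqs):
    """Return z_{k,i} for every interval i, given the counts x_{k,j} of one keyword k."""
    n = len(freqs)                                  # N_I
    if n == 0:
        return []
    mean = sum(freqs) / n                           # m_k, Eq. (2)
    sigma = math.sqrt(sum((x - mean) ** 2 for x in freqs) / n)  # sigma_k, Eq. (3)
    if sigma == 0:
        # Assumption: a keyword with constant frequency shows no burst.
        return [0.0] * n
    return [(x - mean) / sigma for x in freqs]      # Eq. (1)

def emotion_probability(x_kie, x_ki):
    """Return P(e|k,i) = x_{k,i,e} / x_{k,i}, Eq. (4)."""
    if x_ki == 0:
        return 0.0  # assumption: report 0 when the keyword never appeared
    return x_kie / x_ki

weekly = [2, 2, 2, 2, 10]   # made-up weekly counts for one keyword
scores = z_scores(weekly)   # the last week stands out as a burst
```

Note that Eq. (3) divides by N_I, so this is the population standard deviation, not the sample one.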
4 EXPERIMENTS OF EMOTIONAL TARGET EXTRACTION FROM BLOGS
4.1 Terms and Amount
ABLANA ran from August 1st 2008 to January 31st 2009. From three major Japanese blog sites, 7,120,992 articles (105,167,276 sentences) were downloaded. The emotion reasoner took about 10 days to process one month of articles. The data size is 53 GB per month, which includes blog texts, pattern-matching results, and emotional targets.
4.2 Experiment-1: Basic Performance
The basic performance of the emotion reasoning is evaluated by the accuracy A, calculated as A = 2N(o \cap c) / (N(o) + N(c)). Analysts annotate test sentences with emotional tags, and the results of emotion reasoning are compared with these tags. N(c) is the number of tags given by the analysts. N(o) is the number of tags given by ABLANA. N(o \cap c) is the number of tags on which the analysts and ABLANA agree.
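The accuracy measure above can be sketched as follows. The representation is an assumption made for illustration: each sentence carries a set of tags from the system and a set of tags from one analyst, and agreement is counted per sentence as the size of the intersection. The paper does not specify how tags from several analysts are combined, so this sketch compares against a single analyst only, and the sample tags are made up.

```python
def agreement_accuracy(system_tags, analyst_tags):
    """Compute A = 2 N(o ∩ c) / (N(o) + N(c)) over aligned lists of tag sets."""
    n_o = sum(len(tags) for tags in system_tags)    # N(o): tags by the system
    n_c = sum(len(tags) for tags in analyst_tags)   # N(c): tags by the analyst
    n_both = sum(len(o & c) for o, c in zip(system_tags, analyst_tags))
    if n_o + n_c == 0:
        return 0.0  # assumption: no tags at all counts as zero accuracy
    return 2 * n_both / (n_o + n_c)

# Made-up tags for three sentences.
system = [{"gladness"}, {"sadness"}, {"anger"}]
analyst = [{"gladness"}, {"fear"}, {"anger", "dislike"}]
accuracy = agreement_accuracy(system, analyst)  # 2 * 2 / (3 + 4)
```

The measure has the same form as the Dice coefficient, so it equals 1 only when both sides assign exactly the same tags.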
304 sentences were extracted at random from the blog texts, and five persons annotated them with 9-category emotional tags. Table 1 shows the accuracy of the emotion reasoning. The HUMAN column indicates the difficulty of the emotion reasoning. On the 9-category-set, the accuracy of ABLANA is lower than that of HUMAN. On the 5- and 3-category-sets, however, ABLANA’s accuracy is close to HUMAN’s. Therefore, in this paper, the 3-category-set is used for the probability scoring.

Table 1: Accuracy of emotion reasoning.
E-Categories   ABLANA   HUMAN
9              0.375    0.513
5              0.592    0.566
3              0.685    0.618
4.3 Experiment-2: Observation with Keyword
The aim of this experiment is to confirm the capability of the Z-score and the emotion probability. Before the Z-score is used for its main purpose of selecting trend keywords, in this section a prominent keyword is given to ABLANA and its score is observed along a time series.
This experiment is a top-down observation approach: we give a known keyword and check whether a known event can be captured or not.
There were many big changes in 2008. In the domain of motor sports, the HONDA Racing F1 Team exited Formula One on December 5th.
The keyword “HONDA” appeared repeatedly over all intervals, and a burst appeared twice, as shown in Figure 3. This keyword appeared approximately 19,000 times, and in 4.68% of these appearances it was an emotional target. The first burst corresponds to the news of the exit from Formula One. Some news articles about the exit were cited in blogs, and blog authors wrote comments on them. The second burst corresponds to another piece of exit news: HONDA exited the eight-hour endurance motorbike race on January 23rd 2009.
The Z-score becomes high at the bursts. Therefore, the Z-score is feasible for finding significant events.
The right Y axis of Figure 3 shows the probability P(POSITIVE | “HONDA”, i) along a time series. The exit news was not good for motor sports fans, but P_{positive} was over 50%. Reading the blogs made clear that the blog authors expressed their love for HONDA.
On the other hand, P_{positive} reached its minimum on November 21st 2008 and January 9th 2009. The blogs on these days described the following:
- a negotiation episode at a HONDA dealer, including a customer’s distrust
- a decrease in production at a HONDA factory
- the early history of HONDA, including an exposé
As a result, the probability successfully leads a blog analyst to appropriate articles.
[Figure 3: Frequency of appearance of “HONDA” and probability of its positive emotion. The X axis is the term (from Aug. 1st 2008 to Jan. 31st 2009). The circle plots are the frequency of appearance as an emotional target, and the rectangle plots are the frequency of appearance at any place in the blogs (left Y axis, Frequency, 0 to 4000). The X-plots are the probability of positive emotion toward the target (right Y axis, Probability [%], 0 to 100).]
Table 2: High positive targets.
#    Z     P_pos.   Emotional target
1    2.7   98.6     a good fight
2    2.0   98.2     kart
3    4.8   97.8     Olympic opening ceremony
4    4.8   97.6     large fonts
5    4.8   97.1     medal competitions
6    3.1   97.1     corridor
7    4.8   96.9     the anchor of the Olympic sacred-fire relay
8    4.8   96.9     Industrial Hi-School of Naruto
9    3.9   96.8     resting
10   4.8   96.8     Secretary Romaiya
4.4 Experiment-3: Observation with Z-score and Probability
Emotional targets can be filtered by the Z-score and listed in order of emotional probability for each interval. For example, Tables 2 and 3 show emotional targets whose Z-score is greater than 2.0 for the interval from Aug. 8th to 14th, 2008. These keywords are intuitively correct. People seem to have enjoyed the Olympics and to have been disappointed at losing games. Incidentally, there are high schools in the lists. They are some of the teams in an annual baseball tournament for high school students in Japan.
On the other hand, emotional targets can also be filtered by a lower Z-score condition (0.5 < z < 2.0). Table 4 shows such targets. Some targets are useful and their probabilities look appropriate. Several categories can be seen among the targets. For instance, sports, amusements, communication and health are found in the high positive groups, and disease and politics are found in the low positive groups.

Table 3: Low positive targets.
#    Z     P_pos.   Emotional target
1    4.8   2.2      weapon
2    4.7   2.5      Kagoshima Jitsugyou Hi-School
3    4.6   3.2      soaring prices
4    4.4   3.9      professionalism
5    4.2   4.4      Olympic baseball; Japan, Cuba
6    3.3   5.4      ten
7    2.5   5.6      ruddy
8    4.8   5.6      Olympic soccer; Japan, Nigeria
9    4.5   6.0      job
10   2.2   6.3      discussion
“Aso” is a politician in Japan. His name appeared in both the 80% – 90% group and the 0% – 10% group. This suggests that opinion about him is split. Of course, we must read and analyze the blogs to confirm this idea, but the table tells us which blogs to read. This process is a good example of how to use ABLANA.
As a result, the combination of the Z-score and the emotional probability would be useful parameters for finding people’s requirements and complaints.
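The filtering and sorting used in this experiment can be sketched as below. The `Target` type and the `select_targets` function are hypothetical names introduced for illustration. The first four rows are taken from Tables 2 and 3; the last row is a made-up entry added only to show the middle Z-score condition.

```python
from typing import NamedTuple

class Target(NamedTuple):
    keyword: str
    z: float       # Z-score of the keyword in the interval
    p_pos: float   # probability of positive emotion, in percent

def select_targets(targets, z_min, z_max=float("inf"), descending=True):
    """Keep targets with z_min < z < z_max and sort them by positive probability."""
    kept = [t for t in targets if z_min < t.z < z_max]
    return sorted(kept, key=lambda t: t.p_pos, reverse=descending)

rows = [
    Target("a good fight", 2.7, 98.6),
    Target("weapon", 4.8, 2.2),
    Target("soaring prices", 4.6, 3.2),
    Target("Olympic opening ceremony", 4.8, 97.8),
    Target("example keyword", 1.1, 50.0),  # hypothetical, not from the paper
]

high_z = select_targets(rows, z_min=2.0)               # condition of Tables 2 and 3
middle_z = select_targets(rows, z_min=0.5, z_max=2.0)  # condition of Table 4
```

Reading `high_z` from the top gives the high positive targets, and reading it from the bottom gives the low positive targets.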
5 OPEN PROBLEMS
The soundness of the patterns and parameters proposed in this paper is well confirmed by the experiments. However, some problems remain in improving the accuracy of emotion reasoning.
One important requirement is to deal with multiple sentences. For example, “Beaujolais Nouveau’s Release. It tastes different every year. This year’s wine is fresh and flavors...” consists of three sentences. The second and third sentences do not include “Beaujolais Nouveau”, so the proposed method cannot extract “Beaujolais Nouveau” as the emotional target from these two.
Moreover, because “release”, “fresh” and “flavors” are positive words and “tastes different” is a negative phrase, the probability of emotion for this article is marked as slightly negative. The second sentence merely states common knowledge; it is not a complaint.
Therefore, our future work is to analyze anaphora and to merge the emotional states from multiple sentences.
Table 4: Emotional targets found in the middle Z-score set.
Range of P_pos. / Sample of emotional targets
90% – 100% / company, foreign exchange, accident, lunch, newest information, image, coffin, knowledge, going to sleep, customer, cover, lottery, Yakisoba (fried noodles), new mail, salary
80% – 90% / rest, musical, garlic, house, animation, bumber ticket, ambulance car, dam, soft cream, ticket, actual place, plan, Mr. Aso, Soumen (fine noodles), 3 days, revision, high school baseball tournament
70% – 80% / chance, Gyoza (pot sticker), Obon (Japanese Summer holidays), clothes, reference book, sheep, access counts, proposal, boyfriend/girlfriend, son, once, Japanese people, stairs, guys, past questions, movies
60% – 70% / cloud, characteristics, senior, process, short, street stall, milk, potato, world view, flower language, high school days, panel, course, room, Mr. Kouichi, response, joy, park, Tokyo, tombstone, TVCM, RPG
50% – 60% / helmet, boys, Japanese, lecture, environment problem, curren, strain (feeling), water place, safe management, game, Yankees, vanilla, category, climbing Mt. Fuji, Beawanpi (one-piece dress), expected software
40% – 50% / fear, triathlon, batted ball, immediately after, sitting comfort, Osaka Touin (school), young people, ear, mood, Shun (person), dark, whole life, love, future, meal time, driving, competition, ultraviolet rays
30% – 40% / obligation, corner, leukemia, prejudice, under construction, terrorist, feeling of intimacy, Hikari (light, fiber-optic cable), natural environment, belly, member on the regular payroll, symptoms, oneself
20% – 30% / abnormal weather, contents of work, darkness, otherwise, fatigue, narrow, crowdedness, life, SAP for maker, love, elegant, property management, shooting (film), a little happy, aphthous ulcer, drawback, husband
10% – 20% / HIV, bewilderment, muddy, allow, noon, twice or more, provery of blood, marriage, put away, 30, your body, a hip joint, load (of baggage), shame, headache, next election of Presentative, scarcity of water, hallucination
0% – 10% / Furafura (dizzily), Japan High School Baseball Federation, molar tooth, panic, a chief secretary-Aso Taro, continuation, breast cancer, camp, sleeplessness, respect, strained back, ant, sleeping posture
* Extracted from the interval Aug. 1st - 7th 2008. Translated into English by the author of this paper.
6 CONCLUSIONS
This paper proposed an affective blog analyzer (ABLANA), which crawls blog articles from the Web along a time series and analyzes people’s emotional targets. The method for emotion reasoning uses a sentence pattern dictionary. The original dictionary is A-Japanese-Lexicon, which covers 6,000 fundamental Japanese verbs and consists of 14,800 patterns. This paper uses an extended dictionary in which emotional information is annotated when a pattern expresses an emotional process (arousal, state or response).
In the experiments, the extracted emotional targets were filtered and sorted by two parameters, namely the Z-score of the frequency of keyword appearance and the probability of emotions. These parameters are effective enough to capture trendy and emotional targets. Thus, a domain-independent affective analyzer was successfully constructed. This technique is expected to be applied in practice to capture people’s affective statements.
REFERENCES
Elliott, C. (1992). The Affective Reasoner: A process model of emotions in a multi-agent system. PhD thesis, Northwestern University.
Hatzivassiloglou, V. and McKeown, K. R. (1997). Predicting the semantic orientation of adjectives. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, pages 174–181.
Ikehara, S., Miyazaki, M., Shirai, S., Yokoo, A., Nakaiwa, H., Ogura, K., Ooyama, Y., and Hayashi, Y. (1997). Goi-Taikei: A Japanese Lexicon. Iwanami Shoten.
Kanayama, H. and Nasukawa, T. (2008). Textual demand analysis: Detection of users’ wants and needs from opinions. In Proceedings of the International Conference on Computational Linguistics, pages 409–416.
Liu, H., Lieberman, H., and Selker, T. (2003). A model of textual affect sensing using real-world knowledge. In Proceedings of the International Conference on Intelligent User Interfaces, pages 125–132.
Ortony, A., Clore, G. L., and Collins, A. (1988). The Cognitive Structure of Emotions. Cambridge Univ. Press.
Tokuhisa, M. and Okada, N. (1997). A conceptual analysis of emotional words for an intellectual, emotional agent. In Proc. of the Int. Conf. on Pacific Association for Computational Linguistics, pages 307–315.
Tokuhisa, R., Inui, K., and Matsumoto, Y. (2008). Emotion classification using massive examples extracted from the web. In Proceedings of the International Conference on Computational Linguistics, pages 881–888.
Turney, P. (2002). Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proc. of the Ann. Meeting of the Association for Computational Linguistics, pages 417–424.