TRENDSPOTTER DETECTION SYSTEM FOR TWITTER

Wataru Shirakihara

Graduate School of Information Science and Electrical Engineering, Kyushu University, Fukuoka City, Japan

Tetsuya Oishi

†

, Ryuzo Hasegawa

‡

, Hiroshi Hujita

‡

, Miyuki Koshimura

‡

†

Research Institute for Information Technology, Kyushu University, Fukuoka City, Japan

‡

Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka City, Japan

Keywords:

Twitter, Data stream algorithms, Information recommendation.

Abstract:

It is too difﬁcult for us to ﬁnd out trends with search engines. Twitter, a popular microblogging tool, has seen

a lot of growth since it launched in October, 2006. Information about the trends is posted by many twitterers.

If we ﬁnd out trendspotters from twitterers, and follow them, we can get it more easily. Our trendspotter

detection system uses the burst detection algorithm, and we veriﬁed its effectiveness for Twitter’s posts. We

succeeded in detecting the 24 trendspotters by 5277 users.

1 INTRODUCTION

Over the past few decades Internet has developed

rapidly. Many people use Internet when they want

to get some pieces of information, but information we

ﬁnd on the Internet is out of date unless it is renewed.

Therefore it is too difﬁcult for us to ﬁnd out trends

with search engines.

Twitter, a popular microblogging tool, has seen a

lot of growth since it launched in October, 2006(Java

et al., 2007). According to Netratings, in Japan, Twit-

ter had about 4,730,000 users as of January, 2010

However, the ﬁgures include only the users who use

Twitter from Twitter’s website

, that is to say, the

users from mobile phone, or Twitter client are not in-

cluded. To sum up, actually, there are much more

users(Kanda, 2009).

Twitter has a lot of differentiating good factors

from other SNS or blogging, and is used by many

users as an area of exchange of information.

Users (Twitterers) can broadcast an unlimited

amount of messages (tweets) to a group of other Twit-

terers who have opted to subscribe to these broadcasts

(followers). Twitterers also recieve broadcasts from

other users. Individual tweets are made within a limit

of 140 characters(Starbird and Palen, 2010).

http://www.netratings.co.jp/

New news/News02242010.htm

http://twitter.com

Twitter has four main characteristics. The ﬁrst is

that twitterers can ’retweet’ someone else’s post by

copying the post and the person’s username. The

retweeted post is shared with all of their followers.

The second is that the hashtag convention(#[hash-

tag term]) is used inline to call out user-chosen key-

words. Hashtags tag or markup a tweet to help oth-

ers understand the content context, as well as support

keyword term-searching.

The third is that the posts may be directed to a par-

ticular person by putting an @username at the bigin-

ning of the post. Even though the post is directed to

a person, others can still view it(Ehrlich and Shami,

2010).

The fourth is that twitterers can use Twitter’s API.

Many client tools were invented with API. API allows

other web services to integrate with Twitter. Buzzt-

ter

, one of the Twitter’s web services, shows buzz

terms in Twitter.

In Twitter, because of its characteristics, informa-

tion about the trends is posted by many twitterers. If

we ﬁnd out trendspotters from twitterers, and follow

them, we can get it more easily. The purpose of this

paper is to develop a system that detects the trendspot-

ters.

For detecting trendspotters, our system uses burst

detection algorithm(Fujiki et al., 2004). In blog-

ging or bulletin board, a particular phrase appears fre-

http://buzztter.com

625

Shirakihara W., Oishi T., Hasegawa R., Hujita H. and Koshimura M..

TRENDSPOTTER DETECTION SYSTEM FOR TWITTER.

DOI: 10.5220/0003183806250628

In Proceedings of the 3rd International Conference on Agents and Artiﬁcial Intelligence (ICAART-2011), pages 625-628

ISBN: 978-989-8425-40-9

 2011 SCITEPRESS (Science and Technology Publications, Lda.)

quently when certain topic is focused. The reason for

this phenomenon is that a relevant proper name in the

topic comes up in many users posts. The burst detec-

tion algorithm, proposed by Kleinberg, the posts are

treated as the document stream (Kleinberg, 2003).

The document stream is a set of documents with

the time it’s posted, such as newspaper articles. This

algorithm detects the time when the number of docu-

ments are sharply increased.

The paper is organized as follows: in section 2 ,

we discribe the practical move of this system. Next,

in Section 3 we discribe the result the system gets,

and consider it. Additionally, we determine that in-

formation about the trends will still be posted by the

trendspotters even after they were found out. Finally,

we conclude and discribe the future view in Section

2 SYSTEM OVERVIEW

Our system is overviewed as follows.

1. The system gets a buzz word s from Buzztter.

Buzztter shows different kinds of buzz words, in

which some everyday talks such as ”Good morn-

ing” or ”lunch” are included. However, these ev-

eryday talks actually are not buzz words. There-

fore, we ignore them and pick out the other buzz

words.

2. Search the posts which include the buzz word s

with Twitter’s API, and get usernames and posted

time.

3. Detect the time when the posts including buzz

word s are sharply increased with the burst detec-

tion algorithm.

4. Extract the earliest 20 posts in detected burst, and

the twitterers who posted them are recognized as

the trendspotters for buzz word s.

5. Repeat this sequence and pick out the trendspot-

ters who posted a signiﬁcant number of buzz

words.

3 EXPERIMENTS

We evaluate our system by three experiments as fol-

lows.

1. Veriﬁcation of Effectiveness (section 3.1)

We verify whether the burst detection algorithm is

effective for Twitter’s posts.

A number of posts

Time

1:40～

～～

～2:00, January 27

Figure 1: The number of posts including the buzz word 1.

A number of posts

Time

9:00～

～～

～9:20, January 26

18:00～

～～

～18:20, January 26

9:40～

～～

～10:00, January 26

Figure 2: The number of posts including the buzz word 2.

2. Detecting Trendspotters (section 3.2)

We examine whether our system surely gets

trendspotters who posted a signiﬁcant number of

buzz words.

3. Trendspotter’s Potency (section 3.3)

We verify whether the trendspotters we got in sec-

tion 3.2 will still be trendspotters even after they

are found out.

3.1 Veriﬁcation of Effectivness

To verify the effectiveness of burst detection, we

choose three example buzz words and discribe the re-

sults that the burst detection algorithm outputs.

Figure 1 - ﬁgure 3 shows a graph with a number of

posts including the buzz word on the y-axis, and the

time on the x-axis.

Our system’s outputs are as follows.

• In ﬁgure 1, the time after 2:00, January 27th was

burst.

• In ﬁgure 2, there are two distinct bursts. The ﬁrst

is seen from 9:00 to 10:00, and the second is after

18:00.

• In ﬁgure 3, there are also two bursts. The ﬁrst

is seen from 21:20 January 31th to 1:40 February

1st, and the second is after 7:00 February 1st.

The conclusion of these experiments is as follows.

ICAART 2011 - 3rd International Conference on Agents and Artificial Intelligence

626

A number of posts

Time

1:20～

～～

～1:40, February 1

21:20～

～～

～21:40, January 31

7:00～

～～

～7:20, February 1

Figure 3: The number of posts including the buzz word 3.

• The burst detection algorithm recognizes the

sharp increase of the posts as a burst.

• It does not recognize the little increase of the posts

as a burst.

• When the number of posts increased and de-

creased in a short time, it recognizes them as the

same burst.

• It can detect the bursts from data in the long time

period.

From these considerations, it is clear that the burst

detection algorithm is effective for Twitter’s posts.

3.2 Detecting Trendspotters

We examine whether our system surely gets

trendspotters who posted a signiﬁcant number of buzz

words. This experiment’s procedure is as follows.

• Pick out 200 buzz words from Buzztter.

• Search the posts which include buzz words with

Twitter’s API, and get usernames and posted time,

up to 1500 for each buzz word.

• Experimental period is from January, 2010 to

February, 2010.

• Detect the bursts for these buzz words by burst

detection algorithm. Then extract the twitterers

who posted the earliest 20 posts in detected burst.

• We examine how many buzz words each

trendspotter posted.

Our system detected 5277 trendspotters. Table 1

shows a number of trendspotters for N buzz words and

percentage of 5277 trendspotters.

This table shows that trendspotters for more than

three buzz words were 1.52 of the total. Thus, our sys-

tem detected the trendspotters for many buzz words

by a lot of twitterers.

Table 1: The number of trendspotters for N buzz words and

percentage of total.

buzz words(N) number percentage(%)

1 4734 89.71

2 463 8.77

3 56 1.06

4 12 0.23

5 9 0.17

6 3 0.06

3.3 Trendspotter’s Potency

Our system detected the trendspotters. Now we verify

whether they will still be the trendspotters even after

they are found out.

• Pick out 110 buzz words from Buzztter.

• Search the posts which include buzz words with

Twitter’s API, and get usernames and posted time,

up to 1500 for each buzz words.

• Experimental period is from February 5th, 2010

to February 7th, 2010.

• Detect the bursts for these buzz words by the burst

detection algorithm. Then extract the twitterers

who posted in detected bursts.

• We examine how many buzz words each twitterer

posted in detected bursts, and compare the aver-

age of all twitterers with the average of the top 24

trendspotters in experiment2.

Our system detected 34381 twitterers. Table 2

shows the number of twitterers who posted N buzz

words in bursts, and percentage of 34381 twitterers.

Table 2: The number of users who posted N buzz words in

bursts and percentage of total.

buzz words(N) number percentage(%)

‘2 31174 90.67

3‘5 2916 8.48

6‘8 227 0.66

9‘11 46 0.13

12‘ 18 0.05

Table 3 shows the number of top 24 twitterers who

posted N buzz words in bursts, and percentage of 24

trendspotters.

Comparing the table 2 with the table 3, top 24

trendspotters posted more buzz words than the other

twitterers. In all twitterers, the percentage of twitter-

ers who posted more than 6 buzz words is less than

1%. On the other hand, in the top 24 trendspotters,

it is 33.4%. Additionally, the average buzz words

TRENDSPOTTER DETECTION SYSTEM FOR TWITTER

627

Table 3: The number of the top 24 trendspotters who posted

N buzz words in bursts and percentage of total.

buzz words(N) number percentage(%)

‘2 8 33.3

3‘5 8 33.3

6‘8 4 16.7

9‘11 3 12.5

12‘ 1 4.2

per all twitterers who posted the buzz words in the

bursts is 1.43, while the average buzz words per top

24 trendspotters is 4.54. Thus, the trendspotters our

system detected still be the trendspotters even after

they are found out.

4 CONCLUSIONS

In this paper, we have presented a system that detects

the trendspotters, and enables us to easily ﬁnd out the

trends with Twitter. We found out that the burst de-

tection algorithm is applicable to Twitter’s posts. In

addition, it appropriately recognized a sharp increase

of the posts as a burst. Therefore, the burst detec-

tion algorithm is suitable to detect the trendspotters.

Moreover, the trendspotters our system detected still

post more buzz words than the other twitterers even

after they are found out. Thus, our system is effective

to ﬁnd out the trends.

In future research, we are going to develop the

method of categorizing the trendspotters. Twitterers

are divided into the clusters such as politics, art, and

sports. Then, the system detects the trendspotters in

each clusters. However, the rules of clustering are

needed.

For ﬁnding these rules, we will work out Twitter’s

text mining. In particular, we take into account the

following 6 points.

• Connection of ”follow”

• Reply (the posts directed to a particular person by

putting an @username)

• Hashtags

• Posted time

• Distance between the sentenses in different posts.

• Bursts

Figure 4 shows an overview of the system. For

calculating them, we use Hadoop, a framework for

distributed processing. Then, we use HBase and Cas-

sandra for storing calculated results.

Hadoop

Twitter

Timelines

Buzztter

Buzz words

Timelines

Outputs

Learning

Hadoop

Choose a