Real-time Local Topic Extraction using Density-based Adaptive Spatiotemporal Clustering for Enhancing Local Situation Awareness

Tatsuhiro Sakai, Keiichi Tamura, Shota Kotozaki, Tsubasa Hayashida and Hajime Kitakami

Graduate School of Information Sciences, Hiroshima City University, Hiroshima, Japan

Keywords:

Spatiotemporal Analysis, Geotagged Tweets, Local Topic Extraction, Social Data Mining, Big Data Analysis,

Spatiotemporal Clustering.

Abstract:

In the era of big data, we are witnessing the rapid growth of a new type of information source. In particular, Twitter is one of the most widely used microblogging services for situation awareness during emergencies. In our previous work, we focused on geotagged tweets posted on Twitter that include location information as well as a time and a text message. We previously developed a real-time analysis system using the (ε,τ)-density-based adaptive spatiotemporal clustering algorithm to analyze local topics and events. That system successfully detects emerging bursty areas in which geotagged tweets related to observed topics are posted actively; however, it is tailor-made and specialized for a particular observed topic and therefore cannot identify other topics. To address this issue, we propose a new real-time spatiotemporal analysis system for enhancing local situation awareness using a density-based adaptive spatiotemporal clustering algorithm. In the proposed system, local bursty keywords are extracted and their bursty areas are identified. We evaluated the proposed system using real-world topics related to weather in Japan. Experimental results show that the proposed system can extract local topics and events.

1 INTRODUCTION

One of the most interesting emerging topics in big

data analysis of social media is that social data posted

on social media can be applied to situation awareness

during real world topics and events (Yin et al., 2012).

In particular, during natural disasters such as earth-

quakes, typhoons, ﬂoods, and heavy snow storms,

people actively post messages that mention the situ-

ations they are facing through social media sites. This

trend has been encouraged by the increasing popu-

larity of a new type of data on social media: geo-

annotated social data, which is also referred to as geo-

referenced social data (Naaman, 2011). Moreover, it

creates new technical challenges, such as how to show

where and when events occur. Techniques that address these challenges help users to better understand their local situation.

In our previous work, we focused on geotagged tweets posted on Twitter that include location information as well as a time and a text message. Geotagged tweets are referred to as spatiotemporal documents because we can analyze topics and events spatiotemporally using them. We proposed a spatiotemporal clustering algorithm called the (ε,τ)-density-based

adaptive spatiotemporal clustering algorithm that al-

lows us to extract spatiotemporal clusters in which

geotagged tweets are actively posted. Moreover, we

developed a real-time analysis system using the (ε,τ)-

density-based adaptive spatiotemporal clustering al-

gorithm to analyze local topics and events. The pro-

posed spatiotemporal analysis system is successful in

detecting emerging bursty areas in which geotagged

tweets related to observed topics are posted actively.

The real-time analysis system proposed in our

previous work allows us to enhance local situation

awareness; however, the system requires the key-

words related to the observed topics to be speciﬁed in

advance. If the system is specialized for a particular

observed topic, it cannot identify other topics, even

though some local bursty keywords related to emerg-

ing local topics and events are posted around users.

To address this issue, we propose a new real-time spa-

tiotemporal analysis system for enhancing local situ-

ation awareness. This method is based on a density-

based adaptive spatiotemporal clustering algorithm.

Our new real-time spatiotemporal analysis system is composed of two techniques: quartile-based outlier detection to identify bursty local keywords and density-based adaptive spatiotemporal clustering to identify bursty local areas. In our new system, locally frequent keywords in geotagged tweets that are within a particular distance from a user are first extracted. To determine whether locally frequent keywords are bursty keywords or routine keywords, we utilize quartile-based outlier detection (Hyndman and Fan, 1996). Moreover, bursty local areas related to extracted local bursty keywords are identified using density-based adaptive spatiotemporal clustering.

Sakai, T., Tamura, K., Kotozaki, S., Hayashida, T. and Kitakami, H.
Real-time Local Topic Extraction using Density-based Adaptive Spatiotemporal Clustering for Enhancing Local Situation Awareness.
In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2015) - Volume 1: KDIR, pages 203-210
ISBN: 978-989-758-158-8

The remainder of this paper is organized as follows. In Section 2, related work is reviewed. In Section 3, we propose a new real-time spatiotemporal analysis system. In Section 4, we explain the method for detecting bursty local keywords. In Section 5, the (ε,τ)-density-based adaptive spatiotemporal clustering algorithm is described briefly. In Section 6, experimental results and case studies are reported. In Section 7, we conclude this paper.

2 RELATED WORK

In the era of big data, social media is expected to

enhance the situation awareness of local topics. In

particular, many researchers focus on natural disas-

ters and have developed awareness systems for nat-

ural disasters such as earthquakes, typhoons, ﬂoods,

and diseases (Yin et al., 2012). During natural dis-

asters, users often post text messages through social

media about things that they are witnessing (Hui et al., 2012; Kreiner et al., 2013; Mendoza et al., 2010).

Some of the most successful proposals concern

crisis management systems for earthquakes, ﬂoods,

and epidemics. Sakaki et al. (Sakaki et al., 2010)

focused on a method for predicting earthquake epi-

centers by using geotagged tweets regarding earth-

quakes. Avvenuti et al. (Avvenuti et al., 2014) de-

veloped an earthquake alert and report system that can

identify damage in earthquake-affected areas. Vieweg

et al. (Vieweg et al., 2010) showed that information

related to emergency situations is posted on Twitter

during emergencies such as ﬂoods and ﬁres. More-

over, Hwang et al. (Hwang et al., 2013) observed ﬂu

epidemics using a spatiotemporal analysis of social

media geostreams.

Kim et al. (Kim et al., 2011) introduced mTrend,

which constructs and visualizes spatiotemporal topic

trends, referred to as “topic movements.” mTrend is

not a tailor-made system; however, it cannot analyze

bursty areas of local topics and events. Thom et al.

(Thom et al., 2012) presented a system that extracts

anomalies from geolocated Twitter messages and visualizes them using term interactive clouds. This system does not address spatiotemporal analysis. Kumar et al. (Kumar et al., 2014) detected road hazards by aggregating hazard-related information posted by Twitter users. This system was tailor-made and could not extract any hazards other than road-related ones.

Figure 1: System Overview. [Diagram labels: Twitter Site; Geotagged Crawler; Geotagged Database; Local Bursty Keyword Extraction Manager; Spatial Clustering Manager; Map-based Web Interface; Web Browsers, Android App. Arrow labels: newly added geotagged tweets; local bursty keywords; bursty areas of local bursty keywords; browsing results.]

3 SYSTEM OVERVIEW

In this section, we present an overview of the proposed real-time spatiotemporal analysis system.

3.1 Sequence of Geotagged Tweets

In this study, we focus on geotagged tweets posted on Twitter. Let $gt_i$ denote the $i$-th geotagged tweet in a set of geotagged tweets $SGT = \{gt_1, gt_2, \cdots, gt_n\}$; then, $gt_i$ consists of four items: $gt_i = \langle text_i, pt_i, pl_i, photo_i \rangle$, where $text_i$ is a short text message, $pt_i$ is the time when the geotagged tweet was posted, $pl_i$ is the location where $gt_i$ was posted or is located (i.e., latitude and longitude), and $photo_i$ is an attached photo.
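As an illustration only, the four-item tuple could be represented as a small record type. The class name `GeotaggedTweet`, the field types, and the sample values below are assumptions made for this sketch; the paper does not specify how tweets are stored.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class GeotaggedTweet:
    """One geotagged tweet gt_i = <text_i, pt_i, pl_i, photo_i> (hypothetical layout)."""
    text: str                     # text_i: short text message
    pt: datetime                  # pt_i: time when the tweet was posted
    pl: Tuple[float, float]       # pl_i: (latitude, longitude)
    photo: Optional[str] = None   # photo_i: attached photo, here a made-up file name


# SGT as a plain list; coordinates and times are invented sample values.
sgt = [
    GeotaggedTweet("Wow! It is snowing in downtown",
                   datetime(2014, 12, 17, 8, 30), (34.39, 132.46)),
    GeotaggedTweet("good morning",
                   datetime(2014, 12, 17, 8, 45), (34.40, 132.45), photo="snow.jpg"),
]
```

A frozen dataclass keeps each tweet immutable and hashable, which is convenient when tweets are later placed in sets or used as dictionary keys.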

3.2 Components

In our system, spatiotemporal clusters of local bursty

keywords related to topics and events are extracted

as bursty areas in real time. Moreover, to visualize

bursty areas, our system provides a map-based user

interface. There are four managers in our system (Fig-

ure 1).

• The Geotagged Tweet Crawler crawls geotagged tweets from the Twitter feed via its Streaming APIs. Geotagged tweets are stored in a geotagged tweet database.
• The Local Bursty Keyword Extraction Manager generates a sequence of geotagged tweets that are located within distance $r$ of a user. This sequence is stored in $SGT$. First, this manager extracts locally frequent keywords $LFK^{(c)} = \{lfk^{(c)}_1, lfk^{(c)}_2, \cdots, lfk^{(c)}_m\}$ from $SGT^{(c)}$, where $SGT^{(c)} = \{gt_i \mid gt_i \in SGT \text{ and } pt_i \text{ is within one hour of the current time}\}$. It then extracts local bursty keywords $LBK^{(c)} = \{lbk^{(c)}_1, lbk^{(c)}_2, \cdots, lbk^{(c)}_{m'}\}$ from $LFK^{(c)}$ using quartile-based outlier detection. The details of this manager are further explained in Section 4.
• The Spatial Clustering Manager identifies bursty local areas as spatial clusters using (ε,τ)-density-based adaptive spatiotemporal clustering for each local bursty keyword. For each $lbk^{(c)}_j \in LBK^{(c)}$, (ε,τ)-density-based adaptive spatiotemporal clusters are extracted from $SKGT^{(key)} = \{gt_i \mid gt_i \in SGT \text{ and } key \in text_i\}$, where $key = lbk^{(c)}_j$. The details of this manager are further explained in Section 5.
• The Map-based Web Interface provides the user interface for browsing extracted spatial clusters.

Figure 2: Interfaces. [Screen shot labels: Map; Geotagged Tweets including Focusing Local Bursty Keyword; Ranking of Local Bursty Keywords; Marker: geotagged tweet; Pop-up Window: geotagged tweet's message, e.g., "Wow! It is snowing in downtown".]

KDIR 2015 - 7th International Conference on Knowledge Discovery and Information Retrieval

3.3 User Interface

Figure 2 shows a screen shot of the Map-based Web Interface, which is implemented as an Android application. The interface contains a map, a list of geotagged tweets for the currently focused local bursty keyword, and a ranking of local bursty keywords. For each local bursty keyword, the bursty local areas and each geotagged tweet in the extracted spatiotemporal clusters within a particular distance from the user are shown on the Google map. A marker represents a geotagged tweet, and a pop-up window appears when the user touches the marker.

4 LOCAL BURSTY KEYWORD EXTRACTION

We extract local bursty keywords from locally frequent keywords, i.e., keywords that appear frequently in geotagged tweets near a user. The set of locally frequent keywords contains many routine keywords (e.g., greetings). Therefore, we have to remove routine keywords from the set of locally frequent keywords to obtain the local bursty keywords. For example, suppose that a set of locally frequent keywords is {"good morning," "heavy rainfall," "delay"}. The keyword "good morning" appears frequently every day; therefore, it is regarded as a routine keyword.

It is difﬁcult to determine whether a keyword is

a routine keyword or not because the frequencies of

keywords change dynamically depending on time and

location. To determine if locally frequent keywords

are routine keywords, we utilize quartile-based

outlier detection. The local bursty keyword extraction

comprises two steps: (1) keyword frequency counting

and (2) routine keyword removal.

To count keyword frequency, for each keyword that appears in the geotagged tweets, the frequency of the keyword is counted per day and stored in $FCD^{(key)}$. Let $FCD^{(key)} = (fcd^{(key)}_1, fcd^{(key)}_2, \cdots, fcd^{(key)}_d)$ be the sequence of the daily frequencies of the keyword $key$. Moreover, for each keyword that appears in geotagged tweets, the frequency of the keyword is counted per hour. Let $FCH^{(key)} = (fch^{(key)}_1, fch^{(key)}_2, \cdots, fch^{(key)}_c)$ be the sequence of the hourly frequencies of the keyword $key$, where $c$ indexes the current hour. If $fch^{(key)}_c$ is at least a user-given threshold $minf$ and $fch^{(key)}_c \geq \sum_{k=c-25}^{c-1} fch^{(key)}_k / 24$, the keyword is a locally frequent keyword. Thus,

$$LFK^{(c)} = \Big\{ lfk^{(c)}_i \;\Big|\; \exists gt^{(c)}_j \in SGT^{(c)} \text{ that includes } lfk^{(c)}_i,\ fch^{(lfk^{(c)}_i)}_c \geq minf \text{ and } fch^{(lfk^{(c)}_i)}_c \geq \sum_{k=c-25}^{c-1} fch^{(lfk^{(c)}_i)}_k / 24 \Big\} \quad (1)$$
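The two conditions in Eq. (1) can be sketched as a small check. This is a minimal illustration, not the authors' implementation: the function name `is_locally_frequent` and the convention that the last list element is the current hour are assumptions. The window follows the sum limits as printed in Eq. (1), k = c-25 to c-1, with the divisor 24.

```python
def is_locally_frequent(hourly_counts, minf):
    """Check the two conditions of Eq. (1) for one keyword.

    hourly_counts: hourly frequencies of the keyword, oldest first;
                   hourly_counts[-1] is the current hour c (assumed layout).
    minf: user-given minimum frequency threshold.
    """
    if len(hourly_counts) < 26:
        raise ValueError("need the current hour plus the 25 preceding hours")
    current = hourly_counts[-1]
    window = hourly_counts[-26:-1]      # hours c-25 .. c-1, as in Eq. (1)
    baseline = sum(window) / 24         # divisor 24, as in Eq. (1)
    return current >= minf and current >= baseline


quiet_then_spike = [2] * 25 + [6]       # made-up counts for illustration
print(is_locally_frequent(quiet_then_spike, minf=5))
```

With the made-up counts above, the current hour (6) exceeds both the threshold and the recent baseline, so the keyword qualifies as locally frequent.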

In the second step, each locally frequent keyword $lfk \in LFK^{(c)}$ is removed if it is a routine keyword. Let the lower quartile, second quartile, and third quartile of $FCD^{(lfk)}$ be $Q_1^{(lfk)}$, $Q_2^{(lfk)}$, and $Q_3^{(lfk)}$, respectively. Consider as an example the sequence $FCD^{(lfk_1)} = \{8, 12, 17, 10, 11, 13, 14\}$. The second quartile $Q_2^{(lfk_1)}$ is the median of $FCD^{(lfk_1)}$; therefore, $Q_2^{(lfk_1)} = 12$. Moreover, $Q_1^{(lfk_1)} = 10$ and $Q_3^{(lfk_1)} = 14$.


Figure 3: Detecting local bursty keywords. [Plot of the two example sequences $FCD^{(lfk_1)}$ (IQR = 4) and $FCD^{(lfk_2)}$ (IQR = 8).]

The range $|Q_3^{(lfk)} - Q_1^{(lfk)}| = IQR^{(lfk)}$ is called the interquartile range (IQR). The distribution of $FCD^{(lfk)}$ is used to detect bursty keywords. If the spread of $FCD^{(lfk)}$ is small, $lfk$ appears at a steady rate in geotagged tweets rather than in bursts.

Definition 1 (Local Bursty Keyword). If a locally frequent keyword $lfk$ satisfies at least one of the following conditions, $lfk$ is a local bursty keyword.
(1) $\exists fcd^{(lfk)}_i \in FCD^{(lfk)}$, $fcd^{(lfk)}_i - Q_3^{(lfk)} > IQR^{(lfk)} \times 1.5$.
(2) $\exists fcd^{(lfk)}_i \in FCD^{(lfk)}$, $Q_1^{(lfk)} - fcd^{(lfk)}_i > IQR^{(lfk)} \times 1.5$.

Figure 3 shows an example of local bursty keyword detection. Consider two example sequences, $FCD^{(lfk_1)} = \{8, 12, 17, 10, 11, 13, 14\}$ and $FCD^{(lfk_2)} = \{3, 12, 29, 8, 11, 13, 16\}$. For the first sequence, $IQR^{(lfk_1)} = 4$ and no value satisfies Definition 1; therefore, $lfk_1$ is a routine keyword. For the second sequence, $IQR^{(lfk_2)} = 8$ and $29 - 16 > IQR^{(lfk_2)} \times 1.5$; therefore, $lfk_2$ is not a routine keyword.
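The quartile test can be tried on the two example sequences with a short sketch. The function name `is_bursty` is hypothetical. The paper cites Hyndman and Fan (1996) but does not state which quantile definition it uses; this sketch assumes the default "exclusive" method of Python's `statistics.quantiles`, which happens to reproduce the quartiles quoted in the text for these seven-element sequences.

```python
from statistics import quantiles


def is_bursty(daily_counts, k=1.5):
    """Return True if any daily count lies more than k * IQR outside [Q1, Q3]."""
    q1, _q2, q3 = quantiles(daily_counts, n=4)   # default method="exclusive" (assumption)
    iqr = q3 - q1
    return any(x - q3 > k * iqr or q1 - x > k * iqr for x in daily_counts)


fcd_lfk1 = [8, 12, 17, 10, 11, 13, 14]   # first example sequence from the text
fcd_lfk2 = [3, 12, 29, 8, 11, 13, 16]    # second example sequence from the text

print(is_bursty(fcd_lfk1))   # False: IQR = 4, no value is more than 6 outside the quartiles
print(is_bursty(fcd_lfk2))   # True: IQR = 8 and 29 - 16 = 13 > 12
```

Other quantile definitions can give slightly different quartiles on short sequences, so the choice of method may change which keywords are flagged near the boundary.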

5 ADAPTIVE SPATIOTEMPORAL CLUSTERING BASED ON (ε,τ)-DENSITY

Local bursty keywords help us to be aware of what is

happening near users; however, it is difficult to determine in which places the keywords are occurring.

The Spatial Clustering Manager identiﬁes bursty lo-

cal areas as spatial clusters using (ε,τ)-density-based

adaptive spatiotemporal clustering.

Adaptive spatiotemporal clustering based on (ε,τ)-density is a natural extension of the density-based criteria proposed by Ester et al. (Ester et al., 1996; Sander et al., 1998). In this section, (ε,τ)-density-based adaptive spatiotemporal clustering is explained briefly, and we show how to use it to extract bursty areas of local bursty keywords.

In density-based spatial clustering, for each data

point within a spatial cluster, the neighborhood of a

user-deﬁned radius must contain at least a minimum

number of points; thus, the density in the neighbor-

hood must exceed some predeﬁned threshold. This

density criterion allows us to recognize areas in which

densities are higher than in other areas. However,

it does not consider temporal changes correctly. It

is important to analyze temporal changes to extract

local topics and events to enhance situation aware-

ness in real time. In contrast, the (ε, τ)-density-based

adaptive spatiotemporal clusters cover spatiotemporal

clusters that are both temporally and spatially sepa-

rated from other spatiotemporal clusters.

5.1 Deﬁnitions

There are several density-based adaptive spatiotem-

poral criteria that can be used to deﬁne (ε,τ)-density-

based adaptive spatiotemporal clusters.

Suppose that a keyword and the set of geotagged tweets including that keyword are $key$ and $SKGT^{(key)} \subseteq SGT$, respectively. Let $kgt^{(key)}_j = gt_{\varphi^{(key)}(j)}$ be a keyword-related geotagged tweet that has keyword $key$ included in its $text_{\varphi^{(key)}(j)}$. A sequence of keyword-related geotagged tweets is $SKGT^{(key)} = (kgt^{(key)}_1, \cdots, kgt^{(key)}_m)$, where $SKGT^{(key)} \subseteq SGT$. Further, $\varphi^{(key)}$ is an injective function:

$$\varphi^{(key)}: SKGT^{(key)} \to SGT;\ kgt^{(key)}_j \mapsto gt_{\varphi^{(key)}(j)} \quad (2)$$

Definition 2 ((ε,τ)-density-based Neighborhood). The (ε,τ)-density-based neighborhood of a relevant geotagged tweet $kgt^{(key)}_p$, which is denoted by $KSTN_{(\varepsilon,\tau)}(kgt^{(key)}_p)$, is defined as

$$KSTN_{(\varepsilon,\tau)}(kgt^{(key)}_p) = \{ kgt^{(key)}_q \in SKGT^{(key)} \mid dist(kgt^{(key)}_p, kgt^{(key)}_q) \leq \varepsilon \text{ and } iat(kgt^{(key)}_p, kgt^{(key)}_q) \leq \tau \}, \quad (3)$$

where the function $dist()$ returns the distance between $kgt^{(key)}_p$ and $kgt^{(key)}_q$, and the function $iat()$ returns the interarrival time between them.
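A minimal sketch of the neighborhood query in Definition 2 follows. The paper does not say how dist() is computed; the haversine great-circle distance used here is an assumption, as are the names `dist_km` and `neighborhood` and the representation of a tweet as a `(lat, lng, t_sec)` tuple. The query is a simple linear scan with no spatial index.

```python
import math

EARTH_RADIUS_KM = 6371.0


def dist_km(a, b):
    """Haversine distance in km between two (lat, lng) pairs given in degrees."""
    lat1, lng1, lat2, lng2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def neighborhood(p, tweets, eps_km, tau_sec):
    """Indices q with dist(p, q) <= eps and iat(p, q) <= tau; includes p itself."""
    lat_p, lng_p, t_p = tweets[p]
    return [q for q, (lat, lng, t) in enumerate(tweets)
            if dist_km((lat_p, lng_p), (lat, lng)) <= eps_km
            and abs(t_p - t) <= tau_sec]


# Made-up tweets: (latitude, longitude, posting time in seconds).
tweets = [
    (34.00, 132.00, 0),      # reference tweet
    (34.01, 132.00, 600),    # about 1.1 km away, 10 minutes later
    (34.00, 132.00, 7200),   # same place, but 2 hours later
    (35.00, 132.00, 100),    # about 111 km away
]
print(neighborhood(0, tweets, eps_km=5, tau_sec=3600))   # [0, 1]
```

Only the second tweet is close to the reference tweet in both space and time, which is what separates this neighborhood from a purely spatial one.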

The local spatiotemporal density of a keyword-related geotagged tweet $kgt^{(key)}_p$ is denoted as $lstd(kgt^{(key)}_p)$. The spatiotemporal space is divided into several spatiotemporal grids in three-dimensional space. The number of spatiotemporal grids is $div_{lng} \times div_{lat} \times div_{time}$, where $lng$, $lat$ and $time$ are longitude, latitude and posted time, respectively. For each spatiotemporal grid, the number of geotagged tweets posted in the past is calculated. The degree of local spatiotemporal density of a geotagged tweet is the normalized value of the number of geotagged tweets:

$$lstd(kgt^{(key)}_p) = \frac{stnum(geo\_gid(kgt^{(key)}_p)) - stnum_{min}}{stnum_{max} - stnum_{min}}, \quad (4)$$

where $stnum(i)$ returns the number of geotagged tweets in the $i$-th grid. Function $geo\_gid(kgt^{(key)}_p)$ returns the ID of the grid where $kgt^{(key)}_p$ is located. Furthermore, $stnum_{min}$ and $stnum_{max}$ are the minimum and maximum values of $stnum$, respectively.

Definition 3 (Adaptive Threshold). The minimum number of keyword-related geotagged tweets is called the adaptive threshold $ATH$, defined as follows.

$$ATH(kgt^{(key)}_p, MinKGT) = (MinKGT - 1) \times lstd(kgt^{(key)}_p) + 1, \quad (5)$$

where the function $lstd()$ returns the degree of local spatiotemporal density ($0 \leq lstd(kgt^{(key)}_p) \leq 1.0$).

A keyword-related geotagged tweet $kgt^{(key)}_p$ is called a core keyword-related geotagged tweet if there are at least $ATH(kgt^{(key)}_p, MinKGT)$ keyword-related geotagged tweets in $KSTN_{(\varepsilon,\tau)}(kgt^{(key)}_p)$, that is, $|KSTN_{(\varepsilon,\tau)}(kgt^{(key)}_p)| \geq ATH(kgt^{(key)}_p, MinKGT)$. Otherwise, $kgt^{(key)}_p$ is called a border keyword-related geotagged tweet.

Definition 4 ((ε,τ)-density-based Directly Adaptive Reachable). Suppose that a keyword-related geotagged tweet $kgt^{(key)}_q$ is in the (ε,τ)-density-based neighborhood of $kgt^{(key)}_p$. If $|KSTN_{(\varepsilon,\tau)}(kgt^{(key)}_p)| \geq ATH(kgt^{(key)}_p, MinKGT)$, $kgt^{(key)}_q$ is (ε,τ)-density-based directly adaptive reachable from $kgt^{(key)}_p$.

Definition 5 ((ε,τ)-density-based Adaptive Reachable). Suppose that there is a sequence of keyword-related geotagged tweets $(kgt^{(key)}_p, kgt^{(key)}_{p+1}, \cdots, kgt^{(key)}_{p+l})$ and the $(p+i+1)$-th keyword-related geotagged tweet $kgt^{(key)}_{p+i+1}$ is (ε,τ)-density-based directly adaptive reachable from the $(p+i)$-th keyword-related geotagged tweet $kgt^{(key)}_{p+i}$ for every $0 \leq i < l$. Then the keyword-related geotagged tweet $kgt^{(key)}_{p+l}$ is (ε,τ)-density-based adaptive reachable from $kgt^{(key)}_p$.

Definition 6 ((ε,τ)-density-based Adaptive Connected). Suppose that the keyword-related geotagged tweets $kgt^{(key)}_p$ and $kgt^{(key)}_q$ are (ε,τ)-density-based adaptive reachable from an arbitrary keyword-related geotagged tweet $kgt^{(key)}_o$. If $|KSTN_{(\varepsilon,\tau)}(kgt^{(key)}_o)| \geq ATH(kgt^{(key)}_o, MinKGT)$, $kgt^{(key)}_p$ is (ε,τ)-density-based adaptive connected to $kgt^{(key)}_q$.

5.2 Adaptive Spatiotemporal Clusters based on (ε,τ)-Density

An (ε,τ)-density-based adaptive spatiotemporal cluster is defined as follows:

Definition 7 ((ε,τ)-density-based adaptive spatiotemporal cluster). An (ε,τ)-density-based adaptive spatiotemporal cluster for a keyword $key$ ($ASTC^{(key)}$) in $SKGT^{(key)}$ satisfies the following restrictions:
(1) $\forall kgt^{(key)}_p, kgt^{(key)}_q \in SKGT^{(key)}$, if and only if $kgt^{(key)}_p \in ASTC^{(key)}$ and $kgt^{(key)}_q$ is (ε,τ)-density-based adaptive reachable from $kgt^{(key)}_p$, $kgt^{(key)}_q$ is also in $ASTC^{(key)}$.
(2) $\forall kgt^{(key)}_p, kgt^{(key)}_q \in ASTC^{(key)}$, $kgt^{(key)}_p$ is (ε,τ)-density-based adaptive connected to $kgt^{(key)}_q$.

5.3 Algorithm

Algorithm 1 describes the (ε,τ)-density-based adaptive spatiotemporal clustering algorithm for extracting bursty areas of a local bursty keyword key. In this algorithm, for each geotagged tweet gtp in SKGT^(key), the function IsClustered checks whether gtp is already assigned to a spatiotemporal cluster. If it is not, the (ε,τ)-density-based neighborhood of gtp is obtained using the function GetNeighborhood. If gtp is a core keyword-related geotagged tweet, it is assigned to a new spatiotemporal cluster, and all its neighbors are queued in Q for further processing. The processing and assignment of keyword-related geotagged tweets to the current spatiotemporal cluster continues until the queue is empty. First, the next keyword-related geotagged tweet is dequeued from queue Q. If the dequeued keyword-related geotagged tweet is not already assigned to the current spatiotemporal cluster, it is assigned to the current spatiotemporal cluster. If the dequeued keyword-related geotagged tweet gtq is a core keyword-related geotagged tweet, the keyword-related geotagged tweets in the (ε,τ)-density-based neighborhood of gtq are then queued in queue Q using the function EnUniqueQueue, which places the input keyword-related geotagged tweets into queue Q if they are not already in Q.

Algorithm 1: (ε,τ)-density-based adaptive spatiotemporal clustering algorithm.

input : SKGT^(key) - a set of geotagged tweets including keyword key, ε - neighborhood radius, τ - interarrival time, MinKGT - threshold value
output: STC^(key) - set of spatiotemporal clusters

cid ← 1;
STC^(key) ← ∅;
for i ← 1 to |SKGT^(key)| do
    gtp ← gt_i^(key) ∈ SKGT^(key);
    if IsClustered(gtp) == false then
        N ← GetNeighborhood(gtp, ε, τ);
        if |N| ≥ ATH(gtp, MinKGT) then
            stc_cid^(key) ← MakeNewCluster(cid, gtp);
            EnQueue(Q, N);
            while Q is not empty do
                gtq ← DeQueue(Q);
                stc_cid^(key) ← stc_cid^(key) ∪ {gtq};
                N ← GetNeighborhood(gtq, ε, τ);
                if |N| ≥ ATH(gtq, MinKGT) then
                    EnUniqueQueue(Q, N);
                end
            end
            STC^(key) ← STC^(key) ∪ {stc_cid^(key)};
            cid ← cid + 1;
        end
    end
end
return STC^(key);
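The queue-based expansion described above can be sketched in Python as follows. This is an illustrative reading of Algorithm 1, not the authors' code: the function name `adaptive_st_clustering`, the caller-supplied `dist`, `iat` and `lstd` callables, and the brute-force neighborhood scan are all assumptions. As a simplification, a tweet already claimed by an earlier cluster stays in that cluster, and a set tracks which tweets have already been queued.

```python
from collections import deque


def adaptive_st_clustering(tweets, eps, tau, min_kgt, dist, iat, lstd):
    """Return clusters as lists of indices into tweets.

    dist(a, b), iat(a, b) and lstd(a) are supplied by the caller (hypothetical API).
    """
    def neighbors(p):
        return [q for q in range(len(tweets))
                if dist(tweets[p], tweets[q]) <= eps
                and iat(tweets[p], tweets[q]) <= tau]

    def ath(p):                                   # adaptive threshold, Eq. (5)
        return (min_kgt - 1) * lstd(tweets[p]) + 1

    cluster_of = {}                               # tweet index -> cluster id
    clusters = []
    for p in range(len(tweets)):
        if p in cluster_of:
            continue
        n_p = neighbors(p)
        if len(n_p) < ath(p):
            continue                              # not a core tweet; may join a cluster later
        cid = len(clusters)
        clusters.append([p])
        cluster_of[p] = cid
        queue = deque(q for q in n_p if q != p)
        queued = set(n_p) | {p}
        while queue:
            q = queue.popleft()
            if q in cluster_of:
                continue                          # simplification: keep earlier assignment
            cluster_of[q] = cid
            clusters[cid].append(q)
            n_q = neighbors(q)
            if len(n_q) >= ath(q):                # q is a core tweet: expand through it
                for r in n_q:
                    if r not in queued:
                        queued.add(r)
                        queue.append(r)
    return clusters


# Toy 1-D data: (position in km, time in seconds); values are made up.
toy = [(0, 0), (1, 10), (2, 20), (50, 0), (51, 5), (52, 10), (200, 0)]
space = lambda a, b: abs(a[0] - b[0])
time_gap = lambda a, b: abs(a[1] - b[1])

dense = adaptive_st_clustering(toy, 1.5, 60, 2, space, time_gap, lambda t: 1.0)
sparse = adaptive_st_clustering(toy, 1.5, 60, 2, space, time_gap, lambda t: 0.0)
print(dense)    # [[0, 1, 2], [3, 4, 5]]
print(sparse)   # [[0, 1, 2], [3, 4, 5], [6]]
```

The two runs differ only in the local density passed in. With density 1.0 the threshold equals MinKGT and the isolated tweet is treated as noise; with density 0.0 the threshold drops to 1 and the isolated tweet forms its own cluster.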

6 EXPERIMENTS

We implemented our proposed system and evaluated

it.

6.1 Experiment Setup

We evaluated our proposed system through case studies. The parameters in the experiments were set as follows: ε is 5 km, τ is 3,600 s, and MinKGT is 5. Moreover, the user location was set to (34.578618, 132.796105). In the experiments, we used geotagged tweets that were located within 70 km of the user. The user-given threshold minf is 5.

The (ε,τ)-density-based adaptive spatiotemporal clustering requires local spatiotemporal densities. To calculate them, we used 3,301,605 geotagged tweets posted from December 13 to December 23, 2013 and counted the tweets in each spatiotemporal grid. The spatiotemporal space for local spatiotemporal densities is a rectangle defined by the westernmost point (24.4494, 122.93361) and the northernmost point (45.5572, 148.752) of Japan. This rectangle was equally divided into spatiotemporal grids with $div_{lng} = 1{,}000$, $div_{lat} = 1{,}000$ and $div_{time} = 24$.
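To make the grid layout concrete, the following sketch maps a tweet to a grid ID using the rectangle and division counts stated above. The function name `geo_gid` mirrors the symbol used in Section 5, but its signature, the flattening of the three indices into one integer, and the reading of the 24 time divisions as hours of the day are all assumptions; the paper does not describe these details.

```python
LAT_MIN, LNG_MIN = 24.4494, 122.93361    # first corner given in the text
LAT_MAX, LNG_MAX = 45.5572, 148.752      # second corner given in the text
DIV_LNG, DIV_LAT, DIV_TIME = 1000, 1000, 24


def geo_gid(lat, lng, hour_of_day):
    """Return an integer grid ID, or None if the point lies outside the rectangle."""
    if not (LAT_MIN <= lat <= LAT_MAX and LNG_MIN <= lng <= LNG_MAX):
        return None
    # Clamp so that points exactly on the upper edge fall into the last cell.
    ix = min(int((lng - LNG_MIN) / (LNG_MAX - LNG_MIN) * DIV_LNG), DIV_LNG - 1)
    iy = min(int((lat - LAT_MIN) / (LAT_MAX - LAT_MIN) * DIV_LAT), DIV_LAT - 1)
    it = hour_of_day % DIV_TIME           # assumption: time axis is the hour of day
    return (it * DIV_LAT + iy) * DIV_LNG + ix


user_gid = geo_gid(34.578618, 132.796105, hour_of_day=13)   # user location from Section 6.1
print(user_gid)
```

Counting historical tweets per grid ID (for example with `collections.Counter`) would then give the per-grid totals that are normalized into local spatiotemporal densities.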

6.2 Case Studies

In this experiment, we confirmed that our new system could identify two local topics related to natural disasters. The first local topic is heavy snowfall on December 17, 2014. An explosive cyclogenesis, a so-called "bomb" cyclone, hit Japan and brought heavy snowfall to Hiroshima, which is the user's location. Figure 4 shows a screen shot of the Map-based Web Interface on the Android application. The local bursty keywords include "snow," "snow man" and "traffic jam." These keywords are related to the snowfall. In the Android application, attached photos can be opened. Figure 4 also shows an attached photo, which makes the snowfall downtown evident.

Figure 4: Case-1. [Screen shot; tweet shown: "Wow! It's snowing in downtown"]

The second local topic is a power failure. Figure 5 shows a screen shot of the Map-based Web Interface on the Android application. The power failure occurred in Hiroshima due to lightning strikes around 13:15 on February 12, 2015 (JST). Our system could detect this local topic in real time. The local bursty keywords include "lightning" and "power failure."

Figure 5: Case-2. [Screen shot; tweet shown: "McDonald's in Yume Town cannot serve hamburgers because of the power failure. They gave a refund."]

7 CONCLUSIONS

In this paper, we proposed a new real-time spatiotemporal analysis system for enhancing local situation awareness using density-based adaptive spatiotemporal clustering. In the proposed system, local bursty keywords are extracted and their bursty areas are identified. In our new system, locally frequent keywords in geotagged tweets within a particular distance from a user are first extracted. To determine whether locally frequent keywords are bursty keywords or routine keywords, we utilize quartile-based outlier detection. Moreover, bursty local areas related to extracted local bursty keywords are identified using density-based adaptive spatiotemporal clustering. We evaluated the proposed system using real-world topics related to weather in Japan. Experimental results showed that the proposed system could extract local topics and events. In our future work, we are going to evaluate the system using various types of local topics and events. Moreover, some local bursty keywords are related to each other; however, the proposed system shows these keywords individually. We are developing a summarization method for local bursty keywords.

ACKNOWLEDGEMENTS

This work was supported by JSPS KAKENHI Grant

Number 26330139 and Hiroshima City University

Grant for Special Academic Research (General Stud-

ies).

REFERENCES

Avvenuti, M., Cresci, S., Marchetti, A., Meletti, C., and

Tesconi, M. (2014). Ears (earthquake alert and re-

port system): A real time decision support system

for earthquake crisis management. In Proceedings of

the 20th ACM SIGKDD International Conference on

Knowledge Discovery and Data Mining, pages 1749–

1758.

Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. (1996).

A density-based algorithm for discovering clusters in

large spatial databases with noise. In Second Interna-

tional Conference on Knowledge Discovery and Data

Mining, pages 226–231.

Hui, C., Tyshchuk, Y., Wallace, W. A., Magdon-Ismail, M.,

and Goldberg, M. (2012). Information cascades in social media in response to a crisis: A preliminary model

and a case study. In Proceedings of the 21st Interna-

tional Conference Companion on WWW, pages 653–

656.

Hwang, M.-H., Wang, S., Cao, G., Padmanabhan, A., and

Zhang, Z. (2013). Spatiotemporal transformation of

social media geostreams: A case study of twitter for

ﬂu risk analysis. In Proceedings of the 4th ACM

SIGSPATIAL IWGS, pages 12–21.

Hyndman, R. J. and Fan, Y. (1996). Sample Quantiles

in Statistical Packages. The American Statistician,

50(4):361–365.

Kim, K.-S., Lee, R., and Zettsu, K. (2011). mtrend: discov-

ery of topic movements on geo-microblogging mes-

sages. In Proceedings of the 19th ACM SIGSPATIAL

International Conference on Advances in GIS, pages

529–532.

Kreiner, K., Immonen, A., and Suominen, H. (2013). Crisis

management knowledge from social media. In Pro-

ceedings of the 18th ADCS, pages 105–108.

Kumar, A., Jiang, M., and Fang, Y. (2014). Where not to

go?: Detecting road hazards using twitter. In Proceed-

ings of the 37th International ACM SIGIR Conference

on Research & Development in Information Retrieval,

pages 1223–1226.

Mendoza, M., Poblete, B., and Castillo, C. (2010). Twitter

under crisis: Can we trust what we rt? In Proceedings

of the First Workshop on SOMA, pages 71–79.

Naaman, M. (2011). Geographic information from geo-

referenced social media data. SIGSPATIAL Special,

3(2):54–61.

Sakaki, T., Okazaki, M., and Matsuo, Y. (2010). Earthquake

shakes twitter users: Real-time event detection by so-

cial sensors. In Proceedings of the 19th International

Conference on WWW, pages 851–860.

Sander, J., Ester, M., Kriegel, H.-P., and Xu, X. (1998).

Density-based clustering in spatial databases: The al-

gorithm gdbscan and its applications. Data Mining

and Knowledge Discovery, 2(2):169–194.

Thom, D., Bosch, H., Koch, S., Worner, M., and Ertl, T.

(2012). Spatiotemporal anomaly detection through

visual analysis of geolocated twitter messages. In

Paciﬁc Visualization Symposium (PaciﬁcVis), 2012

IEEE, pages 41–48.

Vieweg, S., Hughes, A. L., Starbird, K., and Palen, L.

(2010). Microblogging during two natural hazards

events: What twitter may contribute to situational

awareness. In Proceedings of the SIGCHI Confer-

ence on Human Factors in Computing Systems, pages

1079–1088.

Yin, J., Lampert, A., Cameron, M., Robinson, B., and

Power, R. (2012). Using social media to enhance

emergency situation awareness. IEEE Intelligent Sys-

tems, 27(6):52–59.
