Towards Sentiment-driven Maps Showing Touristic Attractiveness

Sarah Tauscher and Karl Neumann

Institute for Information Systems, Technische Universität Braunschweig, Braunschweig, Germany

tauscher@ifis.cs.tu-bs.de

Keywords: Automatic Cartography, Semiology of Graphics, Sentiment Analysis, Volunteered Geographic Information.

Abstract: User-generated texts on tourism-related social network sites contain not only factual information but also valuable opinions about and ratings of locations. Nevertheless, most maps on these sites only show markers where something described in a user-generated text is located; no further information is derived from the text and displayed on the maps. Moreover, generalization operations are not employed, although in most cases aggregation and displacement of the user-generated content would be necessary to achieve more readable maps. Therefore, we propose a method that automatically creates maps enriched with user sentiments. We use natural language processing tools to mine user sentiments towards specific places from user-generated texts, and we propose specially designed point symbols that represent the mined sentiment for each location. Additionally, we propose a heuristic based on Voronoi diagrams that slightly displaces these symbols when they are very close to each other, which makes the resulting map easier to read.

1 INTRODUCTION

Much of the user-generated information accumulated on the web is related to a place: everything people do, they do somewhere, and most of the time it makes a difference where. This spatial reference is especially important on so-called travel social network sites. These sites not only gather information about places and communicate where these places are located; they also offer travellers the opportunity to connect and share information. It is precisely these opinions and experiences of other users that give such sites their added value compared with traditional travel guides. Surprisingly, in their appearance these sites are very similar to social network sites without an explicit spatial reference.

Although maps are a very useful representation of the environment, there are rarely more maps on a travel social network site than on a general social network site. Moreover, these maps often show only markers indicating where something described in a text is located, as shown in Figure 1. The rather limited information content of these maps is owed to the services used to create them. Services such as Google Maps, Bing Maps or Yahoo Maps enable their users to create layers with arbitrary content, which can then be viewed over the respective base imagery. However, they offer no help in choosing markers appropriate for a specific kind of content. Furthermore, these systems do not support generalization operations, although in most cases aggregation and displacement of the user-generated content would be necessary to achieve more readable maps.

Figure 1: Popular locations in Tennessee, map extracted from virtualtourist.com.

Tauscher S. and Neumann K.
Towards Sentiment-driven Maps Showing Touristic Attractiveness.
DOI: 10.5220/0005454401290134
In Proceedings of the 1st International Conference on Geographical Information Systems Theory, Applications and Management (GISTAM-2015), pages 129-134
ISBN: 978-989-758-099-4
© 2015 SCITEPRESS (Science and Technology Publications, Lda.)

In this paper, we try to demonstrate that existing cartographic knowledge can be used to automatically create maps showing the sentiments towards places that are more appealing and more expressive than the usual marker maps. For this purpose, we use well-known natural language processing and opinion mining tools and generate maps of reviews for towns. These maps consist of a simple base map and specially designed point symbols, which represent the sentiment values for each location by their size and colour. If locations are too close to each other, the map symbols are minimized and slightly displaced. The resulting maps are easy to read and let the user see at a glance where attractive touristic locations are and how many reviews have contributed to their ratings.

The rest of the paper is organized as follows: In Section 2, the sentiment analysis method we utilized is described and evaluated. Section 3 addresses the design process for the map symbols representing sentiments. A method to displace point symbols is developed in Section 4. Afterwards, the map symbols and the displacement are applied to real-world data and the results are presented in Section 5, before Section 6 concludes the paper.

2 SENTIMENT ANALYSIS

We considered three methods for sentiment analysis that have been developed for informal web content, namely SentiStrength (Thelwall et al., 2010), the Lexicon-Based Classifier (Paltoglou and Thelwall, 2012) and SO-CAL (Taboada et al., 2011). As the latter performed best in preliminary tests, we only present and discuss its results.

We extracted 36,715 reviews about locations in the USA from a travel social network site, preprocessed them with the Brill Tagger (Brill, 1992) and classified them using SO-CAL. The majority of them, 24,367, were classified as positive, 8,659 as negative and 2,017 as neutral. In addition, 500 randomly selected reviews were classified manually in order to evaluate this analysis. The task was to assign to each review a positive, a negative or a neutral value, depending on the sentiments expressed with respect to the location. Table 1 lists the resulting precision, recall and F-measure for these 500 reviews.

Table 1: Evaluation results for 500 randomly selected reviews, considering only location-specific sentiments (#: number of reviews in each class according to the manual classification).

Class     # Reviews  Precision  Recall  F-measure
Positive  304        0.86       0.90    0.88
Neutral   132        0.87       0.39    0.54
Negative  64         0.43       0.81    0.56
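
The precision, recall and F-measure values in Tables 1 and 2 follow the usual per-class definitions, with the F-measure as the harmonic mean F = 2PR/(P + R). As a hedged illustration, the Python sketch below computes these scores from two parallel label lists (manual versus automatic classification); the label strings and function names are our own and are not part of SO-CAL. Recomputing F from the rounded precision and recall of Table 1 reproduces the listed values.

```python
from collections import Counter

CLASSES = ("positive", "neutral", "negative")  # label strings are illustrative

def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def per_class_scores(gold, predicted, classes=CLASSES):
    """Return {class: (precision, recall, f_measure)} for parallel label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    true_pos = Counter(g for g, p in zip(gold, predicted) if g == p)
    gold_counts = Counter(gold)       # denominators for recall
    pred_counts = Counter(predicted)  # denominators for precision
    scores = {}
    for c in classes:
        precision = true_pos[c] / pred_counts[c] if pred_counts[c] else 0.0
        recall = true_pos[c] / gold_counts[c] if gold_counts[c] else 0.0
        scores[c] = (precision, recall, f_measure(precision, recall))
    return scores

# Consistency check with the rounded neutral row of Table 1.
print(round(f_measure(0.87, 0.39), 2))  # 0.54
```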

The result for positive reviews is satisfying, whereas neutral reviews have a rather low recall and negative reviews a low precision, resulting in a disappointing F-measure for both classes. One reason for this shortcoming is that many reviews describe and rate not only a location but also its historical background. Often this history is connected to a war or a natural disaster; consequently, the text contains many negative expressions, which are misjudged as negative sentiment towards the location. Additionally, neutral reviews, which express facts rather than sentiments about a location, are seldom written in a purely factual style. Instead, they quite often contain negative as well as positive judgments on the facts. If the manual classification task is modified so that reviews are classified by considering all sentiments expressed in the text, the results improve considerably (e.g., the F-measure for negative reviews rises from 0.56 to 0.84), as Table 2 shows. The recall for neutral and the precision for negative reviews are still not as good as those for positive ones, but they are in accordance with the results reported in (Taboada et al., 2011).

Table 2: Evaluation results for 500 randomly selected reviews, considering all sentiments.

Class     # Reviews  Precision  Recall  F-measure
Positive  315        0.92       0.92    0.92
Neutral   77         0.90       0.70    0.79
Negative  108        0.81       0.87    0.84

Hence, the method seems to be appropriate for our domain. Nevertheless, a preprocessing step that filters background information out of the reviews would be necessary in order to obtain only the location-specific sentiments.

3 MAP SYMBOLS FOR SENTIMENTS

GISTAM2015-1stInternationalConferenceonGeographicalInformationSystemsTheory,ApplicationsandManagement

According to the intended communication goal, the map symbols should fulfil the following requirements:

R1 Point signature representing two independent attributes
R2 Support quantitative perception between both attributes
R3 Support selective perception regarding one attribute
R4 Support ordered perception of the number of reviews that have been aggregated

First, we have to identify the components of the information we want to visualize. “Sentiment” can be considered either as one component with values ranging from negative through indifferent to positive (from -1 to 1), or as two components: one nominal component with two values (positive and negative) and one quantitative component describing the strength of the sentiment. The latter approach is more suitable for our purpose, as we want to distinguish whether there are both positive and negative sentiments towards a place or only indifferent ones; on a single scale, a place with equally strong positive and negative reviews and a place with only indifferent reviews would both end up near 0. As the number of reviews should also be represented, a third, ordered component is added to our information “Sentiment”.
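
The following Python sketch illustrates why two strength components plus a count are more informative than a single value between -1 and 1. How per-review scores are aggregated per location is not specified in the text, so the averaging rule in `decompose` is an assumption, as are all names.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class SentimentComponents:
    positive: float  # strength of positive sentiment, 0..1
    negative: float  # strength of negative sentiment, 0..1
    n_reviews: int   # ordered component: number of aggregated reviews

def decompose(scores: Sequence[float]) -> SentimentComponents:
    """Split per-review scores in [-1, 1] into two strengths and a count.

    Assumption (not from the paper): each strength is the mean, over all
    reviews of a location, of the positive or the negated negative part.
    """
    n = len(scores)
    if n == 0:
        return SentimentComponents(0.0, 0.0, 0)
    positive = sum(max(0.0, s) for s in scores) / n
    negative = sum(max(0.0, -s) for s in scores) / n
    return SentimentComponents(positive, negative, n)

polarised = [1.0, -1.0, 1.0, -1.0]
indifferent = [0.0, 0.0, 0.0, 0.0]
print(sum(polarised) / 4, sum(indifferent) / 4)  # 0.0 0.0 on a single scale
print(decompose(polarised))    # SentimentComponents(positive=0.5, negative=0.5, n_reviews=4)
print(decompose(indifferent))  # SentimentComponents(positive=0.0, negative=0.0, n_reviews=4)
```

On the single scale both locations look identical, whereas the two strength components separate the polarised location from the indifferent one.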

Consequently, we have to use three visual variables to visualize these three components. For the nominal component, an unordered variable, i.e., form, colour or orientation, is appropriate (Bertin, 2011). As our goal is to represent two independent nominal attributes at the same place, it is necessary to use different forms that can be distinguished even if one is superimposed on the other. In order to fulfil requirement R3, different colours are used in addition, as form is not selective. As size is the only quantitative variable, it is used for the sentiment value (R2). Finally, R4 leaves two choices for the number of reviews: texture and brightness. Because textures may cause vibratory effects, we pick the latter. On a white background, changing the brightness of an object has the same visual effect as changing its transparency. We therefore chose transparency, because it provides an additional benefit: the outlines of both forms remain visible even if one covers the other completely.
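
The equivalence of transparency and brightness on a white background follows from standard source-over alpha compositing: an opaque colour c drawn with opacity α over white yields αc + (1 − α)·255 per 8-bit channel, which is the same as mixing c with white by the amount 1 − α. The minimal Python check below assumes 8-bit RGB values; the colour used is illustrative, not the one from the paper.

```python
def composite_over_white(rgb, alpha):
    """Source-over compositing of an opaque RGB colour (0-255) with opacity alpha onto white."""
    return tuple(round(alpha * c + (1 - alpha) * 255) for c in rgb)

def mix_with_white(rgb, amount):
    """Lighten an RGB colour by mixing it with white (amount 0 = unchanged, 1 = white)."""
    return tuple(round(c + amount * (255 - c)) for c in rgb)

red = (255, 0, 0)  # illustrative pure red, not the paper's exact colour value
print(composite_over_white(red, 0.2), mix_with_white(red, 0.8))  # (255, 204, 204) (255, 204, 204)
```

On a coloured or textured base map the two operations would no longer coincide, which is why the argument is restricted to a white background.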

Concerning the choice of form, there are no formal guidelines, except not to use a form that is already strongly associated with other information. We therefore chose a turquoise six-pointed star as the symbol for positive sentiments and a red circle for negative ones. Green and red would be a more intuitive choice of colours to express positive and negative values (traffic light metaphor), but this combination is unfortunately not accessible for colour-blind people. Stars are often used for ratings and have a positive connotation. Five-pointed stars are much more common and therefore suggest themselves, but they might easily be misinterpreted. The sizes of the circle and the star are independent of each other, as are their opacity values. Figure 2 shows the map symbols for the positive and negative values 0.1, 0.5 and 1 and all their possible combinations. The opacity of all symbols in Figure 2 is 0.8, which is also defined as the maximum value so that both signatures always remain discernible. The minimum opacity value is set to 0.2, and in addition a maximum number of reviews is defined. The reason for this is that it is sufficient for the user to see whether the number of reviews exceeds a certain value, which indicates that the sentiment value is reliable.
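
The symbol parameters described above can be sketched in Python as follows. Only the opacity bounds 0.2 and 0.8 and the cap at a maximum number of reviews come from the text; the linear interpolation of opacity, the area-proportional scaling of symbol size and the value of `n_max` are assumptions, and all names are hypothetical. The star geometry is a regular hexagram, whose inner corners lie at 1/√3 of the tip radius.

```python
import math

OPACITY_MIN = 0.2  # minimum opacity given in the text
OPACITY_MAX = 0.8  # maximum opacity given in the text

def opacity_for(n_reviews: int, n_max: int) -> float:
    """Opacity for a review count; counts at or above n_max get OPACITY_MAX.

    Assumption: linear interpolation between 1 review and n_max reviews.
    """
    if n_max < 2:
        raise ValueError("n_max must be at least 2")
    n = min(max(n_reviews, 1), n_max)
    return OPACITY_MIN + (OPACITY_MAX - OPACITY_MIN) * (n - 1) / (n_max - 1)

def symbol_radius(strength: float, r_max: float) -> float:
    """Assumption: symbol area proportional to sentiment strength in [0, 1]."""
    return r_max * math.sqrt(strength)

def hexagram(cx: float, cy: float, outer_r: float):
    """12 vertices of a regular six-pointed star, alternating tips and inner corners."""
    inner_r = outer_r / math.sqrt(3)  # inner corners of a regular hexagram
    vertices = []
    for k in range(12):
        r = outer_r if k % 2 == 0 else inner_r
        angle = math.pi / 2 + k * math.pi / 6  # first tip points straight up
        vertices.append((cx + r * math.cos(angle), cy + r * math.sin(angle)))
    return vertices
```

Because star and circle receive their own radius and opacity, a location with strength 0.5 positive and 0.1 negative is drawn as a medium star over a small circle, in line with the independence of both parts stated above.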

Figure 2: Map symbols for different combinations of sentiment values.

4 PLACEMENT OF MAP SYMBOLS

Ideally, map symbols should be placed at the exact coordinates of the objects they symbolize. Furthermore, map symbols should exceed the minimal graphical size (Keates, 1993) and should not overlap, in order to keep the map readable. Especially in small-scale maps, a map symbol often covers more space than the corresponding object, thus increasing the probability of overlapping map symbols.

TowardsSentiment-drivenMapsShowingTouristicAttractiveness

Consequently, at least one of the basic generalization operations selection, aggregation and displacement should be used to resolve overlaps of map symbols. Selection would induce a complete loss of information, in our case the sentiment values of some locations, whereas displacement decreases the geographic accuracy and aggregation the “geographic” resolution. Therefore, displacement is the most appropriate choice for our application area.
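
Before any generalization operation can be applied, the overlaps have to be detected. A minimal Python sketch, assuming that each symbol is approximated by a disc whose radius is given in map units (e.g. millimetres), is a pairwise distance test; the function name is hypothetical.

```python
import math
from itertools import combinations

def overlapping_pairs(centres, radii):
    """Index pairs (i, j) whose disc-shaped symbols overlap (touching is not overlap).

    Brute-force O(n^2) check; sufficient for a few hundred symbols.
    """
    pairs = []
    for i, j in combinations(range(len(centres)), 2):
        (xi, yi), (xj, yj) = centres[i], centres[j]
        if math.hypot(xi - xj, yi - yj) < radii[i] + radii[j]:
            pairs.append((i, j))
    return pairs

# Three symbols of radius 2 mm; only the first two overlap.
print(overlapping_pairs([(0, 0), (3, 0), (10, 0)], [2, 2, 2]))  # [(0, 1)]
```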

Hence, we propose an iterative displacement method. The input parameters are the coordinates of the locations and the sizes of their map symbols; the output is a list of the displaced points. Furthermore, a threshold for the acceptable displacement distance has to be defined. We use a Voronoi diagram of the input points as an auxiliary structure, which is recalculated at the beginning of each iteration, as the points are moved during the iteration steps.
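
To make the auxiliary structure concrete, a single Voronoi cell can be computed without external libraries by intersecting half-planes: the cell of site p consists of all points closer to p than to any other site q, i.e. (q − p)·x ≤ (|q|² − |p|²)/2 for every q. The Python sketch below clips a bounding rectangle (for instance the map frame, which is our assumption) against these half-planes. The function names are hypothetical, and a production system would more likely use a library routine such as scipy.spatial.Voronoi.

```python
def clip_halfplane(poly, a, b, c):
    """Keep the part of convex polygon `poly` where a*x + b*y <= c."""
    out = []
    n = len(poly)
    for k in range(n):
        p, q = poly[k], poly[(k + 1) % n]
        fp = a * p[0] + b * p[1] - c
        fq = a * q[0] + b * q[1] - c
        if fp <= 0:                       # p lies inside or on the boundary
            out.append(p)
        if (fp < 0 < fq) or (fq < 0 < fp):  # edge crosses the boundary line
            t = fp / (fp - fq)
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out

def voronoi_cell(i, sites, bbox):
    """Voronoi cell of sites[i] as a vertex list, clipped to bbox = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = bbox
    cell = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    px, py = sites[i]
    for j, (qx, qy) in enumerate(sites):
        if j == i or (qx, qy) == (px, py):
            continue
        # Points closer to p than to q: (q - p) . x <= (|q|^2 - |p|^2) / 2
        a, b = qx - px, qy - py
        c = (qx * qx + qy * qy - px * px - py * py) / 2
        cell = clip_halfplane(cell, a, b, c)
        if not cell:
            break
    return cell
```

Recomputing all cells in each iteration costs O(n²) half-plane clips with this approach, which is acceptable for the small point sets of a single map.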

In each iteration, the following conditions are checked one after the other and the corresponding instruction is executed. If the point symbol fits completely into the Voronoi cell of the corresponding point and the point is not marked as conflicting, the point is kept. If the point symbol actually overlaps any of its neighbours, it is checked whether the symbol can be placed anywhere within the Voronoi cell of its point; if so, and if the distance between the original point and the new centre of the point symbol is less than the given threshold, the point is moved to this new position. In this way, the propagation of conflicts, which arises when the displacement of one point symbol creates a conflict with a symbol that was not previously in conflict, is restricted, as Figure 3 illustrates.
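
The first two checks can be sketched in Python without computing the cell polygon explicitly: a disc of radius r centred at c lies inside the (unbounded) Voronoi cell of site p exactly when its signed distance to every bisector between p and another site q is at least r. The text does not specify how a new position within the cell is searched, so `relocate_within_cell` simply samples candidate centres on concentric rings within the threshold and returns the first one that fits; this sampling strategy, the parameters `rings` and `steps`, and the restriction to disc-shaped symbols are assumptions.

```python
import math

def fits_in_cell(c, r, i, sites):
    """True if a disc of radius r centred at c lies inside the Voronoi cell of sites[i]."""
    px, py = sites[i]
    for j, (qx, qy) in enumerate(sites):
        if j == i:
            continue
        a, b = qx - px, qy - py
        norm = math.hypot(a, b)
        if norm == 0:
            return False  # coincident sites: the cell is degenerate
        k = (qx * qx + qy * qy - px * px - py * py) / 2
        # Signed distance of c from the bisector of p and q, positive on p's side.
        if (k - (a * c[0] + b * c[1])) / norm < r:
            return False
    return True

def relocate_within_cell(i, sites, r, threshold, rings=10, steps=36):
    """First sampled centre within `threshold` of sites[i] whose disc fits its cell, or None."""
    px, py = sites[i]
    if fits_in_cell((px, py), r, i, sites):
        return (px, py)  # symbol already fits: keep the point
    for ring in range(1, rings + 1):  # increasing displacement distance
        d = threshold * ring / rings
        for s in range(steps):
            angle = 2 * math.pi * s / steps
            candidate = (px + d * math.cos(angle), py + d * math.sin(angle))
            if fits_in_cell(candidate, r, i, sites):
                return candidate
    return None  # no acceptable position: fall back to the next rule
```

Searching rings of increasing radius favours small displacements, which matches the goal of keeping symbols close to their original coordinates.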

Figure 3: Displacement of single map symbol.

Otherwise, the neighbours are marked as conflicting, so that they are checked again, and the point is moved as far as acceptable towards the mass centre (centroid) of its Voronoi cell. The reason for this is that iteratively computing centroidal Voronoi diagrams leads to a distribution of the points that minimizes the “energy” between them, at least locally (Du et al., 1999). Thus, we take the direction towards the mass centre as a hint for a promising displacement direction. Figure 4 shows on the left a rather dense set of randomly generated points, to be symbolized by grey discs, and on the right the result of the displacement method.
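
Moving a point "as far as acceptable" towards the mass centre can be sketched in Python as follows: the centroid of the polygonal Voronoi cell is computed with the shoelace formula, and the point is moved along the segment towards it until either the centroid is reached or the distance from the original position reaches the threshold. The cell polygon is taken as input (for instance from a half-plane clipping routine), and the function names are hypothetical.

```python
import math

def polygon_centroid(poly):
    """Area centroid (mass centre) of a simple polygon given as a vertex list."""
    a2 = cx = cy = 0.0  # a2 accumulates twice the signed area
    n = len(poly)
    for k in range(n):
        x0, y0 = poly[k]
        x1, y1 = poly[(k + 1) % n]
        cross = x0 * y1 - x1 * y0
        a2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if a2 == 0:
        raise ValueError("degenerate polygon")
    return cx / (3 * a2), cy / (3 * a2)

def step_towards_centroid(current, original, cell, threshold):
    """Move `current` towards the cell centroid, staying within `threshold` of `original`."""
    gx, gy = polygon_centroid(cell)
    dx, dy = gx - current[0], gy - current[1]
    wx, wy = current[0] - original[0], current[1] - original[1]
    dd = dx * dx + dy * dy
    if dd == 0:
        return current  # already at the centroid
    # Largest t with |current + t*d - original| <= threshold (quadratic in t).
    wd = wx * dx + wy * dy
    ww = wx * wx + wy * wy
    disc = wd * wd - dd * (ww - threshold * threshold)
    if disc < 0:
        return current  # only possible if current already violates the threshold
    t = max(0.0, min(1.0, (-wd + math.sqrt(disc)) / dd))  # never overshoot the centroid
    return current[0] + t * dx, current[1] + t * dy

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(step_towards_centroid((1.0, 2.0), (1.0, 2.0), square, 0.5))  # (1.5, 2.0)
```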

The original points are drawn in black, the displaced points in blue. The threshold is set to the radius of the point symbols. The method stops if there are no more overlapping map symbols or if no point can be moved without exceeding the given threshold. Additionally, the number of iterations can be restricted, since after only a few steps the method already delivers an intermediate solution that is considerably better than the initial placement, as the empirical results presented in the following section indicate.
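
The termination criteria can be summarized in a small driver loop. In the Python sketch below, `step` is a hypothetical callback standing in for one iteration of the method (recomputing the Voronoi diagram and keeping, relocating or moving each point); it returns the new positions and the number of overlaps still present. Treating "no point moved" as "no point can be moved without exceeding the threshold" is a simplifying assumption.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
StepFn = Callable[[List[Point]], Tuple[List[Point], int]]

def displace(points: List[Point], step: StepFn,
             max_iterations: int = 200) -> Tuple[List[Point], int]:
    """Iterate `step` until no overlaps remain, nothing moves, or max_iterations is reached.

    Returns the final positions and the number of iterations performed.
    """
    current = list(points)
    for iteration in range(1, max_iterations + 1):
        new_points, remaining_overlaps = step(current)
        moved = new_points != current
        current = new_points
        if remaining_overlaps == 0 or not moved:
            return current, iteration
    return current, max_iterations  # intermediate solution after the iteration limit
```

The default of 200 iterations mirrors the limit used in the experiments of Section 5.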

Figure 4: Point set before and after displacement.

5 APPLICATION TO REAL WORLD DATA

Our test set consists of the sentiment values extracted from reviews for locations in the USA, as described in Section 2, complemented by the coordinates of the locations taken from GeoNames. Additionally, we extracted the borders of the individual states from OpenStreetMap. For each state, we created a map using the map symbols described in Section 3 and Esri’s World Ocean Base (MapServer) as a base map. The scale of the maps varies between 1:1,000,000 and 1:5,000,000 so that they can be presented true to scale within this paper. For 17 states, the locations were widely spread, so there were no conflicts or fewer than ten easily solvable ones. Therefore, we only analysed the placement of map symbols for the remaining 33 states, whose results are listed in Table 3.

For each state, Table 3 lists the number of solved and unsolved conflicts as well as the number of displaced objects and the average distance by which they were displaced. The maximum number of iterations was set to 200, but for all states except three (IN, MA, OR) the method terminated earlier. The threshold for the maximal acceptable displacement was set to 5 mm, yet the average displacement distance is only about 2.6 mm ± 1.3 mm for the individual states. As expected, not only the conflicting objects are displaced; still, the displacement is restricted to objects close to conflicting ones, as some objects always remain at their original location.
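
Since the threshold and the displacement distances are given in map millimetres, their ground equivalent depends on the map scale: at 1:1,000,000 one map millimetre corresponds to 1 km, so the 5 mm threshold equals 5 km on the ground, while at 1:5,000,000 it equals 25 km. A trivial Python helper for this conversion (the name is hypothetical):

```python
def map_mm_to_ground_km(distance_mm: float, scale_denominator: int) -> float:
    """Ground distance in km for a map distance in mm at scale 1:scale_denominator."""
    return distance_mm * scale_denominator / 1_000_000  # 1 km = 1,000,000 mm

for scale in (1_000_000, 5_000_000):
    # threshold (5 mm) and average displacement (about 2.6 mm) in ground kilometres
    print(scale, map_mm_to_ground_km(5, scale), map_mm_to_ground_km(2.6, scale))
```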

GISTAM2015-1stInternationalConferenceonGeographicalInformationSystemsTheory,ApplicationsandManagement

132

Table 3: Results of the displacement of point symbols: number of sites, number of overlaps solved and not solved, number of displaced objects (# Displaced) and their average displacement distance (Ø Dist., in mm).

State (Abbr.)        Area (km²)  # Sites  Solved  Not solved  # Displaced  Ø Dist. (mm)
Alaska (AK)          1,723,337   67       45      6           41           2.98
Alabama (AL)         135,767     72       23      1           35           2.31
Arizona (AZ)         295,233     88       15      1           33           1.85
California (CA)      423,968     100      116     26          79           3.02
Colorado (CO)        269,602     71       14      2           30           1.74
Connecticut (CT)     14,356      58       24      3           32           2.79
Florida (FL)         170,312     99       77      17          84           2.93
Georgia (GA)         153,910     78       15      7           26           1.68
Illinois (IL)        149,997     66       33      12          41           2.29
Indiana (IN)         94,327      76       38      5           43           2.44
Kansas (KS)          213,099     72       14      1           29           1.85
Kentucky (KY)        104,656     81       21      1           35           2.11
Louisiana (LA)       135,658     42       18      2           25           2.44
Massachusetts (MA)   27,335      72       134     44          57           3.96
Maryland (MD)        32,131      61       41      15          51           2.59
Maine (ME)           91,634      64       53      16          40           3.38
Michigan (MI)        250,488     87       18      5           37           1.98
Montana (MT)         380,832     53       9       3           23           2.04
North Carolina (NC)  139,391     95       71      8           69           2.68
Nebraska (NE)        200,330     52       10      1           20           1.78
New Hampshire (NH)   24,214      45       15      2           25           2.27
New Jersey (NJ)      22,592      90       44      6           57           2.70
Nevada (NV)          286,380     33       10      2           13           1.62
New York (NY)        141,297     89       77      17          63           3.07
Ohio (OH)            116,099     80       52      4           52           2.81
Oklahoma (OK)        181,038     74       10      2           22           1.80
Oregon (OR)          254,800     94       81      6           67           2.99
Pennsylvania (PA)    119,279     98       43      1           49           2.49
Tennessee (TN)       109,152     46       15      1           29           1.97
Texas (TX)           695,660     100      64      8           66           2.47
Virginia (VA)        110,787     89       82      14          60           2.90
Washington (WA)      184,661     94       74      20          63           2.76
West Virginia (WV)   62,755      59       10      1           24           2.00

The results also support the assumption that the resolvability of overlaps depends mainly on the distribution of points, as there is no interrelation either between the number of conflicts and the number of objects or between the number of conflicts and the percentage of conflicts solved.

Figure 5 shows the resulting map for Tennessee, where all conflicts but one, at the southeastern border, could be resolved. Owing to the size and closeness of the signatures involved, the remaining overlap cannot be resolved if the threshold of 5 mm is kept.

TowardsSentiment-drivenMapsShowingTouristicAttractiveness

133

Figure 5: Sentiment map of Tennessee.

6 CONCLUSIONS

In this paper, we sketched out a method for generating maps for tourism-related social network sites that are more expressive than the usual pin maps shown on these websites. This was achieved by natural language processing including sentiment analysis, and by specific map symbols consisting of two separate parts. Furthermore, we proposed a heuristic method for point symbol displacement that utilizes Voronoi diagrams.

An obvious use of this kind of map is to embed it in travel social network sites. In this case, it would be appropriate to show the names of the locations as tooltip texts. Moreover, a synthetic summary of the reviews, shown as a pop-up after clicking on a location, would be desirable. However, although this is a well-studied issue (Pang and Lee, 2008), no ready-to-use solution is available. Additionally, users' preferences, e.g., whether they are more interested in adventure, family or wellness holidays, could be taken into account by first analysing the different aspects mentioned in a review and then the corresponding sentiments. The design of the presented sentiment maps has been optimized for small-scale maps. As it is generally possible and desirable to zoom into digital maps, the adaptability of this kind of map to different scales is another aspect of future work.

REFERENCES

Bertin, J., 2011. Semiology of Graphics: Diagrams, Networks, Maps. Esri Press, Redlands.

Brill, E., 1992. A simple rule-based part-of-speech tagger. In ANLP’92, 3rd Conference on Applied Natural Language Processing. ACL.

Du, Q., Faber, V., Gunzburger, M., 1999. Centroidal Voronoi tessellations: applications and algorithms. SIAM Review, vol. 41, no. 4, pp. 637-676.

Keates, J. S., 1993. Cartographic Design and Production. Wiley, New York.

Paltoglou, G., Thelwall, M., 2012. Twitter, MySpace, Digg: Unsupervised sentiment analysis in social media. ACM Trans. Intell. Syst. Technol., vol. 3, no. 4, September, pp. 66:1-66:19.

Pang, B., Lee, L., 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, vol. 2, no. 1, pp. 1-135.

Taboada, M., Brooke, J., Tofiloski, M., Voll, K., Stede, M., 2011. Lexicon-based methods for sentiment analysis. Comput. Linguist., vol. 37, no. 2, June, pp. 267-307.

Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., Kappas, A., 2010. Sentiment strength detection in short informal text. J. Am. Soc. Inf. Sci. Technol., vol. 61, no. 12, December, pp. 2544-2558.

GISTAM2015-1stInternationalConferenceonGeographicalInformationSystemsTheory,ApplicationsandManagement

134