2. Follower count
The number of followers is gathered from
the official Twitter accounts of the U.S.
presidential candidates, namely
@realDonaldTrump and @HillaryClinton.
3. Mention count
The number of mentions is calculated
based on the occurrences of the strings
"@realDonaldTrump" and "@HillaryClinton"
in all tweet data.
4. Tweet count
The number of tweets posted is gathered
from the official Twitter accounts of the U.S.
presidential candidates in the specified time
frame.
5. Tweet grouping
To visualize the intensity of weekly tweet
posting for each candidate in the campaign
period, we group the posted tweets by week
using their timestamps.
6. Sentiment analysis
Sentiment analysis of the tweets is
performed for each candidate. Tweets for
each candidate are grouped into two
groups, namely positive and negative.
Sentiment analysis aims to capture the reaction
of Twitter users to each candidate.
Positive and negative sentiment is
determined from the words
contained in the tweet. We use words
that indicate positive sentiment, for example
"good" and "great", words that indicate
negative sentiment, for example "fail",
"don't", and "poor", positive emoticons, for
example ":)", ";)", ":D", ":-)", and ":-D", and
negative emoticons, for example
":(", ":-(", and ":'(" (Agarwal et al., 2011;
Sahayak et al., 2015). We use a Node.js
library to analyze the sentiment of the tweets.
7. Geographical grouping
Tweets are grouped by geographical
location, i.e., country, using the
timezone data. Timezone data is used
because the location field is null in the
majority of tweets.
8. Counting adjectives
The most frequent words in the tokenized
tweet data are counted, and the result is
then filtered for English adjectives.
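Steps 3, 5, 6, and 8 above can be illustrated with a minimal sketch. It is written in Python for brevity, whereas the study itself used an unnamed Node.js library for sentiment analysis, so it should not be read as the authors' implementation. The lexicons contain only the example words and emoticons quoted above, the adjective list is a hypothetical stand-in for a full English adjective lexicon, and the tweet fields `text` and `created_at` (with the timestamp format shown) are assumptions. Tweets with no sentiment hits or a tied score are left unlabeled, which is also an assumption, since the text does not say how such cases are handled.

```python
from collections import Counter
from datetime import datetime

# Assumption: tiny lexicons built only from the examples quoted in the text.
POSITIVE_WORDS = {"good", "great"}
NEGATIVE_WORDS = {"fail", "don't", "poor"}
POSITIVE_EMOTICONS = {":)", ";)", ":D", ":-)", ":-D"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}
# Assumption: hypothetical adjective list; a real run needs a full lexicon.
ADJECTIVES = {"good", "great", "poor"}
HANDLES = ("@realDonaldTrump", "@HillaryClinton")
PUNCTUATION = '.,!?"'


def normalize(token):
    """Lowercase a token and strip surrounding punctuation."""
    return token.lower().strip(PUNCTUATION)


def count_mentions(tweets):
    """Step 3: count tweets whose text contains each candidate handle.

    Matching is a case-insensitive substring test (an assumption), so a
    longer handle that starts with a candidate handle would also match.
    """
    counts = Counter()
    for tweet in tweets:
        text = tweet["text"].lower()
        for handle in HANDLES:
            if handle.lower() in text:
                counts[handle] += 1
    return counts


def group_by_week(tweets):
    """Step 5: count tweets per ISO (year, week) from the timestamp."""
    weeks = Counter()
    for tweet in tweets:
        ts = datetime.strptime(tweet["created_at"], "%Y-%m-%d %H:%M:%S")
        iso = ts.isocalendar()
        weeks[(iso[0], iso[1])] += 1
    return weeks


def classify_sentiment(text):
    """Step 6: label a tweet by its net count of positive/negative tokens."""
    score = 0
    for token in text.split():
        word = normalize(token)
        if token in POSITIVE_EMOTICONS or word in POSITIVE_WORDS:
            score += 1
        elif token in NEGATIVE_EMOTICONS or word in NEGATIVE_WORDS:
            score -= 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None  # no hits or a tie: left unlabeled (assumption)


def top_adjectives(tweets, n=10):
    """Step 8: count all tokens, then keep only the listed adjectives."""
    counts = Counter()
    for tweet in tweets:
        counts.update(normalize(token) for token in tweet["text"].split())
    ranked = [(w, c) for w, c in counts.most_common() if w in ADJECTIVES]
    return ranked[:n]
```

Each function takes a list of dictionaries such as `{"text": "...", "created_at": "2016-11-07 10:00:00"}`. Splitting on whitespace keeps emoticons and handles intact as single tokens, which is why a simple split is used here instead of a word-level tokenizer.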
3.4 Visualization
The final stage is to visualize the data in an appropriate
graph or chart, so that the data are presented in the form
of visual cues. A bar chart is used to compare the
candidates' Twitter profiles. To visualize
weekly tweets for each candidate, we use a line chart,
which is well suited to showing trends. A donut chart is
chosen to show the proportion of negative and
positive sentiment for each candidate, while a
choropleth map is used to show the geographical distribution
of sentiment. Finally, to show the adjectives most
frequently used to describe each candidate, we use
word clouds.
4 RESULT
Data collection was carried out from 11 August 2016
to 16 November 2016. This time period covers the
campaign period, which started 90 days before
election day, plus 7 days after the election to
capture the responses that followed it.
We use the scraping method to get the data backward
from election day (11 August 2016 to 9 November
2016), and the streaming method to get
data in real time from election day (9
November 2016) to 7 days later (16 November 2016).
We collected 3,796,293 tweets, which occupy 14
gigabytes of storage. The data are then cleaned and
processed to produce four types of visualization,
namely Twitter profile, weekly tweet, sentiment
analysis, and word cloud. The aim of the
visualization is to compare the candidates' profiles and
activities and the perceptions, or community responses,
of both American and non-American citizens on social
media toward the two candidates.
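The collection window described above can be checked with a short date calculation. This is only a sketch: election day is taken as 9 November 2016, as stated in the text, and the variable names are illustrative.

```python
from datetime import date, timedelta

# Election day as stated in the text.
ELECTION_DAY = date(2016, 11, 9)

# Campaign period: 90 days before election day (collected by scraping).
window_start = ELECTION_DAY - timedelta(days=90)
# Post-election period: 7 days after election day (collected by streaming).
window_end = ELECTION_DAY + timedelta(days=7)

# Length of the whole window in days, from start date to end date.
window_days = (window_end - window_start).days

print(window_start.isoformat(), window_end.isoformat(), window_days)
```

The computed start and end dates match the stated collection period of 11 August 2016 to 16 November 2016, a span of 97 days.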
4.1 Twitter Profile
The Twitter profile visualization compares the
number of followers, mentions, and tweets of each
candidate at the time the data were obtained. These
numbers give an initial
description of the candidates' activity and
popularity in cyberspace. The data were gathered
using the methods explained in the
methodology section and are presented in Table
1.
Table 1: Twitter Profile on 16 November 2016.
@realDonaldTrump @HillaryClinton
Followers 11.2 Million 15.8 Million
Mentions 38 Thousan