“It’s Modern Day Presidential! An Evaluation of the Effectiveness of
Sentiment Analysis Tools on President Donald Trump’s Tweets”
Ann Masterton Perry 1 a, Terhi Nurmikko-Fuller 1 b and Bernardo Pereira Nunes 2,3 c
1 Australian National University, Canberra, Australia
2 Pontifical Catholic University of Rio de Janeiro, Brazil
3 Federal University of the State of Rio de Janeiro, Brazil
Keywords:
Sentiment Analysis, Social Media, Donald Trump, POTUS, Twitter, Sentiment Analysis Tools.
Abstract:
This paper reports on an evaluation of five commonly used, lexicon-based sentiment analysis tools (Mean-
ingCloud, ParallelDots, Repustate, RSentiment for R, SentiStrength), tested for accuracy against a collection
of Trump’s tweets spanning from election day November 2016 to one year post inauguration (January 2018).
Repustate was found to be the most accurate at 67.53%. Our preliminary analysis suggests that this percentage
reflects Trump’s frequent inclusion of both positive and negative sentiments in a single tweet. In addition to
providing an evaluative comparison of sentiment analysis tools, a summary of shared features of a number of
existing datasets containing Twitter content, along with a comprehensive discussion, is also provided.
1 INTRODUCTION
The President of the United States of America (PO-
TUS), Donald Trump, is an active and unique social
media user. As POTUS, his posts have real-world ef-
fects that go beyond those of other users, and his dis-
course is influencing politicians’ engagement, chang-
ing the acceptable language and expected behaviour
of elected officials (Stolee and Canton, 2018).
But what is POTUS tweeting about? We hypothe-
sise that sentiment analysis tools (developed for large
datasets derived from a broad range of users) would
become inaccurate at the granularity of a single con-
tributor, where linguistic idiosyncrasies represent a
greater percentage of the data. To verify this hypoth-
esis, we evaluated the accuracy of five tools (Mean-
ingCloud, ParallelDots, Repustate, RSentiment for R,
SentiStrength) in classifying Trump’s tweets.
In this paper, we summarise existing research
in the sentiment analysis of politicians’ use of Twit-
ter (Section 2); outline the ethical considerations of
including deleted tweets (Section 3); describe our
methodology (Section 4) and the dataset (Section 5);
and provide a comparative evaluation of five senti-
ment analysis tools (Section 6). Finally, we conclude the paper with a discussion (Section 7).
a https://orcid.org/0000-0001-8283-2246
b https://orcid.org/0000-0002-0688-3006
c https://orcid.org/0000-0001-9764-9401
2 BACKGROUND
Several studies (Stieglitz and Dang-Xuan, 2013; Larsson and Kalsnes, 2014; Gainous and Wagner, 2014; Hoffman et al., 2016) have examined elected officials’ daily engagement with the public and changes to it within an election context (Strandberg, 2012; Graham et al., 2014). Trump’s use of social media has been examined previously (Francia, 2017; Gross and Johnson, 2016; Enli, 2017; Oh and Kumar, 2017; Ahmadian et al., 2016; Stolee and Canton, 2018), and (Karpf, 2017) found that his use of Twitter has changed the way political campaigns are conducted.
This includes altering previously accepted models of
political discourse in the United States (Ott, 2016)
(Auxier and Golbeck, 2017) and legitimizing the be-
haviour of other world leaders (McNair, 2018).
The first instance of real-time sentiment analy-
sis for political events was during the 2012 Obama-
Romney presidential cycle, when (Wang et al., 2012)
analysed over 36 million tweets posted during the
campaign and election. Other studies into politicians’
Twitter use have been undertaken by (Park et al.,
2015) (Ahmed et al., 2016) and (Wang et al., 2016).
Tweets contain hashtags (e.g. #TrumpsWallSongs), acronyms (“LOL”), emoticons ( :) ) and
emojis, which are problematic for lexicon-based analysis tools (Davidov et al., 2010; Kouloumpis et al., 2011), but (Pak and Paroubek, 2010) developed accurate models using a naïve Bayes classifier.
Sarcasm and irony also present a challenge (Reyes et al., 2012). Whilst models that consider word context have been successful (achieving 65% accuracy (Mukherjee and Bala, 2017)), (Wang et al., 2012) argue that there is much work to be done in developing models for their identification and classification.
Trump’s tweets do not typically follow English
syntax, containing incomplete conditional clauses, er-
ratic punctuation, and the use of capitals for emphasis,
demonstrating his difference from the Washington po-
litical elite. His willingness to tweet his thoughts on breaking news immediately appeals to his supporters, who are disillusioned with the current political landscape; even if his statements later turn out to be incorrect, this spontaneity is seen as a measure of honesty and forthrightness (Stolee and Canton, 2018).
Lakoff 1 argues that a key to Trump’s success is
that he appears to be one step ahead of other politi-
cians and the media in communicating his thoughts on
current events, framing events in his own context and
perspective. He suggests that Trump’s tweets are so-
called “trial balloons”, deliberately designed to gauge
public response with no intention of commitment to
an underlying policy. Ongoing coverage by main-
stream media then legitimizes them, manufacturing
consensus online (Woolley and Guilbeault, 2017).
3 PRIVACY IN THE PUBLIC EYE
The debate over whether or not publicly available social media data 2,3 should be accessible for academic research without requiring informed consent from each individual user is on-going (Bonnilla and Rosa, 2015; Webb et al., 2017; boyd and Crawford, 2012; Nunan and Yenicioglu, 2013; Fiesler and Proferes, 2018). In this paper, we argue that the content posted from the @realDonaldTrump account is intended for public consumption, for three reasons:
i) whether as POTUS, a high profile businessman, or
a reality television star (all highly visible public po-
sitions which he actively sought), Trump can have
no expectation of obscurity, nor likely a desire for
it (Stolee and Canton, 2018); ii) it is possible both
1 https://georgelakoff.com/2017/03/07/trumps-twitter-distraction/
2 https://twitter.com/en/privacy
3 https://developer.twitter.com/en/developer-terms/agreement
Trump and members of his administration are responsible for tweet content (Auxier and Golbeck, 2017); thus, the tweets represent the output of an institution rather than of an individual; and crucially, iii) current interpretation of constitutional law in the United States holds that post-election tweets from @realDonaldTrump constitute Presidential records, require preservation under the Presidential Records Act of 1978, and form part of the public record.
This distinction between public and private is im-
portant in the context of POTUS’ deleted tweets.
(Maddock et al., 2015) identify that not only do legal obligations require researchers to remove deleted content, but so too do ethical obligations, as the act of deleting a tweet indicates withdrawal of consent for its use in research. Twitter’s Developer Agreement requires that “all reasonable efforts to delete or modify” deleted content be made as soon as possible, or within 24 hours of being asked to do so by Twitter or the user 4. However, (Meeks, 2018) argues that researchers may be able to use these tweets if they have been sourced from a third party, such as Politwoops 5, which has an agreement in place with Twitter to archive and publish deleted content. Furthermore, since content from the @realDonaldTrump account is subject to preservation orders, all tweets, including any deleted ones, require preservation under the Presidential Records Act of 1978, and form part of the public record. We follow (Meeks, 2018), (Abramson, 2017), and (Dawsey and Bender, 2017), and have included deleted tweets, sourced from a third party (namely Politwoops), in our dataset.
4 METHODOLOGY
An initial survey of 64 existing tools led to the identification of five (MeaningCloud 6, ParallelDots 7, Repustate 8, RSentiment for R 9, and SentiStrength 10) that met six selection criteria: i) ability to perform English-language sentiment analysis on tweets, ii) free to use, iii) no dependence on an existing application, iv) use of a pre-built lexicon, v) ability to ingest the data collected for analysis, and vi) no special computational infrastructure requirements (Table 1).
4 https://developer.twitter.com/en/developer-terms/policy
5 https://projects.propublica.org/politwoops/
6 http://www.meaningcloud.com
7 http://www.paralleldots.com
8 http://www.repustate.com
9 https://cran.r-project.org/web/packages/RSentiment/index.html
10 http://sentistrength.wlv.ac.uk/
Table 1: Tool selection criteria and required response.

Selection Criteria                      Required
Suitable for English?                   Yes
Suitable for ngram analysis?            Yes
Free to use?                            Yes
API dependent?                          No
Pre-built lexicon?                      Yes
Able to ingest the dataset?             Yes
Special computational requirements?     No
Data was sourced from Politwoops 11 and Factba.se 12. The two sources have different collection rates, which minimises the risk of missing or deleted tweets. Following data matching and a cross check against archives of deleted tweets from @realDonaldTrump, a consolidated dataset of 2,880 tweets was produced.
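As a minimal sketch of this consolidation step (the paper does not publish its matching procedure; the id and created_at field names below are illustrative assumptions about the archive exports), the two sources can be merged and de-duplicated on tweet ID:

from typing import Dict, List

def consolidate(politwoops: List[Dict], factbase: List[Dict]) -> List[Dict]:
    """Merge two tweet archives, keeping a single record per tweet ID."""
    merged: Dict[str, Dict] = {}
    for record in politwoops + factbase:
        tweet_id = str(record["id"])         # assumed field name in the exports
        merged.setdefault(tweet_id, record)  # first occurrence wins
    return list(merged.values())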
Manual sentiment annotation was completed by three coders on the crowdsourcing platform Figure Eight 13. For quality assurance, we also manually annotated the tweets. Mismatches between the sentiment determined by the Figure Eight coders and our own annotations were referred to a third-party volunteer arbiter.
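A minimal sketch of this quality-assurance step (the arbiter is modelled here as a callable; all names are illustrative, as the paper does not formalise the procedure):

def resolve_label(crowd_label: str, our_label: str, arbiter) -> str:
    """Accept agreeing annotations; refer disagreements to the arbiter."""
    if crowd_label == our_label:
        return crowd_label
    # The third-party volunteer arbiter decides the final label.
    return arbiter(crowd_label, our_label)

# e.g. resolve_label("positive", "neutral", lambda a, b: "neutral") -> "neutral"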
5 DATASET
5.1 Existing Twitter Datasets
A review of publicly available Twitter datasets (based
on the work of (Saif et al., 2013) (Saif et al., 2012)
(Saif et al., 2016), and confirmed and extended by
(Symeonidis et al., 2018)) informed the development
of the custom dataset used in the evaluation of the
tools. The focus was on attributes such as size, cre-
ation method, and classifier categories. A total of 10
datasets (six of which were publicly available at the
time of writing) were identified. They were reviewed
to assess size, classifiers used (e.g. positive, negative
or neutral) and the creation workflow (see Table 2).
The Debate08 or Obama-McCain Debate (OMD) dataset holds 3,238 tweets (Shamma et al., 2009) 15 that were manually annotated by three coders using classifiers for positive, negative, mixed or other. It has been utilized for testing supervised learning methods by (Saif et al., 2012) and (Hu et al., 2013).
11 https://projects.propublica.org/politwoops/user/realDonaldTrump
12 https://factba.se/topic/deleted-tweets
13 https://www.figure-eight.com, formerly known as Crowdflower
15 Posted during a 2008 presidential debate between candidates Barack Obama and John McCain
The Stanford Semantic Twitter Sentiment (STS)
Datasets were among the first of their kind. Ini-
tially built by (Go et al., 2009) to support a project
report, the dataset contains training and testing cor-
pora, both using three classifications (positive, nega-
tive and neutral). The 1.6 million tweets of the STS-
Training dataset were automatically annotated, whilst
the smaller STS-Test dataset was hand-coded.
The Health Care Reform (HCR) dataset 16 comprises 2,516 tweets that contain the #hcr hashtag, which relates to the efforts of the Obama administration to introduce the Affordable Care Act. A subset of this corpus was manually annotated by the authors with five labels (positive, negative, neutral, irrelevant, and other). It is divided into three sub-corpora for use in model development, evaluation, and training.
The STS-Gold Dataset (constructed by (Saif et al., 2013) from the Stanford datasets) comprises 2,142 tweets, classified as positive, negative, neutral, mixed or other. It was hand-coded by three annotators.
The Sentiment Strength 17 (SS) Twitter Dataset (SS-Twitter) was developed by (Thelwall et al., 2010) to evaluate the effectiveness of their lexicon-based tool, SentiStrength. It was hand-annotated by three annotators, who assigned each of the 4,242 tweets both a positive (1 [not positive] to 5 [extremely positive]) and a negative (-1 [not negative] to -5 [extremely negative]) Likert-type numerical value.
5.2 Dataset Used for Tool Evaluation
Our dataset was created by combining tweets from the Trump Twitter Archive 18 with those extracted directly from Twitter using a third-party tool, FireAnt 19. There were two challenges in the process of data selection and acquisition: i) access via the public API is constrained to the last seven days, and ii) the handling of deleted tweets.
Our dataset comprises 2,981 tweets, posted from election day 2016 (8 November) to one year after the Presidential inauguration of 20 January 2017. This period was chosen as it is slightly longer than a full calendar year of events in a Presidential diary. This provided the opportunity to include regular events such as international assemblies (for example the Group of Seven (G7) summit), holidays, and recurrent natural cycles such as the US hurricane season. The period also covers a broad range of unanticipated national and international events, such as domestic protests (Chenoweth and Pressman, 2017) and the testing of nuclear weapons by North Korea (Fifield, 2017).
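A minimal sketch of this date-window filter (assuming tweets carry Twitter’s standard created_at timestamp; the exact export format of the archives used is not documented here):

from datetime import datetime, timezone

WINDOW_START = datetime(2016, 11, 8, tzinfo=timezone.utc)  # election day 2016
WINDOW_END = datetime(2018, 1, 20, tzinfo=timezone.utc)    # one year post-inauguration

def in_study_window(created_at: str) -> bool:
    """Keep tweets posted within the study period."""
    # Parses Twitter's classic timestamp, e.g. "Wed Nov 09 05:36:58 +0000 2016".
    posted = datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")
    return WINDOW_START <= posted <= WINDOW_END

# in_study_window("Wed Nov 09 05:36:58 +0000 2016") -> True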
16 https://bitbucket.org/speriosu/updown/src/1deb8fe45f603a61d723cc9b987ae4f36cbe6b16/data/hcr/?at=default
17 http://sentistrength.wlv.ac.uk/#About
18 http://www.trumptwitterarchive.com/
19 http://www.laurenceanthony.net/software/fireant/
Table 2: Twitter dataset comparison. The six right-hand columns list the classifiers used.

Dataset          Total Tweets   Coding   Coders   Use      positive   neutral   negative   mixed   other   relevant
STS-Test         489            manual   3+       test     yes        yes       yes        no      no      yes
STS-Gold         2,124          manual   3        test     yes        yes       yes        yes     yes     yes
STS-Training     1.6M           auto     n/a      train    yes        yes       yes        no      no      yes
HCR              2,516          manual   ?        all 14   yes        yes       yes        no      yes     no
Debate08 (OMD)   3,238          manual   3+       test     yes        no        yes        yes     yes     yes
SS-Twitter       4,242          manual   3+       test     yes        yes       yes        no      no      yes
6 TOOL EVALUATION
6.1 T1: MeaningCloud
MeaningCloud 20 provides topic extraction, text classification, sentiment analysis and summarization through Excel add-ons, plug-ins, and cloud-based APIs. It uses a pre-built dictionary to determine sentiment based on (uniquely amongst the evaluated tools) four possibilities: positive, negative, neutral, or none, but was unable to classify 13.50% of the data.
MeaningCloud was the second most accurate tool in correctly identifying positive tweets, with a success rate of 72.15% (see T1 in Table 3). It accurately determined negative tweets 50.54% of the time. Many of those mis-identified as positive relate to attacks on mainstream media (“Drain the Swamp should be changed to Drain the Sewer - it’s actually much worse than anyone ever thought, and it begins with the Fake News!” 21) or the investigation into Russian influence on the election (“This is the single greatest witch hunt of a politician in American history!” 22). Positive tweets mis-classified as negative contain mixed language, or mention the economy or the inauguration.
Most of the neutral tweets that were mis-identified as negative express condolences or remembrance (“National Pearl Harbor Remembrance Day - A day that will live in infamy! December 7, 1941” 23), while those neutral tweets that were mis-identified as positive were concerned with natural disasters.
20 http://www.meaningcloud.com
21 https://twitter.com/realdonaldtrump/status/889435104841523201
22 https://twitter.com/realdonaldtrump/status/865173176854204416
23 https://twitter.com/realdonaldtrump/status/938786402992578560
Table 3: Accuracy of all five sentiment analysis tools.
Negative Neutral Positive Total
T1 50.54% 5.00% 72.15% 62.95%
T2 64.70% 52.50% 64.50% 64.41%
T3 64.07% 35.00% 70.48% 67.53%
T4 37.66% 45.00% 73.88% 59.62%
T5 60.07% 47.50% 58.52% 58.96%
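The figures in Table 3 are agreement rates between the manually assigned (gold) labels and each tool’s output. A minimal sketch of that computation (the paper does not publish a scoring script, so the function name and input format are illustrative):

from collections import Counter
from typing import Dict, List, Tuple

def accuracy_by_class(pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """pairs holds (gold_label, predicted_label) for each tweet."""
    totals: Counter = Counter()
    correct: Counter = Counter()
    for gold, predicted in pairs:
        totals[gold] += 1
        if gold == predicted:
            correct[gold] += 1
    # Per-class accuracy plus the overall proportion of correct predictions.
    scores = {label: correct[label] / totals[label] for label in totals}
    scores["total"] = sum(correct.values()) / len(pairs)
    return scores

# accuracy_by_class([("positive", "positive"), ("neutral", "negative")])
# -> {"positive": 1.0, "neutral": 0.0, "total": 0.5}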
6.2 T2: ParallelDots
ParallelDots 24 provides machine learning services for text analytics. It offers Microsoft Excel and Google Sheets add-ins and cloud-based APIs. These enable use of functionality including keyword extraction, named entity recognition, and sentiment and emotion analysis, using a pre-built lexicon to classify each tweet as either positive, negative or neutral.
Overall accuracy was 64.41%, correctly identifying 64.50% of positive, 64.70% of negative and 52.50% of neutral tweets. Self-promotional tweets proved an issue, 41.65% of which were mis-identified. There was some consistency among the categories it mis-identified: tweets regarding players protesting by kneeling for the national anthem were classified as positive even though they are negative in sentiment (“The NFL is now thinking about a new idea - keeping teams in the Locker Room during the National Anthem next season. That’s almost as bad as kneeling! When will the highly paid Commissioner finally get tough and smart? This issue is killing your league!.....” 25), as were tweets regarding immigration and the border wall (“The judge opens up our country to potential terrorists and others that do not have our best interests at heart. Bad people are very happy!” 26).
24 http://www.paralleldots.com
25 https://twitter.com/realdonaldtrump/status/933285973277868032
26 https://twitter.com/realdonaldtrump/status/828042506851934209
6.3 T3: Repustate
Repustate 27 provides server-based software, APIs and added functionality for Microsoft Excel. Results are provided numerically from 1 to -1: the greater the number, the more positive the tweet; the lower a negative number, the more negative. A score of zero indicates a neutral tweet.
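A sketch of how such a numeric score maps onto the three categories used in this comparison (the zero-equals-neutral rule follows the description above; the optional neutral_band parameter is an illustrative addition of ours, not part of Repustate’s API):

def score_to_label(score: float, neutral_band: float = 0.0) -> str:
    """Map a score in [-1, 1] to positive, negative or neutral."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    # Scores of exactly zero (or inside the band) are treated as neutral.
    return "neutral"

# score_to_label(0.42) -> "positive"; score_to_label(0.0) -> "neutral"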
This tool had the highest overall accuracy, 67.53%. It ranked third in identifying positive tweets (70.10%). Self-promotional tweets were the most challenging to identify, mis-classified in 32.65% of cases. It was the second most accurate in identifying negative tweets (64.07%). Close reading of the tweets it categorized as neutral identified neither positive language nor purely informational content. Entries such as “Obama Administration official said they “choked” when it came to acting on Russian meddling of election. They didn’t want to hurt Hillary?” 28 are classified as positive, whilst “The judge opens up our country to potential terrorists and others that do not have our best interests at heart. Bad people are very happy!” 29 is identified as neutral, possibly since these tweets combine both negative and positive sentiments that effectively cancel each other out.
Only 35.00% of neutral tweets were classified correctly. These tweets included words such as “harm”, “destructive” and “terrorism” (“Today, I signed the Global War on Terrorism War Memorial Act (#HR873.) The bill authorizes....cont https://t.co/c3zIkdtowc https://t.co/re6n0MS0cj” 30), illustrating that using only words tagged with pre-determined sentiment can be problematic.
6.4 T4: RSentiment Package for R
RSentiment 31 is an Open Source package for use with R. It provides a range of queries for sentiment determination through a pre-built lexicon and can also ingest a custom list of single words. It uses Part-of-Speech (POS) tagging to identify nouns, verbs, adjectives, adverbs, etc., inferring context from the order of these words and incorporating this into the sentiment score. Additional calculations are performed to identify negation and sarcasm within the sentence.
27 http://www.repustate.com
28 https://twitter.com/realdonaldtrump/status/878715504063643648
29 https://twitter.com/realdonaldtrump/status/828042506851934209
30 https://twitter.com/realdonaldtrump/status/898718902200418306
31 https://cran.r-project.org/web/packages/RSentiment/index.html
For this evaluation, the calculate_sentiment function was used. This provides results across five categories: very positive, positive, neutral, negative and very negative (Bose, 2018); these results were converted to three categories (positive, negative and neutral) for consistency with the other evaluated tools.
The heavy draw on system resources complicated the use of this tool, with memory errors consistently encountered when processing more than 200 tweets at a time. The dataset had to be split into 15 separate files with a maximum of 200 entries each. Each file was imported, analysed, and exported separately, and the results consolidated on completion.
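The batching and label-collapsing steps can be summarised in a short sketch (shown in Python for consistency with the other examples, although RSentiment itself runs in R; the literal category strings are illustrative assumptions rather than the package’s exact output):

FIVE_TO_THREE = {
    "Very Positive": "positive",
    "Positive": "positive",
    "Neutral": "neutral",
    "Negative": "negative",
    "Very Negative": "negative",
}

def batches(tweets, size=200):
    """Yield successive batches of at most `size` tweets to avoid memory errors."""
    for start in range(0, len(tweets), size):
        yield tweets[start:start + size]

def collapse(labels):
    """Map the five RSentiment categories onto the three used for comparison."""
    return [FIVE_TO_THREE[label] for label in labels]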
This tool had the second lowest overall accuracy, at 59.62%. It was most inaccurate in identifying negative tweets (only 37.66% were correctly classified). Analysis of the words used in these tweets does not reveal a cause for this: the tool identified tweets such as “The @nytimes sent a letter to their subscribers apologizing for their BAD coverage of me. I wonder if it will change - doubt it?” 32 as positive, and “Just tried watching Saturday Night Live - unwatchable! Totally biased, not funny and the Baldwin impersonation just can’t get any worse. Sad” 33 as neutral.
RSentiment was unable to correctly identify a majority of neutral tweets, scoring 45.00%. As with previous tools, it is easy to identify words such as “holocaust” and “destructive” that cause negative classifications; however, it is unclear which words in tweets such as “RT @NWSHouston: Historic flooding is still ongoing across the area. If evacuated, please DO NOT return home until authorities indicate it i...” 34 would lead to a positive classification.
In identifying positive tweets, RSentiment had an accuracy of 73.88%. Self-promotional tweets were the most difficult to classify, accounting for more than half of the mis-identified tweets. It is clear how particular words have driven a negative classification in some instances (“Despite the phony Witch Hunt going on in America, the economic & jobs numbers are great. Regulations way down, jobs and enthusiasm way up!” 35), but not in others, e.g. “A great great honor to welcome & recognize the National Teacher of the Year, as well as the Teacher of the Year fro. . . https://t.co/pUGl7RDoVX” 36, as these contain several terms generally thought of as positive.
32 https://twitter.com/realdonaldtrump/status/797812048805695488
33 https://twitter.com/realdonaldtrump/status/805278955150471168
34 https://twitter.com/realdonaldtrump/status/902491685720076288
35 https://twitter.com/realdonaldtrump/status/875698062030778368
36 https://twitter.com/realdonaldtrump/status/857360510534209536
6.5 T5: SentiStrength
SentiStrength 37 is an opinion-mining software package (SentiStrength, 2018). It uses a pre-built dictionary of words tagged positive, negative and neutral to determine sentiment. There are four options for output:
Dual: both a positive (from 1 to 5) and a negative (from -1 to -5) score is given for each tweet
Binary: positive and negative
Trinary: positive, negative and neutral
Scale: sentiment is rated -4 to +4, representing negative to positive results respectively.
The trinary (positive, negative and neutral) setting was initially chosen for analysis to maintain consistency. This setting did not work as expected, with results being presented as dual. Testing of the Binary and Scale outputs revealed the same issue. To convert the dual score to comparable sentiment categories, the positive and negative scores were added together and a classification generated. For example, if a tweet scored +3 for positive sentiment and -1 for negative sentiment, its overall score was calculated as +2, giving it an overall positive classification.
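A minimal sketch of that conversion (the summation rule and the +3/-1 example follow the paragraph above; treating a zero sum as neutral is our reading of the procedure rather than something the tool documents):

def dual_to_label(positive: int, negative: int) -> str:
    """Collapse SentiStrength dual scores (positive 1..5, negative -1..-5) into one label."""
    overall = positive + negative  # e.g. +3 and -1 give +2
    if overall > 0:
        return "positive"
    if overall < 0:
        return "negative"
    return "neutral"

# dual_to_label(3, -1) -> "positive"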
The relative ease-of-use and fast processing time
(<5 seconds) of this tool are negated by it being
the least accurate for overall tweet classification, at
58.96%. It was the least accurate of all tools in identi-
fying positive tweets (58.52%), mis-identifying self-
promotional tweets as positive. Topics such as foreign
affairs and the economy were frequently misidenti-
fied, but in no observable pattern.
Tweets such as “China has been taking out massive amounts of money & wealth from the U.S. in totally one-sided trade, but won’t help with North Korea. Nice!” 38, which use mixed language and terms, were mis-classified as positive. It is not clear why some very negative records were classified as neutral (“The Fake News media is officially out of control. They will do or say anything in order to get attention - never been a time like this!” 39).
SentiStrength ranked second for identifying neutral tweets, at 47.05% accuracy. Similar to other tools, it could not identify neutral tweets related to remembrance or disasters where typically negative words such as “battlefield” or “storm” were used, and mis-classified some records as positive when terms including “strengthens”, “bless” or “pioneer” were present.
37 http://sentistrength.wlv.ac.uk/
38 https://twitter.com/realdonaldtrump/status/816068355555815424
39 https://twitter.com/realdonaldtrump/status/86008733451941478
7 CONCLUSION
The tweets of POTUS Donald Trump have provided
his social media audience with rich and varied mate-
rials. The first stage to understanding the true nature of this content has been the evaluation of five existing sentiment analysis tools, selected from an original list of over 60 possibilities. The results from our
evaluative testing show that MeaningCloud accurately
classified 62.95% of the tweets in our purpose-built
dataset; ParallelDots was 64.41% accurate; Repustate
was 67.53%; RSentiment was 59.62% accurate; and
SentiStrength was 58.96% accurate (Table 3).
There is consistency in the results: Meaning-
Cloud, Repustate and RSentiment were all more ac-
curate at determining positive sentiment, and all five
tools were least able to identify neutrality. Tweets
of self-promotion were the most problematic for
all tools, possibly since these typically contain a
high level of mixed language, often attacking an
opponent then promoting Trump or his administra-
tion’s achievements. Neutral tweets were often mis-
classified due to their inclusion of language related
to death or natural disasters. Determining sentiment
using a pre-tagged dictionary is problematic, as the
tools fail to recognise the context in which these terms
are used, and cannot be guaranteed to distinguish be-
tween a statement of fact and a negative expression.
In addition to providing an evaluative comparison
of the five tools, the paper provides a summary of
shared features of a number of existing datasets con-
taining Twitter content, which are freely accessible.
The project as it currently stands has relied heavily on such a third-party archive of tweets, as well as a third-party tool for gathering tweets, but there has been little scope for critically evaluating either the third-party dataset or the tweet-gathering tool beyond expressing a concern regarding missing or deleted tweets and noting an issue with timezones. To simplify the task, we made a baseline assumption that all of Trump’s tweets were posted in Eastern Standard Time, although the documentation for the third-party archive does include the caveat that its maintainers cannot be certain of the timezone when he was tweeting from other parts of the country or world.
Another assumption made in the process of this analysis is that the @realDonaldTrump Twitter account captures the idiosyncratic voice of a single user. It is, however, possible that both Trump and his political team use the account, with some evidence that, for example, prior to the election there was a difference in voice depending on the type of equipment used (iPhone or Android); after the election, we could assume that tweets posted between the hours of 10pm and 9am are likely Trump personally, but cannot assert
the same with equal confidence during office hours, for example. Future work will concentrate on analyses that may be in a position to use natural language processing and sentiment analysis to investigate whether one or more author voices can be detected in the data.
ACKNOWLEDGEMENTS
The work carried out for this paper formed part of
an unpublished Master’s thesis at the Australian Na-
tional University, which, in turn, builds on an un-
published preliminary investigation submitted for the
SOCR8006 Online Research Methods course taught
by Associate Professor Robert Ackland at the Aus-
tralian National University. The authors would like to
acknowledge and thank all their colleagues who have
contributed to any and all of these pieces, including
Professor Les Carr, University of Southampton, and
Dr Jenny Davis, Australian National University, who
acted as examiners for the thesis.
REFERENCES
Abramson, A. (2017). President Trump Just Deleted
Tweets About Luther Strange. Is He Allowed to
Do That? http://time.com/4958877/donald-trump-
deletes-tweets-legal-implications/.
Ahmadian, S., Azarshahi, S., and Paulhus, D. L. (2016). Explaining donald trump via communication style: Grandiosity, informality, and dynamism. Personality and Individual Differences, 107:49–53.
Ahmed, S., Jaidka, K., and Cho, J. (2016). The 2014 in-
dian elections on twitter: A comparison of campaign
strategies of political parties. Telematics and Infor-
matics, 33:1071–1087.
Auxier, B. and Golbeck, J. (2017). The president on twit-
ter: A characterization study of @realdonaldtrump. In
Ciampaglia, G., Mashhadi, A., and Yasseri, T., edi-
tors, Social Informatics. SocInfo 2017. Lecture Notes
in Computer Science, volume 10539, pages 377–390.
Bonnilla, Y. and Rosa, J. (2015). #ferguson: digital protest, hashtag ethnography, and the racial politics of social media in the united states. American Ethnologist, 42(1):4–17.
Bose, S. (2018). RSentiment: Analyse sentiment of English sentences. R package version 2.2.2.
boyd, d. and Crawford, K. (2012). Critical questions for
big data. Information, Communication and Society,
15(5):662–679.
Chenoweth, E. and Pressman, J. (2017). Last month,
83% of U.S. protests were against Trump.
https://www.washingtonpost.com/news/monkey-
cage/wp/2017/09/25/charlottesville-and-its-
aftermath-brought-out-many-protesters-in-august-
but-still-more-were-against-trump-and-his-
policies/?noredirect=on&utm term=.5ea64cd2742f.
Davidov, D., Tsur, O., and Rappoport, A. (2010). Enhanced
sentiment learning using twitter hashtags and smileys.
In Proceedings of the 23rd international conference
on computational linguistics: posters, pages 241–249.
Association for Computational Linguistics.
Dawsey, J. and Bender, B. (2017). National archives warned
trump white house to preserve documents.
Enli, G. (2017). Twitter as arena for the authentic outsider:
Exploring the social media campaigns of trump and
clinton in the 2016 us presidential election. European
Journal of Communication, 32:50–61.
Fiesler, C. and Proferes, N. (2018). “participant” percep-
tions of twitter research ethics. Social Media+ Soci-
ety, 4(1):2056305118763366.
Fifield, A. (2017). In latest test, North Korea det-
onates its most powerful nuclear device yet.
https://www.washingtonpost.com/world/north-korea-
apparently-conducts-another-nuclear-test-south-
korea-says/2017/09/03/7bce3ff6-905b-11e7-8df5-
c2e5cf46c1e2 story.html?utm term=.462a0a2b8411.
Francia, P. L. (2017). Free media and twitter in the 2016 presidential election: The unconventional campaign of donald trump. Social Science Computer Review, page 0894439317730302.
Gainous, J. and Wagner, K. M. (2014). Tweeting to Power: The Social Media Revolution in American Politics. Oxford University Press, New York.
Go, A., Bhayani, R., and Huang, L. (2009). Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, 1(12).
Graham, T., Jackson, D., and Broersma, M. (2014). New
platform, old habits? candidates’ use of twitter during
the 2010 british and dutch general election campaigns.
New Media and Society, 18(5):765–783.
Gross, J. H. and Johnson, K. T. (2016). Twitter taunts and
tirades: Negative campaigning in the age of trump.
PS: Political Science & Politics, 49(4 (Elections in
Focus)):748–754.
Hoffman, C. P., Suphan, A., and Meckel, M. (2016). The impact of use motives on politicians’ social media adoption. Journal of Information Technology and Politics, 13(3).
Hu, X., Tang, L., Tang, J., and Liu, H. (2013). Exploiting
social relations for sentiment analysis in microblog-
ging. In Proceedings of the sixth ACM international
conference on Web search and data mining, pages
537–546. ACM.
Karpf, D. (2017). Digital politics after trump. An-
nals of the International Communication Association,
41(2):198–207.
Kouloumpis, E., Wilson, T., and Moore, J. D. (2011). Twitter sentiment analysis: The good the bad and the omg! In ICWSM, volume 11, page 164.
Larsson, A. O. and Kalsnes, B. (2014). ’of course we are
on facebook’: Use and non-use of social media among
swedish and norwegian politicians. European Journal
of Communication, 29(6):653–667.
Maddock, J., Starbird, K., and Mason, R. M. (2015). Us-
ing historical twitter data for research: Ethical chal-
lenges of tweet deletions. In CSCW 2015 Workshop
on Ethics for Studying Sociotechnical Systems in a Big
Data World. ACM.
McNair, B. (2018). From control to chaos, and back again.
Journalism Studies, 19(4):499–511.
Meeks, L. (2018). Tweeted, deleted: theoretical, method-
ological, and ethical considerations for examining
politicians’ deleted tweets. Information, Communi-
cation & Society, 21(1):1–13.
Mukherjee, S. and Bala, P. K. (2017). Sarcasm detection in microblogs using naïve bayes and fuzzy clustering. Technology in Society, 48:19–27.
Nunan, D. and Yenicioglu, B. (2013). Informed, unin-
formed and participative consent in social media re-
search. International Journal of Market Research,
55(6):791–808.
Oh, C. and Kumar, S. (2017). How trump won: The role
of social media sentiment in political elections. In Pa-
cific Asia Conference on Information Systems (PACIS
2017), volume 17, page 48.
Ott, B. L. (2016). The age of twitter: Donald j. trump and
the politics of debasement. Critical Studies in Media
Communication, 34(1):59–68.
Pak, A. and Paroubek, P. (2010). Twitter as a corpus for sentiment analysis and opinion mining. In LREC, volume 10, pages 1320–1326.
Park, S., Park, J. Y., Lim, Y. S., and Park, H. W. (2015).
Expanding the presidential debate by tweeting: The
2012 presidential debate in south korea. Telematics
and Informatics.
Reyes, A., Rosso, P., and Buscaldi, D. (2012). From hu-
mor recognition to irony detection: The figurative lan-
guage of social media. Data and Knowledge Engi-
neering, 74:1–12.
Saif, H., Fernandez, M., He, Y., and Alani, H. (2013). Eval-
uation datasets for twitter sentiment analysis: A sur-
vey and a new dataset, the sts-gold. In 1st Intl Work-
shop on Emotion and Sentiment in Social and Ex-
pressive Media: Approaches and Perspectives from AI
(ESSEM 2013), pages 9–21.
Saif, H., He, Y., and Alani, H. (2012). Semantic sentiment
analysis of twitter. In Proceedings of the 11th Intl con-
ference on The Semantic Web, pages 508–524.
Saif, H., He, Y., Fernandez, M., and Harith, A. (2016). Con-
textual semantics for sentiment analysis of twitter. In-
formation Processing and Management, 52(1):5–19.
Shamma, D. A., Kennedy, L., and Churchill, E. F. (2009).
Obama McCain Debate Dataset. https://bitbucket.
org/speriosu/updown/wiki/Getting Started.
Stieglitz, S. and Dang-Xuan, L. (2013). Social media
and political communication: a social media analyt-
ics framework. Social Network Analysis and Mining,
3(4):1277–1291.
Stolee, G. and Canton, S. (2018). Twitter, trump, and the
base: A shift to a new form of presidential talk? Signs
and Society, 6(1):147–165.
Strandberg, K. (2012). A social media revolution or just a
case of history repeating itself? the use of social me-
dia in the 2011 finnish parliamentary elections. New
Media and Society, 15(8):1329–1347.
Symeonidis, S., Effrosynidis, D., and Arampatzis, A.
(2018). A comparative evaluation of pre-processing
techniques and their interactions for twitter sentiment
analysis. Expert Systems With Applications, 110:298–
310.
Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., and Kap-
pas, A. (2010). Sentiment strength detection in short
informal text. Journal of the American Society for
Information Science and Technology, 61(12):2544–
2558.
Wang, H., Can, D., Kazemzadeh, A., Bar, F., and Narayanan, S. (2012). A system for real-time twitter sentiment analysis of 2012 us presidential election cycle. In ACL 2012 System Demonstrations, pages 115–120.
Wang, Y., Li, Y., and Luo, J. (2016). Deciphering the 2016
us presidential campaign in the twitter sphere: A com-
parison of the trumpists and clintonists. In ICWSM,
pages 723–726.
Webb, H., Jirotka, M., Stahl, B. C., Housley, W., Edwards,
A., Williams, M., Procter, R., Rana, O., and Burnap,
P. (2017). The ethical challenges of publishing twitter
data for research dissemination. In Proc of the ACM
on Web Science Conference, pages 339–348. ACM.
Woolley, S. C. and Guilbeault, D. R. (2017). Computational
propaganda in the united states of america: Manufac-
turing consensus online. Computational Propaganda
Research Project, page 22.