A tweet from the Russian Troll dataset was
assigned the label 1, while a tweet from the Non-Troll
dataset was assigned the label 0. Table 1 shows four
example tweets from the dataset, with the label and
tweet text columns, and Table 2 shows examples of
right- and left-biased trolls from the Russian Troll
dataset.
In our experiments, we divided the datasets into
training and test data using an 80:20 split.
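Such a split can be produced with scikit-learn's train_test_split (a minimal sketch; the toy data, variable names, and random_state are illustrative, not taken from our code):

    from sklearn.model_selection import train_test_split

    # Toy stand-ins for the real data: 1 = Russian Troll, 0 = Non-Troll.
    tweets = ["troll tweet text", "ordinary tweet text",
              "another troll tweet", "another ordinary tweet"]
    labels = [1, 0, 1, 0]

    # 80:20 train/test split, matching the breakdown described above.
    X_train, X_test, y_train, y_test = train_test_split(
        tweets, labels, test_size=0.2, random_state=42)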
Table 2: Examples of left- and right-biased trolls.

Right:
a. You do realize if democrats stop shooting people, gun violence would drop by 90%
b. US sailor gets 1 year of prison for being reckless w/ 6 photos of sub Hillary gets away w/ 33k emails.. https://t.co/jmPjfPCRK4

Left:
a. 1 dat. 4 shootings. It’s Trump’s Birthday – Tavis Airforce Base –
b. 1 black president out of 45 white ones is the exception that proves the rule. The rule is racism. And then Trump came next.
3.1 Support Vector Machine Classifier
Support Vector Machines (SVMs) are a popular
supervised machine learning technique (Vapnik,
1995). Conceptually, an SVM implements the
following idea: input vectors are (non-linearly)
mapped to a very high-dimensional feature space.
SVMs have proven effective for many text
categorization tasks.
An SVM seeks a hyperplane in an N-dimensional
space that separates the data points of two different
classes. The support vectors are the data points that
determine the position of this hyperplane: the points
nearest to it on either side. Usually, several
hyperplanes could separate a dataset into two classes;
the main objective of the SVM algorithm is to find
the hyperplane with the maximum margin to the data
points. This makes it likely that new data points will
be classified correctly. SVMs require input data
represented as multi-dimensional vectors.
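In standard notation (not spelled out here, but implied by the description above), the maximum-margin hyperplane (w, b) for linearly separable data solves

    \min_{w,\,b} \; \tfrac{1}{2} \lVert w \rVert^2
    \quad \text{subject to} \quad
    y_i \, (w \cdot x_i + b) \ge 1, \qquad i = 1, \dots, N,

where maximizing the margin 2/||w|| is equivalent to minimizing ||w||^2, and the constraints hold with equality exactly at the support vectors.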
Data Preprocessing and Representation: Besides
the steps described in Section 2, we deleted emoticons
from our dataset, since we do not take them into
account when building the model. For methods of
using emoticons in sentiment analysis, see, e.g.,
(Bakliwal, 2012) and our previous work (Ji, 2015).
As the next step, we applied stemming to the
dataset, using the Porter Stemmer (Porter, 2006).
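A minimal stemming pass with NLTK's PorterStemmer (one common implementation of the Porter algorithm; the paper's exact library and code are not shown, so this is a sketch):

    from nltk.stem import PorterStemmer
    from nltk.tokenize import word_tokenize  # requires nltk.download('punkt')

    stemmer = PorterStemmer()

    def stem_tweet(text):
        # Tokenize, stem each token, and rejoin into a normalized string.
        return " ".join(stemmer.stem(token) for token in word_tokenize(text))

    print(stem_tweet("shootings happened during the elections"))
    # -> "shoot happen dure the elect"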
To construct one of our input representations, we
used a Term Frequency-Inverse Document Frequency
(tf-idf) vectorizer to convert the raw text data into a
matrix of features. By combining tf and idf, we
computed a tf-idf score for every word in each
document of the corpus. This score estimates the
significance of each word for a document, which
helps with classifying tweets.
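With scikit-learn, which we already use for the classifier, this step corresponds to TfidfVectorizer (a sketch; the corpus and parameters here are illustrative):

    from sklearn.feature_extraction.text import TfidfVectorizer

    corpus = ["gun violence would drop",
              "it is trump's birthday",
              "gun control debate again"]

    # Fit the vocabulary and produce a sparse document-term matrix of
    # tf-idf scores (rows = tweets, columns = terms).
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(corpus)

    print(X.shape)  # (3, number_of_unique_terms)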
SVM Classification Model: We built the SVM
model using the FiveThirtyEight dataset. The stored
model can later be loaded to classify new data. We
used the scikit-learn SVM implementation named
SVC (Support Vector Classification), which is based
on libsvm (Chang, 2001).
In this text categorization problem, we used an
SVM classifier with the regularization parameter
C = 0.1. The regularization parameter controls the
trade-off between training error and margin width:
the higher C is, the fewer misclassifications are
tolerated, but the slower training becomes. Since our
regularization parameter is small, some
misclassifications are allowed, but training is
relatively fast; as this dataset is very large, this was
necessary.
We used a Radial Basis Function (RBF) kernel for
our SVM model, as the set of unique words in our
dataset spans a high-dimensional vector space
(Albon, 2017).
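A sketch of this setup with scikit-learn (kernel and C as stated above; the toy data, pipeline structure, and variable names are assumptions, since the exact training code is not listed here):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    # Toy training data; in our experiments this is the FiveThirtyEight dataset.
    X_train = ["troll tweet one", "ordinary tweet one",
               "troll tweet two", "ordinary tweet two"]
    y_train = [1, 0, 1, 0]

    # tf-idf features feeding an RBF-kernel SVM with C = 0.1.
    model = make_pipeline(TfidfVectorizer(), SVC(kernel="rbf", C=0.1))
    model.fit(X_train, y_train)

    print(model.predict(["a new unseen tweet"]))  # -> array of 0/1 labels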
3.2 Neural Network Classifier with
One-hot Encoding
Data Representation: There are two popular ways of
representing natural language sentences: vector
embeddings and one-hot matrices. One-hot matrices
contain no linguistic information. They indicate
whether words occur in a document (or a sentence)
but say nothing about their frequency or their
relationships to other words. The creation of a one-hot
matrix begins with tokenizing the sentence, that is,
breaking it into words. We then create a lookup
dictionary of all the unique words/tokens, which need
not record counts or order. Essentially, every word is
represented by a position/index in a very long vector.
The vector component at that position is set to 1 if the
word appears; all other components are set to 0. For
example, in a dictionary that contains only seven
words, the first word would be represented by
[1, 0, 0, 0, 0, 0, 0], the second by [0, 1, 0, 0, 0, 0, 0],
etc. Each vector has the length of the dictionary (in
our case 3000 words), and the vectors are stored as
Python arrays. A whole sentence is then represented
by a 2-dimensional matrix, with one one-hot row per
token.
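A plain-Python sketch of this encoding (the 3000-word cap mirrors the dictionary size above; the function names and toy sentences are illustrative):

    import numpy as np

    MAX_VOCAB = 3000  # dictionary size used in our experiments

    def build_vocab(sentences):
        # Lookup dictionary mapping each unique token to a vector index;
        # neither counts nor word order are recorded.
        vocab = {}
        for sentence in sentences:
            for token in sentence.lower().split():
                if token not in vocab and len(vocab) < MAX_VOCAB:
                    vocab[token] = len(vocab)
        return vocab

    def one_hot_matrix(sentence, vocab):
        # One row per token; the component at the token's index is 1, rest 0.
        tokens = sentence.lower().split()
        matrix = np.zeros((len(tokens), len(vocab)), dtype=np.int8)
        for row, token in enumerate(tokens):
            if token in vocab:
                matrix[row, vocab[token]] = 1
        return matrix

    vocab = build_vocab(["the rule is racism", "gun violence would drop"])
    print(one_hot_matrix("the rule is racism", vocab))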
Neural Network Classifier: For comparison with the
SVM, we first built a sequential classifier: a simple
neural network model consisting of a stack of hidden
layers executed in a fixed order.
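A minimal version of such a stack in Keras (every layer size and activation here is an assumption for illustration, not our reported configuration; we also assume each tweet's one-hot rows are collapsed into a single 3000-dimensional vector before being fed to the network):

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Input

    # Hidden layers execute in the order they are added (hence "sequential").
    model = Sequential([
        Input(shape=(3000,)),             # 3000-dim collapsed one-hot input
        Dense(64, activation="relu"),
        Dense(64, activation="relu"),
        Dense(1, activation="sigmoid"),   # binary output: troll vs non-troll
    ])

    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.summary()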