

Fake News Detection using Support Vector Machine

Alpna Patel, Arvind Kumar Tiwari, S. S. Ahmad

Department of Computer Science & Engineering, Kamla Nehru Institute of Technology, Sultanpur, India

KL Ghurair University, Dubai, UAE

Keywords: Natural Language Processing; Fake News Detection; Machine Learning; SVM.

Abstract: Social media is now a rich source of information. Its dark side, however, is that fake news has become a serious problem for society: it is used to spread false information across social media platforms. Fake news detection is therefore a substantial area of research in Natural Language Processing. This paper presents a comparative study of well-known machine learning approaches: Naïve Bayes, SVM, Decision Tree classifier, Random Forest, Multinomial NB and Logistic Regression. The experimental results show that the SVM classifier outperforms the other approaches, achieving an accuracy of 94.93%.

1 INTRODUCTION

Natural Language Processing (NLP) is a subfield of Artificial Intelligence concerned with the interaction between machines and humans. NLP has numerous applications, such as chatbots, social media monitoring, language translation, sentiment analysis, fake news detection, voice assistants, grammar checkers and many more. Fake news detection is one of its substantial applications.

With the growth of social networking sites, some of their dark sides have come into the picture, and fake news is one of them. It has become a serious issue for society. Fake news spreads false information over social media platforms such as Facebook, Twitter, Instagram and WhatsApp. In this paper, we present a survey of fake news detection using machine learning classifiers.

2 RELATED WORK

In recent years, many researchers have reviewed machine learning models for fake news detection. This section briefly describes the research related to fake news detection using machine learning classifiers.

Fake news is misleading information that comes from different sources. Various machine learning algorithms have been reported to detect it accurately. Fake news detection on social media has been analysed from a data mining perspective (Shu K, 2017). A hybrid deep model has been proposed for fake news detection (Ruchansky N, 2017).

Wang et al. presented a novel approach to multi-modal fake news detection whose model can derive event-invariant features (Wang Y, 2018). Zhou et al. shared fundamental theories, detection strategies and challenges concerning fake news (Zhou X, 2019). Granik and Mesyura achieved good accuracy using a naïve Bayes classifier (Granik M, 2017). Shu et al. shared their views on the role of social context in fake news detection, discussing the social context used to spread false information (Shu K, 2019). Social media is now a huge source of information, and Monti et al. proposed a novel approach for detecting fake news on social media using geometric deep learning (Monti F, 2019). Singhania et al. presented a deep neural network method, a three-level hierarchical attention network, for accurate and fast detection of fake news (Singhania S, 2017). Khattar et al. proposed a novel approach named the multimodal variational autoencoder for fake news detection, using different deep learning techniques to achieve better results (Khattar D, 2019). Pan et al. studied content-based fake news detection using knowledge graphs (Pan J Z, 2018).

Patel, A., Tiwari, A. and Ahmad, S.
Fake News Detection using Support Vector Machine.
DOI: 10.5220/0010562000003161
In Proceedings of the 3rd International Conference on Advanced Computing and Software Engineering (ICACSE 2021), pages 34-38
ISBN: 978-989-758-544-9

3 METHODOLOGY

In this paper, fake news detection follows a machine learning approach with stages such as data collection and data preprocessing. Data preprocessing comprises techniques such as cleaning and tokenization.

3.1 Machine Learning based Approach

This approach predicts whether news is fake using models built from training datasets and evaluated on test datasets. Different machine learning algorithms are trained on the dataset, and the trained models are then used for the specific task. Two learning approaches are used to train models: supervised learning and unsupervised learning (Pan J Z, 2018).

3.1.1 Supervised Learning Approach

This approach is used when there is a finite number of predefined classes, such as positive and negative. It uses a labelled dataset for training. Supervised learning algorithms include Decision Tree, Artificial Neural Network, Random Forest, Regression, Logistic Regression, Support Vector Machine, Nearest Neighbour and Naïve Bayes.

3.1.2 Unsupervised Learning Method

This method does not require labelled datasets and works on document-level sentiment analysis (SA). The aim is to identify the semantic orientation of a given phrase. Partitioning clustering is an example of an unsupervised learning algorithm.

3.2 Data Collection

This is the initial and an important phase of fake news detection. Nowadays, various data sources are freely and publicly available, such as Twitter datasets for analysis. Apart from these, data can be acquired from the World Wide Web, from social media sites such as Twitter, Facebook and Instagram, from online blogging sites and from many other sources. These websites contain large amounts of data that can be used for analysis. The dataset used here has two parts: fake news and real news. It includes 21,418 true news items and 23,503 fake news items from the Kaggle website (Khattar D, 2019). This dataset is used for the detection of fake news with different machine learning approaches.

3.3 Data Pre-processing

Data preprocessing includes essential phases such as data cleaning and data formatting. The data sources contain raw information that is preprocessed by applying formatting and cleaning steps (Shu K, 2017). Available preprocessing techniques include tokenization, stemming, feature extraction, POS (part of speech) tagging and stop word removal. In this paper, we used preprocessing techniques to clean the dataset. The details are as follows:

3.3.1 Tokenization

Tokenization is the procedure of breaking sentences into phrases, symbols, words and other meaningful tokens. It is carried out with open-source tools such as natural language processing tokenizers.
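As a rough illustration of the tokenization step described above, the following sketch splits a sentence into word and symbol tokens using only the Python standard library. The regular expression and the function name `tokenize` are assumptions made for this example; the paper does not state which tokenizer was used.

```python
import re

# Assumed pattern (not from the paper): a run of letters/digits with an
# optional apostrophe suffix such as "n't", or a single punctuation symbol.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?|[^\sA-Za-z0-9]")


def tokenize(text):
    """Lowercase the text and split it into word and symbol tokens."""
    return TOKEN_PATTERN.findall(text.lower())


tokens = tokenize("Breaking: Scientists didn't confirm the claim!")
print(tokens)
# ['breaking', ':', 'scientists', "didn't", 'confirm', 'the', 'claim', '!']
```

A library tokenizer would handle many more cases (URLs, hashtags, abbreviations); this sketch only shows the basic idea of turning raw text into tokens.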

3.3.2 Stemming

A sentence or document contains different forms of a word, such as organize, organizing and organizes. Stemming is the procedure of reducing such derivationally related forms to a common stem.
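The idea can be sketched with a deliberately crude suffix stripper. This is not the Porter algorithm or any stemmer the paper names; the suffix list, the minimum stem length and the function name `simple_stem` are all assumptions chosen so that the three example words above collapse to one stem.

```python
# Assumed, illustrative suffix list; checked in this order.
SUFFIXES = ("ing", "es", "ed", "e", "s")


def simple_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


for form in ("organize", "organizing", "organizes"):
    print(form, "->", simple_stem(form))
# organize -> organiz
# organizing -> organiz
# organizes -> organiz
```

A production stemmer applies many more rules and exceptions; the point here is only that related word forms map to the same feature.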

3.3.3 Stop Word Removal

Sentences contain stop words, such as the articles ‘a’ and ‘the’ and the pronouns ‘he’, ‘they’ and ‘it’, which add complexity to the process of sentiment analysis. Removing such words is called stop word removal.
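A minimal sketch of stop word removal follows. The stop word set is a small illustrative subset built from the examples mentioned in this paper, not a complete list, and the function name `remove_stop_words` is an assumption for this example.

```python
# Illustrative subset only; real stop word lists contain far more entries.
STOP_WORDS = {"a", "an", "the", "he", "she", "it", "they"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are not in the stop word set (case-insensitive)."""
    return [token for token in tokens if token.lower() not in stop_words]


print(remove_stop_words(["The", "senator", "said", "it", "was", "a", "hoax"]))
# ['senator', 'said', 'was', 'hoax']
```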

3.3.4 Feature Extraction

Feature extraction selects the most relevant features from the text for the sentiment analysis task. It is a step within the classification task: we select different features from the text and train different models using classification methods. Feature vectors fall into two categories, numerical features and binary features, which encode how frequently a term occurs or whether it occurs. Several text features are given below:


N-grams: Contiguous sequences of letters or words that occur frequently in the given text. They are categorized as unigrams (one word), bigrams (two words) and trigrams (three words).

Stop words: Another text feature, consisting of words that occur frequently in texts, for example a, the, an, beside, near, he, she, it, they and so on.

Part of Speech tagging: Assigns each word its part of speech, such as adjective, noun, pronoun, adverb or verb; in a text, these tagged words carry most of the sentiment.

Negation: A negation word, such as ‘not’, is a negative word in the text. A sentence may carry a positive meaning that the negation word inverts. For example, in ‘Biryani is not good’ the word ‘good’ carries a positive meaning, but ‘not’ inverts the meaning of the whole sentence.
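The n-gram feature described above can be sketched as follows. The function names `ngrams` and `ngram_counts` are assumptions for this example, and the paper does not specify how its features were computed. Note that the bigram ('not', 'good') keeps the negation word next to the word it inverts, which a unigram feature would lose.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(tokens, n):
    """Count how often each n-gram occurs (a numerical feature vector)."""
    return Counter(ngrams(tokens, n))


tokens = ["biryani", "is", "not", "good"]
print(ngrams(tokens, 1))  # unigrams
print(ngrams(tokens, 2))  # bigrams, including ('not', 'good')
print(ngrams(tokens, 3))  # trigrams
```

Replacing each count with 1 when the n-gram is present would give the binary form of the same feature vector.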

Table 1: Environmental Setup

Hardware and Software Specification    Configuration
Processor                              Intel(R) Core(TM) i3-5005U CPU @ 2.00 GHz
Random Access Memory                   4.00 GB
System Type                            64 bit
Operating System                       Windows 7
Development Environment                Anaconda 2.0.0, Keras

Table 2: Confusion Matrix

                    Label 1 (Predicted)    Label 2 (Predicted)
Label 1 (Actual)    Tn                     Fp
Label 2 (Actual)    Fn                     Tp

Table 3: Accuracy Comparison of Machine Learning Approaches for Fake News Detection

S.No.  Models                    Accuracy(%)  Precision(%)  Recall(%)  F-1 Score(%)
1.     NB                        88.58        93.99         93.26      93.62
2.     Decision Tree Classifier  89.70        89.90         90.79      90.34
3.     Random Forest             92.37        92.38         93.33      92.85
4.     Multinomial NB            92.98        93            92         93
5.     Logistic Regression       94.92        92.99         95.61      94.28
6.     SVM                       94.93        93.98         96.04      94.99

3.4 Support Vector Machine

Support Vector Machine (SVM) is a supervised learning method. It can be used for classification as well as regression, and it works by finding a hyperplane in an N-dimensional space (where N is the number of features) that separates the classes. SVM is a discriminative classifier. Support vectors are the data points that lie closest to the hyperplane.
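To make the hyperplane idea concrete, the sketch below trains a linear SVM on a toy two-feature dataset by sub-gradient descent on the regularised hinge loss. This is an assumption-laden illustration, not the implementation used in this paper: the training routine, the hyperparameters (`lr`, `lam`, `epochs`), the toy points and all function names are hypothetical. After training, it reports the point of each class closest to the hyperplane, which approximates the role of the support vectors.

```python
def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(points, labels, lr=0.1, lam=0.01, epochs=100):
    """Fit w and b by sub-gradient descent on the regularised hinge loss."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            if y * decision_value(w, b, x) < 1:
                # Inside the margin or misclassified: move towards the point.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Safely classified: apply only the regulariser.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def closest_point_per_class(w, b, points, labels):
    """For each class, find the training point nearest the hyperplane.

    |w.x + b| is proportional to the geometric distance (which also
    divides by the norm of w), so the ordering of points is the same.
    """
    closest = {}
    for x, y in zip(points, labels):
        dist = abs(decision_value(w, b, x))
        if y not in closest or dist < closest[y][1]:
            closest[y] = (x, dist)
    return {y: x for y, (x, _) in closest.items()}


# Toy, linearly separable data: label +1 and label -1 (hypothetical values).
points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
print([predict(w, b, x) for x in points])  # [1, 1, 1, -1, -1, -1]
print(closest_point_per_class(w, b, points, labels))
```

In practice the feature vectors would come from the text features of Section 3.3.4 rather than two hand-picked coordinates, and a library implementation would normally be preferred over this hand-written loop.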

4 RESULTS AND DISCUSSION

This section briefly describes the experimental setup and measures the performance of the machine learning models. The following subsections give a detailed overview of the experimental setup, the confusion matrix and the results.

ICACSE 2021 - International Conference on Advanced Computing and Software Engineering



4.1 Environmental Setup and Parameter Setting

Anaconda is a Python distribution that provides packages for machine learning. TensorFlow is a framework that provides the environment for machine learning models. In this paper, we used Python version 3.6.5, Jupyter Notebook and Keras to implement the machine learning models for analysis. Keras is a high-level API that uses TensorFlow as its backend and is used for sequential modeling. The details are given in Table 1.

4.2 Performance Measure

To evaluate the performance of the machine learning models, a confusion matrix computed on the test data is used. Its entries are Tp (true positives), Tn (true negatives), Fp (false positives) and Fn (false negatives). The confusion matrix is given in Table 2.
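As a small sketch of how the four confusion matrix entries can be counted from actual and predicted labels, consider the following. The function name `confusion_counts`, the choice of 1 as the positive label and the example label lists are assumptions for illustration; the paper does not state which class it treats as positive.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count Tp, Tn, Fp and Fn for a binary classification task."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return {"Tp": tp, "Tn": tn, "Fp": fp, "Fn": fn}


# Hypothetical labels: 1 = positive class, 0 = negative class.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_counts(actual, predicted))
# {'Tp': 3, 'Tn': 3, 'Fp': 1, 'Fn': 1}
```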

The formula for calculating accuracy is given

below:

\[ \text{Accuracy} = \frac{T_p + T_n}{T_p + T_n + F_p + F_n} \times 100\% \]

Accuracy is used to validate the machine learning models on the test set and the validation set. Table 3 presents the comparative study of the machine learning classifiers.
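The accuracy formula above, together with the other three measures reported in Table 3, can be computed from the confusion matrix entries as in the sketch below. The paper gives only the accuracy formula; the precision, recall and F-1 expressions used here are the standard definitions and are an assumption about how those columns were obtained. The function name and the example counts are hypothetical, and the code assumes no denominator is zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F-1 score as percentages."""
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    precision = 100.0 * tp / (tp + fp)  # share of predicted positives that are correct
    recall = 100.0 * tp / (tp + fn)     # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, not taken from the experiments in this paper.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
# accuracy: 85.00%
# precision: 88.89%
# recall: 80.00%
# f1: 84.21%
```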

Table 3 reports accuracy, precision, recall and F-1 score for each machine learning approach to fake news detection. The analysis shows that SVM achieves the highest accuracy (94.93%), recall (96.04%) and F-1 score (94.99%) of the six approaches, with Logistic Regression a close second on accuracy (94.92%). Figure 4.1 shows a graphical representation of the results.

Figure 4.1: Summary of Machine Learning Approaches for Fake News detection

5 CONCLUSION

Fake news is used to spread false information over social media platforms, and its detection is a substantial area of research in Natural Language Processing. This paper provided a comparative analysis of well-known machine learning approaches for fake news detection, namely Naïve Bayes, SVM, Decision Tree classifier, Random Forest, Multinomial NB and Logistic Regression, using a dataset of fake and real news. The experimental results showed that the SVM classifier outperforms the other approaches, achieving an accuracy of 94.93%.

REFERENCES

Granik M, Mesyura V. Fake news detection using naive Bayes classifier. In 2017 IEEE First Ukraine Conference.

Khattar D, Goud JS, Gupta M, Varma V. MVAE: Multimodal variational autoencoder for fake news detection. In The World Wide Web Conference 2019 May 13 (pp. 2915-2921).

Monti F, Frasca F, Eynard D, Mannion D, Bronstein MM. Fake news detection on social media using geometric deep learning. arXiv preprint arXiv:1902.06673. 2019 Feb 10.

Pan JZ, et al. Content based fake news detection using knowledge graphs. In International Semantic Web Conference 2018. Springer, Cham.

Ravi K, Ravi V. A survey on opinion mining and sentiment analysis: tasks, approaches and applications. Knowledge-Based Systems. 2015 Nov 1;89:14-46.

Ruchansky N, Seo S, Liu Y. CSI: A hybrid deep model for fake news detection. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management 2017 Nov 6 (pp. 797-806).

Shu K, Sliva A, Wang S, Tang J, Liu H. Fake news detection on social media: A data mining perspective. ACM SIGKDD Explorations Newsletter. 2017 Sep 1;19(1):22-36.

Shu K, Wang S, Liu H. Beyond news contents: The role of social context for fake news detection. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining 2019 Jan 30 (pp. 312-320).

Singhania S, Fernandez N, Rao S. 3HAN: A deep neural network for fake news detection. In International Conference on Neural Information Processing 2017 Nov 14 (pp. 572-581). Springer, Cham.

Wang Y, Ma F, Jin Z, Yuan Y, Xun G, Jha K, Su L, Gao J. EANN: Event adversarial neural networks for multi-modal fake news detection. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2018 Jul 19 (pp. 849-857).

Zhou X, Zafarani R, Shu K, Liu H. Fake news: Fundamental theories, detection strategies and challenges. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining 2019 Jan 30 (pp. 836-837).
