Evaluate the Tweet Analysis with Improved Accuracy Using Multi

Channel N-gram Convolutional Neural Network Model over

Naive Bayes Model

Chinthapalli Satya Swaroop Reddy and P. Sriramya

Department of Computer Science Engineering, Saveetha School of Engineering,

Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, Tamil Nadu, 602105, India

Keywords: Deep Learning, Neural Networks, Embedding, Tweets, Disaster Management, Naive Bayes, Novel Multi

Channel N-gram CNN Model, Naive Bayes Model.

Abstract: The purpose of this study is to compare the accuracy of tweet analysis using a novel Multi Channel N-gram CNN model and a Naive Bayes model. Materials and Methods: There are two groups in this study: the Naive Bayes method and the Multi Channel N-gram CNN. The sample size for each group is 10, and the study's parameters include an alpha value of 0.05 and a beta value of 0.2. With a G-Power value of 80%, the significance value was computed using SPSS. Results and Discussion: In the examination of tweets, the Multi Channel N-gram CNN algorithm's accuracy was 97.84%, whereas the Naive Bayes algorithm's accuracy was 79.69%; an independent samples t-test shows that the difference between the two algorithms is statistically significant (p < 0.05). Conclusion: When analyzing tweets, the Multi Channel N-gram CNN algorithm performs noticeably better than the Naive Bayes algorithm.

1 INTRODUCTION

Over the past few decades, with the growth of internet and mobile phone usage, the use of social media has increased greatly. Any kind of general information can be shared on social media and viewed by people all around the world. Social media apps like Twitter, Instagram, Facebook and Snapchat have become very popular. People can share their feelings and opinions on different issues on social media (Ninan, 2022). A disaster can occur at any place at any time. During a disaster, social media is a powerful platform that can spread news about it all over the world in very little time. The information shared on social media may be useful to social welfare organisations, Disaster Management Teams, self-help groups and rescue organisations, as it can alert them to take safety measures in advance (“Dormant Disaster Organizing and the Role of Social Media”, 2019). As there is no limit on sharing information on social media, many unwanted tweets are likely to be shared as well, and people share both real and fake news. Proper analysis of tweets is therefore required, and it plays an important role in many activities (“Multimodal Analysis of Disaster Tweets”, n.d.). Social media can spread information all over the world within seconds, so many people agree that it is a powerful tool for making people and Disaster Management Teams aware of situations happening around the world (Maulana and Maharani, 2021) (Deena, S. et al., 2022). Comments and tweets of varying quality can distort the information itself, which makes it difficult for rescue and emergency responders to develop efficient knowledge management strategies for a rapidly shifting catastrophe setting (Ningsih and Hadiana, 2021).

Many organisations, Disaster Management Teams and individuals have recognised the importance of proper analysis of tweets and have started working on new models that can analyse tweets and give accurate outcomes. With the increase in work on the analysis of tweets, many articles on disaster tweet analysis have been published across various journal databases such as ScienceDirect, IEEE and E Village. A total of 527 articles on disaster tweet analysis using machine learning algorithms were published in ScienceDirect's database over the last 5 years, and around 13 journal articles were published in the IEEE database. To et al. compared learning-based and matching-based techniques for finding disaster-related tweets and concluded that, compared with the learning-based approach, the matching-based approach returns tweets that are higher in quality but less relevant (To et al., 2017). J. Rexiline Ragini collected disaster tweet data from Twitter and analysed the big data using the Apache Spark Big Data Framework and the Python programming language (Sitaula and Shahi, 2022) (Ramkumar, G. et al., 2022). Shamanth Kumar from Arizona State University introduced the TweetTracker application, which constantly monitors the Twitter streaming feed using particular disaster-related hashtags and keywords (Kumar et al., 2011). An innovative method to visualize the emotional state of the general public in the event of a natural disaster was given by Shekhar and Setty (2015).

Reddy, C. and Sriramaya, P.
Evaluate the Tweet Analysis with Improved Accuracy Using Multi Channel N-gram Convolutional Neural Network Model over Naive Bayes Model.
DOI: 10.5220/0012772600003739
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 1st International Conference on Artificial Intelligence for Internet of Things: Accelerating Innovation in Industry and Consumer Electronics (AI4IoT 2023), pages 546-552
ISBN: 978-989-758-661-3

Many research articles have been published on the analysis of tweets. The authors used different Machine Learning and Deep Learning algorithms and models, but in many cases the prediction accuracy was lower than expected, which is the major drawback of much of this research (Maulana and Maharani, 2021).

So, this research study aims to improve the accuracy of predicting whether a tweet is real or fake with the help of the Multi Channel N-gram CNN algorithm, with less computational time.

2 MATERIALS AND METHODS

The Data Science lab of Saveetha School of Engineering, Saveetha University, provided the study setting for this research work. The main objective of this research is a comparative analysis of two groups: the first group is the Multi Channel N-gram CNN algorithm and the second group is the Naive Bayes algorithm. The same sample size of 10 is taken for each of the two groups (“Tweet Analysis - ANN/BERT/CNN/n-Gram CNN”, 2020). For the purposes of this study, the experimental computation is carried out using a G-power of 80% with a 95% confidence interval, an alpha of 0.05, and a beta of 0.2. The dataset sample_submission.csv for this comparative study is taken from the open platform Kaggle.com.

2.1 Multi Channel N-gram CNN Model

A multi-channel CNN model is a convolutional neural network that takes input from multiple channels or sources. Each channel represents a different aspect or feature of the input data. In the N-gram variant used for text, each channel applies convolution filters of a different width, so that each channel reads word groups (n-grams) of a different length.

The design of a multi-channel CNN model typically consists of multiple convolutional layers followed by pooling layers and fully connected layers. Each channel is fed into a separate set of convolutional layers, and the outputs from each set of layers are combined before being passed to the next set of layers. The advantage of a multi-channel CNN model is that the system can learn different aspects of the input data separately and then combine them to make a more accurate prediction. It can also help to reduce overfitting by providing multiple sources of information to the network. Overall, a multi-channel CNN model is a powerful tool for tasks that involve complex input data with multiple sources of information. This Multi Channel CNN approach was first used by Yoon Kim in his paper titled “Convolutional Neural Networks for Sentence Classification” (Kim, 2014).
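The data flow described above can be illustrated with a minimal NumPy sketch of a single forward pass. This is not the authors' implementation: the function names, the n-gram widths (2, 3, 4), the layer sizes, the random untrained weights and the single embedding shared by all channels are assumptions made only for illustration.

```python
import numpy as np


def conv1d_relu(x, kernel, bias):
    """Valid 1-D convolution over the token axis, followed by ReLU.

    x:      (seq_len, embed_dim) embedded tweet
    kernel: (n, embed_dim, n_filters), each filter spans n consecutive tokens
    bias:   (n_filters,)
    returns (seq_len - n + 1, n_filters)
    """
    n = kernel.shape[0]
    steps = x.shape[0] - n + 1
    out = np.empty((steps, kernel.shape[2]))
    for t in range(steps):
        window = x[t:t + n]  # one n-gram of token embeddings
        out[t] = np.tensordot(window, kernel, axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)


def multi_channel_forward(token_ids, embedding, channels, w_out, b_out):
    """Run every n-gram channel, max-pool each over time, concatenate, classify."""
    x = embedding[token_ids]  # (seq_len, embed_dim)
    pooled = [conv1d_relu(x, k, b).max(axis=0) for k, b in channels]
    features = np.concatenate(pooled)  # merged output of all channels
    logit = features @ w_out + b_out
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid score for the positive class


# Hypothetical sizes and random weights; a real model would learn these.
rng = np.random.default_rng(0)
vocab_size, embed_dim, n_filters = 50, 8, 4
ngram_sizes = (2, 3, 4)  # assumed widths, one channel per width
embedding = rng.normal(size=(vocab_size, embed_dim))
channels = [
    (rng.normal(size=(n, embed_dim, n_filters)) * 0.1, np.zeros(n_filters))
    for n in ngram_sizes
]
w_out = rng.normal(size=n_filters * len(ngram_sizes)) * 0.1
b_out = 0.0

token_ids = np.array([3, 17, 9, 42, 5, 0, 0, 0])  # an encoded, zero-padded tweet
prob = multi_channel_forward(token_ids, embedding, channels, w_out, b_out)
print(f"score for the positive class: {prob:.3f}")
```

Each channel sees the same tweet but with a different window width, and max-over-time pooling reduces every channel to a fixed-length vector, so the concatenated feature vector has the same size regardless of tweet length.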

To perform tweet analysis prediction using Multi

Channel CNN, the following steps are involved:

1. Encode the data.
2. Define the model.
3. Fit the model to the data.
4. Predict the outcome for the text data.
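The first of these steps, turning tweet text into fixed-length integer sequences, might look like the following sketch. The helper names (`tokenize`, `build_vocab`, `encode`), the tokenization rule and the maximum length of 8 are hypothetical choices for illustration; the two example tweets are taken from Table 1.

```python
import re


def tokenize(text):
    """Lowercase a tweet and keep word-like tokens (punctuation and '#' are dropped)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(texts):
    """Map each distinct token to an integer id; id 0 is reserved for padding."""
    vocab = {}
    for text in texts:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab) + 1)
    return vocab


def encode(text, vocab, max_len):
    """Convert a tweet to exactly max_len ids: truncate long tweets, zero-pad short ones."""
    ids = [vocab[t] for t in tokenize(text) if t in vocab][:max_len]
    return ids + [0] * (max_len - len(ids))


tweets = [
    "Canadian forest fire near La Ronge, Saskatchewan",
    "I'm worried that a tornado will soon hit our neighborhood.",
]
vocab = build_vocab(tweets)
encoded = [encode(t, vocab, max_len=8) for t in tweets]
print(encoded)
```

Padding to a common length is what allows a batch of tweets to be fed to the convolutional channels as one array.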

2.2 Naive Bayes Model

The Naive Bayes model is mostly used for solving classification problems with a probabilistic approach. It is based on Bayes' theorem of probability. The model assumes that the features (here, the words of a tweet) are conditionally independent of one another given the class, and this simplifying assumption is why it is called naive. The Naive Bayes algorithm is expected to give good predictions and has a wide range of applicability compared with other models. This classifier is used in many problems such as text classification, sentiment analysis and fraud detection (Ji, Yu, and Zhang, 2011). Bayes' theorem is stated as below:

P(A|B) = P(B|A) * P(A) / P(B)

Where,
● P(A|B) is the probability of event A given that event B has occurred
● P(B|A) is the probability of event B given that event A has occurred
● P(A) is the probability of occurrence of event A
● P(B) is the probability of occurrence of event B
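To show how this formula becomes a text classifier, here is a minimal pure-Python sketch in which A is the class (1 for a disaster tweet, 0 otherwise) and B is the set of words in the tweet. The toy training sentences, the function names and the use of add-one (Laplace) smoothing are assumptions for illustration, not the configuration used in this study. P(B) is omitted because it is the same for every class and does not change which class scores highest.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(doc, model):
    """Return the class with the highest log P(A) + sum of log P(word | A)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior, log P(A)
        for token in doc.lower().split():
            if token in vocab:  # words never seen in training are ignored
                # add-one smoothing keeps unseen (word, class) pairs above zero
                likelihood = (word_counts[label][token] + 1) / (total_words + len(vocab))
                score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: 1 = disaster-related, 0 = not disaster-related.
docs = [
    "forest fire near town",
    "flood warning issued",
    "love this sunny day",
    "great movie tonight",
]
labels = [1, 1, 0, 0]
model = train_naive_bayes(docs, labels)
print(predict("fire and flood warning", model))
print(predict("sunny day movie", model))
```

Multiplying the per-word likelihoods (here, adding their logarithms) is exactly where the naive independence assumption enters.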


The dataset sample_submission.csv is used in this study. It was divided into two parts in the proportion 80/20: the larger part is used for training and the smaller part for testing. The two resulting datasets are named train.csv and test.csv respectively. The algorithm was implemented using both the training and testing datasets to obtain the outcome. A laptop with an Intel i5 processor, 8GB of RAM and a 64-bit Windows 11 operating system was used to conduct this research.
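An 80/20 split of this kind can be sketched in a few lines of standard-library Python. The function name `split_80_20`, the seed value and the placeholder rows are hypothetical; the study itself does not state how the rows were shuffled.

```python
import random


def split_80_20(rows, seed=42):
    """Shuffle a copy of the rows reproducibly and cut it at the 80% mark."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


# Placeholder records in (id, text, target) form, standing in for the dataset.
rows = [(i, f"tweet {i}", i % 2) for i in range(10)]
train_rows, test_rows = split_80_20(rows)
print(len(train_rows), len(test_rows))
```

Fixing the seed makes the split repeatable, so both models can be trained and tested on exactly the same partitions.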

2.3 Statistical Analysis

IBM SPSS V22.0 is used for the statistical implementation. The Statistical Package for Social Sciences (SPSS) is used for statistical computations such as the mean and standard deviation, as well as to plot the graphs. TweetsNumber and DataSize are the independent variables, and Accuracy is the dependent variable. For each group, a sample size of 10 is used to construct the dataset, and accuracy is used as the testing variable. An independent samples t-test is performed to test whether the difference between the two methods is statistically significant.

3 RESULTS

The goal of this research article is to compare the accuracy of tweet analysis using the Multi Channel N-gram CNN model and the Naive Bayes model. The more accurate of the two algorithms is selected based on the accuracy of its output. The accuracy shown by the Multi Channel N-gram CNN model is 97.84%, whereas the accuracy shown by the Naive Bayes model is 79.69%.

Table 1 represents the sample data set taken for

this research work.

Table 2 represents the Pseudocode for Multi

Channel N-gram CNN model.

Table 3 represents the Pseudocode for the Naive

Bayes model.

Table 1: Sample dataset.

id | text | target
1 | Our actions are what caused this earthquake, I want Allah to pardon us all. | 1
4 | Canadian forest fire near La Ronge, Saskatchewan | 1
 | Officers have requested that all residents "shelter in place." There aren't any further anticipated evacuation or stay-in-place orders. |
6 | 13,000 residents in California are issued evacuation orders due to wildfires. | 1
7 | Just received this picture from Ruby, Alaska, showing smoke from wildfires entering a school. | 1
 | California Highway 20 is closed in both directions due to a fire in Lake County (#RockyFire Update) - #CAfire #wildfires |
10 | #disaster #flood Flash flooding is caused by heavy rain in the Manitou and Colorado Springs areas. | 1
13 | The fire in the woods is visible from where I am standing on the hilltop. | 1
14 | Since the building across the street is currently undergoing an emergency evacuation, | 1
15 | I'm worried that a tornado will soon hit our neighborhood. | 1

Table 2: Pseudocode for Multi Channel N-gram CNN model.

// I: Input dataset records
1. Import the required packages.
2. Convert the string values in the dataset to numerical values.
3. Using the train_test_split() function, split the data, giving test_size and random_state as parameters.
4. Assign the results to the X_train, X_test, y_train and y_test variables.
5. Import the Multi Channel N-gram CNN model and fit it on the training data.
6. Using the Multi Channel N-gram CNN model, predict the output of the testing data.
7. Calculate the accuracy.
OUTPUT
//Accuracy

AI4IoT 2023 - First International Conference on Artiﬁcial Intelligence for Internet of things (AI4IOT): Accelerating Innovation in Industry

and Consumer Electronics


Table 3: Pseudocode for Naive Bayes model.

// I: Input dataset records
1. Import the required packages.
2. Convert the string values in the dataset to numerical values.
3. Using the train_test_split() function, split the data, giving test_size and random_state as parameters.
4. Assign the results to the X_train, X_test, y_train and y_test variables.
5. Import the Naive Bayes model and fit it on the training data.
6. Using the Naive Bayes model, predict the output of the testing data.
7. Calculate the accuracy.
OUTPUT
//Accuracy
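The final step of both pseudocode listings, calculating the accuracy, can be sketched as below. In Tables 4 and 5 the ACCURACY and LOSS columns add up to 100 in every row, so this sketch assumes that loss is simply 100 minus accuracy; the function name and the example labels are hypothetical.

```python
def accuracy_and_loss(y_true, y_pred):
    """Return (accuracy %, loss %), taking loss as the share of wrong predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = 100.0 * correct / len(y_true)
    return accuracy, 100.0 - accuracy


# Hypothetical labels: 6 of the 8 predictions match the true targets.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
acc, loss = accuracy_and_loss(y_true, y_pred)
print(f"accuracy={acc:.2f}% loss={loss:.2f}%")
```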

Table 4: Accuracy of Classification of Tweet analysis using Multi Channel N-gram CNN model.

GROUP ACCURACY (%) LOSS (%)

TEST 1 97.83 2.17

TEST 2 97.76 2.24

TEST 3 97.93 2.07

TEST 4 97.88 2.12

TEST 5 97.76 2.24

TEST 6 97.88 2.12

TEST 7 97.79 2.21

TEST 8 97.81 2.19

TEST 9 97.93 2.07

TEST 10 97.84 2.16

Table 5: Accuracy of Classification of Tweet analysis using Naive Bayes model.

GROUP ACCURACY (%) LOSS (%)

TEST 1 79.62 20.38

TEST 2 78.72 21.28

TEST 3 79.56 20.44

TEST 4 79.67 20.33

TEST 5 79.14 20.86

TEST 6 80.30 19.70

TEST 7 80.25 19.75

TEST 8 80.40 19.60

TEST 9 79.25 20.75

TEST 10 80.04 19.96

Table 6: Group Statistics Results represented for Accuracy for Multi Channel N-gram CNN and Naive Bayes algorithms.

Algorithm N Mean Std. Deviation Std. Error Mean

Accuracy N-gram CNN 10 97.8410 0.06297 0.01991

Naive Bayes 10 79.6950 0.55490 0.17548


Table 7: Independent samples t-test. The significance value achieved is p = 0.000 (p < 0.05), which shows that the difference between the two groups is statistically significant.

Levene's test for equality of variances (Accuracy): F = 16.619, Sig. = 0.01

T-test for equality of means (Accuracy):

Variances                    t        df     Sig. (2-tailed)  Mean difference  Std. error difference  95% confidence interval lower  95% confidence interval upper
Equal variances assumed      102.751  18     0.000            18.146           0.17660                17.77497                       18.51703
Equal variances not assumed  102.751  9.232  0.000            18.146           0.17660                17.74802                       18.54398

Figure 1: Bar chart showing the comparison of Multi Channel N gram CNN (97.84%) and Naive Bayes (79.69%) in terms

of mean accuracy. X-Axis: Multi Channel N gram CNN (N gram CNN) VS Naive Bayes and Y-Axis: the Mean accuracy of

detection with ±2 SD.

Table 6 shows the group statistics for the accuracy of the Multi Channel N-gram CNN model and the Naive Bayes model. The mean, standard deviation and standard error of the mean for the Multi Channel N-gram CNN model are 97.8410, 0.06297 and 0.01991 respectively; for the Naive Bayes model they are 79.6950, 0.55490 and 0.17548 respectively. Comparing these results, it is clear that the Multi Channel N-gram CNN model is more accurate than the Naive Bayes model in analysing the tweets.
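The group statistics in Table 6 can be recomputed from the ten accuracy values per model listed in Tables 4 and 5. The sketch below uses only the Python standard library rather than SPSS; the function name `group_stats` is hypothetical.

```python
import math
import statistics

# Accuracy values (%) copied from Table 4 (N-gram CNN) and Table 5 (Naive Bayes).
cnn = [97.83, 97.76, 97.93, 97.88, 97.76, 97.88, 97.79, 97.81, 97.93, 97.84]
nb = [79.62, 78.72, 79.56, 79.67, 79.14, 80.30, 80.25, 80.40, 79.25, 80.04]


def group_stats(values):
    """Return N, mean, sample standard deviation and standard error of the mean."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 in the denominator
    return n, mean, sd, sd / math.sqrt(n)


for name, values in (("N-gram CNN", cnn), ("Naive Bayes", nb)):
    n, mean, sd, sem = group_stats(values)
    print(f"{name:12s} N={n} mean={mean:.4f} sd={sd:.5f} sem={sem:.5f}")
```

The standard error of the mean is the standard deviation divided by the square root of N, which is why the Naive Bayes runs, being more spread out, also have the larger standard error.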

Table 7 shows the independent samples t-test performed on the accuracy of the Multi Channel N-gram CNN model and the Naive Bayes model, under both the equal variances assumed and the equal variances not assumed conditions. It also gives the mean difference and the standard error of the difference, together with the 95% confidence interval.
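The t statistic, degrees of freedom, mean difference and standard error of the difference in Table 7 can be cross-checked from the raw accuracies in Tables 4 and 5. The sketch below is a standard-library approximation of what SPSS reports, not the SPSS procedure itself; the function name is hypothetical, and the p-value is not computed because that would need a t-distribution function outside the standard library.

```python
import math
import statistics

# Accuracy values (%) copied from Table 4 (N-gram CNN) and Table 5 (Naive Bayes).
cnn = [97.83, 97.76, 97.93, 97.88, 97.76, 97.88, 97.79, 97.81, 97.93, 97.84]
nb = [79.62, 78.72, 79.56, 79.67, 79.14, 80.30, 80.25, 80.40, 79.25, 80.04]


def independent_t_test(a, b):
    """Pooled-variance (Student) and unequal-variance (Welch) t statistics."""
    n_a, n_b = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    diff = statistics.mean(a) - statistics.mean(b)

    # Equal variances assumed: pool both sample variances.
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    se_pooled = math.sqrt(pooled * (1 / n_a + 1 / n_b))

    # Equal variances not assumed: Welch standard error and degrees of freedom.
    se_welch = math.sqrt(var_a / n_a + var_b / n_b)
    df_welch = (var_a / n_a + var_b / n_b) ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return {
        "mean_diff": diff,
        "se_pooled": se_pooled,
        "t_pooled": diff / se_pooled,
        "df_pooled": n_a + n_b - 2,
        "se_welch": se_welch,
        "t_welch": diff / se_welch,
        "df_welch": df_welch,
    }


result = independent_t_test(cnn, nb)
for key, value in result.items():
    print(f"{key:10s} {value:.4f}")
```

Because both groups have the same size, the pooled and Welch standard errors coincide and only the degrees of freedom differ, which matches the two rows of Table 7. A t value above 100 on 18 degrees of freedom corresponds to a p-value far below 0.05.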

Figure 1 compares both algorithms with the help of a bar graph. The group names are taken on the X axis and the mean accuracy is taken on the Y axis. The bar graph shows a significant difference between the two algorithms in terms of accuracy: the accuracy of the Multi Channel N-gram CNN model is higher than that of the Naive Bayes model. A total of 10 iterations were performed on both the proposed model and the existing model, and all the outcomes are noted in Table 4 and Table 5. An independent samples t-test was performed using the SPSS tool.


4 DISCUSSIONS

By comparing all the outcomes and results, it is observed that the Multi Channel N-gram CNN model gives much more accurate results in the analysis of disaster tweets than the Naive Bayes model. The accuracy of the Multi Channel N-gram CNN model is 97.84% and its loss is 2.16%, whereas the accuracy of the Naive Bayes model is 79.69% and its loss is 20.31%.

Some previously published research articles support our findings. Sitaula and Shahi proposed a Multi Channel CNN model for classifying COVID-related tweets and obtained an accuracy of 94.56% (Sitaula and Shahi 2022). Asif et al. proposed a model for analysing disaster-related images using the Multi Model network, VGG-16, ResNet-50 and Xception networks, and concluded that the analysis of disaster-related images is best done using the Multi Model network (Asif et al. 2021). Another study proposed a model to analyse disaster tweets using CNN and ANN algorithms, and concluded that accuracy was better with CNN and ANN combined than with the individual algorithms (Mathur, Sharma, and Veer 2022). A further study used the Naive Bayes algorithm, CNN with multi-channel distribution and CNN without multi-channel distribution for classifying disaster tweets, and concluded that the analysis of tweets using CNN with a multi-channel model gave highly accurate results (Sitaula and Shahi 2022).

A limitation of our work is that this method is feasible only on offline datasets of significant size, so live updates cannot be captured by this analysis. The study was therefore restricted by data availability, and the data might contain only a part of the disaster-related tweets. The predictions made by the algorithm may differ considerably from real-time live predictions. As future scope, we intend to extend our database to other networking apps such as Facebook and Instagram. We also intend to add disaster prediction models to the work in order to learn the trends of disasters in various regions. With further development, the work might be very useful to various disaster management teams and organisations.

5 CONCLUSION

In this research work, the results show that the Multi Channel N-gram CNN model can be used in the analysis of disaster tweets with an improved accuracy of 97.84%, compared with the Naive Bayes model, which has an accuracy of 79.69%.

REFERENCES

Asif, Amna, Shaheen Khatoon, Md Maruf Hasan, Majed A. Alshamari, Sherif Abdou, Khaled Mostafa Elsayed, and Mohsen Rashwan, 2021. “Automatic Analysis of Social Media Images to Identify Disaster Type and Infer Appropriate Emergency Response.” Journal of Big Data 8 (1): 1–28.

Deena, S. R., Kumar, G., Vickram, A. S., Singhania, R. R., Dong, C. D., Rohini, K., ... & Ponnusamy, V. K., 2022. Efficiency of various biofilm carriers and microbial interactions with substrate in moving bed-biofilm reactor for environmental wastewater treatment. Bioresource technology, 359, 127421.

“Dormant Disaster Organizing and the Role of Social Media.”, 2019. New Media in Times of Crisis. https://doi.org/10.4324/9780203703632-14.

Ji, Yaguang, Songnian Yu, and Yafeng Zhang, 2011. “A Novel Naive Bayes Model: Packaged Hidden Naive Bayes.” 2011 6th IEEE Joint International Information Technology and Artificial Intelligence Conference. https://doi.org/10.1109/itaic.2011.6030379.

Kim, Yoon, 2014. “Convolutional Neural Networks for Sentence Classification,” August. https://doi.org/10.48550/arXiv.1408.5882.

Kumar, Shamanth, Geoffrey Barbier, Mohammad Abbasi, and Huan Liu, 2011. “TweetTracker: An Analysis Tool for Humanitarian and Disaster Relief.” Proceedings of the International AAAI Conference on Web and Social Media 5 (1): 661–62.

A. Kishore Kumar, M. Aeri, A. Grover, J. Agarwal, P. Kumar, and T. Raghu, 2022. “Secured supply chain management system for fisheries through IoT,” Meas. Sensors, vol. 25, no. August, p. 100632, 2023, doi: 10.1016/j.measen.2022.100632.

Mathur, Prerak, Tanu Sharma, and Karan Veer, 2022. “Analysis of CNN and Feed Forward ANN Model for the Evaluation of ECG Signal.” Current Signal Transduction Therapy. https://doi.org/10.2174/1574362417666220328144453

Maulana, Iqbal, and Warih Maharani, 2021. “Disaster Tweet Classification Based On Geospatial Data Using the BERT-MLP Method.” 2021 9th International Conference on Information and Communication Technology (ICoICT). https://doi.org/10.1109/icoict52021.2021.9527513.

“Multimodal Analysis of Disaster Tweets.” n.d. Accessed December 19, 2022. https://ieeexplore.ieee.org/document/8919468.

Ninan, Johan, 2022. “The Past, Present and Future of Social Media in Project Management.” Social Media for Project Management. https://doi.org/10.1201/9781003215080-1.

Ningsih, A. K., and A. I. Hadiana, 2021. “Disaster Tweets Classification in Disaster Response Using Bidirectional Encoder Representations from Transformer (BERT).” IOP Conference Series: Materials Science and Engineering 1115 (1): 012032.

Ramkumar, G. et al., 2021. “An Unconventional Approach for Analyzing the Mechanical Properties of Natural Fiber Composite Using Convolutional Neural Network” Advances in Materials Science and Engineering vol. 2021, Article ID 5450935, 15 pages, 2021. https://doi.org/10.1155/2021/5450935

Shekhar, Himanshu, and Shankar Setty, 2015. “Disaster Analysis through Tweets.” In 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI). IEEE. https://doi.org/10.1109/icacci.2015.7275861.

Sitaula, Chiranjibi, and Tej Bahadur Shahi, 2022. “Multi-Channel CNN to Classify Nepali Covid-19 Related Tweets Using Hybrid Features,” March. https://doi.org/10.48550/arXiv.2203.10286.

To, Hien, Sumeet Agrawal, Seon Ho Kim, and Cyrus Shahabi, 2017. “On Identifying Disaster-Related Tweets: Matching-Based or Learning-Based?” In 2017 IEEE Third International Conference on Multimedia Big Data (BigMM). IEEE. https://doi.org/10.1109/bigmm.2017.82.

“Tweet Analysis - ANN/BERT/CNN/n-Gram CNN”, 2020. Kaggle. July 19, 2020. https://kaggle.com/code/jagdmir/tweet-analysis-ann-bert-cnn-n-gram-cnn.

V. P. Parandhaman, 2023. "An Automated Efficient and Robust Scheme in Payment Protocol Using the Internet of Things," Eighth International Conference on Science Technology Engineering and Mathematics (ICONSTEM), Chennai, India, 2023, pp. 1-5, doi: 10.1109/ICONSTEM56934.2023.10142797.
