Table 6: Values with k-NN as classifier.

Dataset                  Approach      Accuracy   Feature Reduction
Senti140                 No FS         50.07%     -
                         GA            63.00%     49.97%
                         Binary Bat    59.69%     50.02%
                         Binary Wolf   56.36%     11.67%
Cornell Movie Reviews    No FS         58.28%     -
                         GA            55.96%     50.23%
                         Binary Bat    55.20%     0.2%
                         Binary Wolf   53.98%     8.4%
with max-depth being used to limit the number of features used. However, the computation of Information Gain, which must be done for every feature, speeds up significantly with fewer features. The Genetic Algorithm again turns out to be the most efficient in terms of feature reduction.
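A common encoding for such wrapper methods, which we sketch here for illustration (the helper names `apply_mask` and `reduction` are our own, not from the evaluated implementations), represents each candidate solution as a binary mask over the feature columns; the feature-reduction percentage in Table 6 is then simply the fraction of zero bits:

```python
def apply_mask(X, mask):
    """Keep only the feature columns whose mask bit is 1 (a GA
    individual is such a binary mask over all features)."""
    keep = [i for i, bit in enumerate(mask) if bit]
    return [[row[i] for i in keep] for row in X]

def reduction(mask):
    """Percentage of features the mask discards."""
    return 100.0 * mask.count(0) / len(mask)

# This illustrative mask keeps 5 of 10 features, i.e. a 50% feature
# reduction -- the order of magnitude observed for GA in Table 6.
mask = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
X = [list(range(10)), list(range(10, 20))]
X_sub = apply_mask(X, mask)
print(len(X_sub[0]), reduction(mask))  # 5 50.0
```

Because classifier training and Information Gain scoring both run on the masked matrix, halving the feature count under such a mask directly cuts the per-feature computation discussed above.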
When using k-NN as the classifier, the results are mixed. While substantial accuracy gains over the baseline are obtained on the Senti140 dataset, performance worsens relative to the baseline on the Cornell Movie Reviews dataset. This is probably due to the instance-based, non-parametric nature of k-NN: it simply performs a majority vote among the k nearest neighbours and does not learn any explicit patterns. This shows that for some classifiers, feature selection of any sort does not guarantee an increase in accuracy.
5 CONCLUSION
In this paper, we have compared the performance of meta-heuristic and evolutionary feature selection methods on the problem of sentiment analysis, using various classifiers across two domains: tweets and movie reviews. While methods such as Random Forest, which have built-in parameters to limit the features used, do not gain any significant improvement in accuracy, other methods such as SVM and k-NN can gain up to 25% in accuracy. While the performance of Binary Bat and the Genetic Algorithm was similar in terms of accuracy gain, the performance of the Binary Grey Wolf Algorithm was consistently lower than both. The percentage decrease in the number of features is another important criterion when making a choice; the Genetic Algorithm was observed to be the most efficient in terms of feature reduction percentage. Moreover, the algorithms differ in the number of hyperparameters that must be tuned for optimal performance, with the Binary Grey Wolf Algorithm being the easiest to tune. Hence, a multitude of factors need to be considered when selecting a feature selection method. The results reported in this paper can serve as guidance for extended work in other domains.
A Comparative Study of Evolutionary Methods for Feature Selection in Sentiment Analysis