SIGRNN: Synthetic Minority Instances Generation in Imbalanced Datasets using a Recurrent Neural Network

Reda Al-Bahrani, Dipendra Jha, Qiao Kang, Sunwoo Lee, Zijiang Yang, Wei-Keng Liao, Ankit Agrawal and Alok Choudhary

Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL, U.S.A.

Keywords:

Synthetic Data, Balancing, Oversampling, Classiﬁcation, Imbalanced Dataset.

Abstract:

Machine learning models trained on imbalanced datasets tend to produce sub-optimal results. This happens be-

cause the learning of the minority classes is dominated by the learning of the majority class. Recommendations

to overcome this obstacle include oversampling the minority class by synthesizing new instances and using

different performance measures. We propose a novel approach to handle the imbalance in datasets by using a

sequence-to-sequence recurrent neural network to synthesize minority class instances. The generative neural

network is trained on the minority class instances to learn its data distribution; the generative neural network

is then used to synthesize minority class instances; these instances are used to augment the original dataset

and balance the minority class. We evaluate our proposed approach against several imbalanced datasets. We

train Decision Tree models on the original and augmented datasets and compare their results against the Syn-

thetic Minority Over-sampling TEchnique (SMOTE), Adaptive Synthetic sampling (ADASYN) and Synthetic

Minority Over-sampling TEchnique-Nominal Continuous (SMOTE-NC). All results are an average of mul-

tiple runs and the results are compared across four different performance metrics. SIGRNN performs well

compared to SMOTE and ADASYN, speciﬁcally in lower percentage increments to the minority class. Also,

SIGRNN outperforms SMOTE-NC on datasets having nominal features.

1 INTRODUCTION

Classiﬁcation datasets for training machine learning

models are generally assumed to be balanced. A

balanced dataset is composed of approximately an

equal number of instances from each class. However,

some scientiﬁc and real-world datasets are highly im-

balanced. The ratio between some classes in these

datasets can be quite high. Machine learning models

trained on such imbalanced datasets tend to produce

sub-optimal results with inappropriate prediction ac-

curacy (Visa and Ralescu, 2005; Maloof, 2003).

Since the models focus on learning the data represen-

tation of the majority class, they tend to neglect the

data representation of the minority classes (Japkow-

icz et al., 2000; Japkowicz and Stephen, 2002). There

exist several research works that have investigated the

problem of imbalanced datasets with machine learn-


ing algorithms such as neural networks and support

vector machines (Fawcett and Provost, 1997; Chan

and Stolfo, 1998; Kubat et al., 1997b).

Some existing approaches to overcome the chal-

lenge of imbalance in training datasets include

re-sampling using unsupervised learning, under-

sampling the majority class, oversampling the minor-

ity class, synthesizing from the minority class, and us-

ing different performance measures (Yap et al., 2014;

Nickerson et al., 2001; Drummond et al., 2003; Estabrooks et al., 2004). These approaches are based on either decreasing the number of instances in the majority class or increasing the number of instances in the minority class. Usually, the number of minority class instances is increased by either repeating the original instances or constructing new instances using a nearest neighbor approach based on random subsets of instances.

Synthetic Minority Over-sampling TEchnique (SMOTE) increases the minority class by creating synthetic instances based on the k-nearest neighbor instances in the minority class (Chawla et al., 2002). SMOTE, as demonstrated on multiple datasets by the authors, can handle continuous variables only. Synthetic Minority Over-sampling TEchnique-Nominal Continuous (SMOTE-NC) is a variant presented in the same paper that is able to handle nominal data. SMOTE-NC was tested by the authors on a single dataset, the Adult dataset. SMOTE and SMOTE-NC cannot generate synthetic instances in a dataset containing only nominal features. To our knowledge, SMOTE and many SMOTE variants (Han et al., 2005; Ramentol et al., 2012; Maciejewski and Stefanowski, 2011) can only operate on datasets containing numerical features. Adaptive synthetic sampling (ADASYN) (He et al., 2008) is another approach, in which the algorithm generates more synthetic instances for the minority class examples that are harder to learn.

Al-Bahrani, R., Jha, D., Kang, Q., Lee, S., Yang, Z., Liao, W., Agrawal, A. and Choudhary, A.
SIGRNN: Synthetic Minority Instances Generation in Imbalanced Datasets using a Recurrent Neural Network.
DOI: 10.5220/0010348103490356
In Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2021), pages 349-356
ISBN: 978-989-758-486-2

In this paper, we present a novel approach

of Synthetic minority Instances Generation using

Recurrent Neural Network (SIGRNN) to handle the

imbalance in datasets. The proposed approach uti-

lizes an encoder-decoder recurrent neural network to

generate synthetic instances from the minority class

population. Instances in the minority class of the

dataset are treated as a ﬁxed length set of features

where each feature is represented by a set of tokens.

By treating each instance in the minority class as a sentence over a small vocabulary, the sequence-to-sequence encoder-decoder recurrent neural network is trained to predict the next token based on the current and past input tokens of a sentence. The SIGRNN model is trained only on the minority class to augment the training datasets by generating synthetic minority instances. We evaluate our approach using three different datasets. These datasets represent different feature types, minority class to majority class ratios, and minority class sizes.

To analyze the impact of the proposed data aug-

mentation approach, Decision Tree models were

trained on the original dataset and the augmented

datasets, and the results were compared against

SMOTE, ADASYN, and SMOTE-NC depending on

the input dataset. Metrics such as Accuracy, Area un-

der the ROC Curve, F1 score, and Gmean were com-

pared.

2 RELATED WORK

2.1 Over-sampling

Synthetic Minority Over-sampling TEchnique

(SMOTE) presented in (Chawla et al., 2002) over-

samples the minority class by creating “synthetic”

examples. SMOTE operates on the features by

taking each minority class instance and introducing

synthetic instances along the line segments joining

any/all of the k minority class nearest neighbors. A

number of synthetic instances are generated based

on the k nearest neighbors of each instance in the

minority class. In case there exist nominal features

in the data, Synthetic Minority Over-sampling

TEchnique-Nominal Continuous (SMOTE-NC)

populates nominal features by selecting the value

occurring in the majority of the k-nearest neighbors.

Adaptive Synthetic Sampling approach (ADASYN) is another approach in which the algorithm focuses on generating synthetic instances for minority class examples that are hard to learn, while generating fewer instances for examples that are easy to learn.
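The interpolation step that SMOTE performs along the line segments between neighbouring minority instances can be illustrated with a minimal sketch. The function name `smote_sample`, the Euclidean distance, and the toy points are illustrative assumptions; this is neither the reference implementation of (Chawla et al., 2002) nor the library code used in the experiments.

```python
import math
import random


def smote_sample(minority, k=2, n_new=4, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (illustrative only)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority points
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy two-dimensional minority class (made-up values).
minority_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (3.0, 3.0)]
new_points = smote_sample(minority_points)
```

Because every synthetic point lies on a segment between two existing minority points, it cannot fall outside the region already covered by the minority class, which is one reason interpolation only applies to continuous features.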

2.2 Performance Measures

Several performance measures have been proposed to

measure the effectiveness of machine learning classi-

ﬁers on the minority class. The area under the receiver

operating characteristic curve is by far the most used.

The AUC ROC represents the relationship between

sensitivity and speciﬁcity (Beck and Shultz, 1986).

The F1-score captures the relationship between pre-

cision and recall. Other measures have been proposed

such as the geometric mean (Kubat et al., 1997b).

2.3 Language Modeling

In statistical language modeling recurrent neural net-

works have been used to learn a representation of

words by training on a large corpus (Bengio et al.,

2003). Such models are capable of learning the prob-

ability of word sequences. In (Cho et al., 2014) the

authors propose a recurrent neural network encoder-

decoder architecture capable of learning mappings of

input sequences to an output sequence. The concept of sequence-to-sequence recurrent neural networks has since been used for tasks such as language translation (Sutskever et al., 2014) and generating image descriptions (Karpathy and Fei-Fei, 2015).

3 SYNTHETIC MINORITY INSTANCES GENERATION USING A RECURRENT NEURAL NETWORK

The overall process of the SIGRNN approach is com-

posed of four main steps - feature ordering, building

the corpus and input data for training the recurrent

neural network, model selection and training, and data


instance generation using the trained model. These

steps are explained below.

3.1 Entropy based Feature Ordering

Text inputs in general RNN training follow the syntax and semantics of the language. The same word can be used at different positions in the input, with or without altering the meaning of the sentence; the RNN model automatically captures the semantics of the language from the input sentences. In our case, the input is not from a natural language; rather, the input sentences are records whose feature order is taken from the original dataset. This order does not carry any particular meaning: the features can be ordered in any way and used as input to train our model. However, the order of features in the input can be critical to the learning of the RNN model. Some features can be easy to learn, having only a specific range or list of values for the minority class, while the model can get confused by other features that have a broader range of values. To handle this, we used an entropy-based feature ordering for our input training data.

We apply an entropy ordering method to generate new records. A parameter setting for the LSTM model can encode the joint probability distribution of all features. Having an ordered set of features $\{X_1, \ldots, X_n\}$, an LSTM model can encode the distributions $P(X_i \mid X_1, \ldots, X_{i-1})\ \forall i \in [1, n]$ for a particular class $c$. The joint distribution of all features can be derived as follows.

$$P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{i=1}^{n} P(X_i = x_i \mid X_1 = x_1, \ldots, X_{i-1} = x_{i-1}) \tag{1}$$
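The chain-rule factorization in Equation (1) can be made concrete with a small sketch for two features. The conditional tables below are hypothetical numbers standing in for what a trained model would encode; the names `joint` and `sample` are illustrative and not part of the paper.

```python
import random

# Hypothetical conditional tables for two features in class c.
p_x1 = {"a": 0.7, "b": 0.3}                   # P(X1)
p_x2_given_x1 = {
    "a": {"u": 0.9, "v": 0.1},                # P(X2 | X1 = a)
    "b": {"u": 0.2, "v": 0.8},                # P(X2 | X1 = b)
}


def joint(x1, x2):
    """Chain rule: P(X1=x1, X2=x2) = P(X1=x1) * P(X2=x2 | X1=x1)."""
    return p_x1[x1] * p_x2_given_x1[x1][x2]


def sample(rng):
    """Ancestral sampling: draw X1 first, then X2 conditioned on X1."""
    x1 = rng.choices(list(p_x1), weights=list(p_x1.values()))[0]
    cond = p_x2_given_x1[x1]
    x2 = rng.choices(list(cond), weights=list(cond.values()))[0]
    return x1, x2


# The joint probabilities over all value combinations sum to one.
total = sum(joint(a, b) for a in p_x1 for b in ("u", "v"))
record = sample(random.Random(0))
```

Sampling the features one at a time in the chosen order is what allows a next-token model to produce a complete record, which is why the ordering of features enters the picture.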

Based on $P(X_1, \ldots, X_n)$, denoted as $P$, we can generate random sequences of features for future training. The computation of the joint feature distribution depends on the ordering of features. If the LSTM model could perfectly model the conditional distributions, the ordering would not matter. However, this assumption is usually unrealistic, especially when the dataset is imbalanced. Let $\varepsilon$ be an upper bound on the error associated with each conditional probability, assuming $\varepsilon + p_i < 1$ and $p_i - \varepsilon > 0\ \forall\, 1 \le i \le n$. We want to minimize the error term in the final joint probability distribution, denoted as $\Delta(X_1, \ldots, X_n)$. We use the notation $p_i = P(X_i \mid X_1, \ldots, X_{i-1})$ for simplicity.

$$\Delta(X_1, \ldots, X_n) = \left| \prod_{i=1}^{n} (p_i \pm \varepsilon) - P \right| \tag{2}$$
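As a small worked illustration (not part of the original derivation), consider $n = 2$, so that $P = p_1 p_2$, and assume both conditional probabilities are off by the full $\varepsilon$. Expanding Equation (2) shows how the per-step errors combine in the joint probability:

```latex
\begin{align*}
\text{both over-estimated:}\quad
  (p_1 + \varepsilon)(p_2 + \varepsilon) - p_1 p_2
  &= \varepsilon (p_1 + p_2) + \varepsilon^2, \\
\text{both under-estimated:}\quad
  p_1 p_2 - (p_1 - \varepsilon)(p_2 - \varepsilon)
  &= \varepsilon (p_1 + p_2) - \varepsilon^2, \\
\text{mixed signs:}\quad
  \left| (p_1 + \varepsilon)(p_2 - \varepsilon) - p_1 p_2 \right|
  &= \left| \varepsilon (p_2 - p_1) - \varepsilon^2 \right| .
\end{align*}
```

In this two-feature case the error is therefore bounded by $\Delta(X_1, X_2) \le \varepsilon (p_1 + p_2) + \varepsilon^2$, and its size depends on the values of the conditional probabilities, which in turn depend on the feature ordering.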

However, $p_i$ and $p_j$ are not independent of each other, since they both depend on the choice of feature ordering. As a result, we cannot simply minimize each of the $p_i$ terms. An exhaustive search for the feature ordering has factorial complexity with respect to the number of features, which is not feasible in practice.

To solve this problem, we apply a greedy heuristic search approach based on entropy. The entropy of $p_i$, denoted as $H(p_i)$, measures the degree of randomness of a probability distribution. It has the following property: if the probabilities of the $k$ possible outcomes are all close to $1/k$, the entropy approaches the maximum entropy $\log(k)$. Moreover, $\Delta(X_1, \ldots, X_n)$ is minimized if all terms $p_i$ are as close to a uniform distribution as possible. Thus, maximizing $\sum_{i=1}^{n} H(p_i)$ is equivalent to minimizing $\Delta(X_1, \ldots, X_n)$. Hence, we order the features based on the conditional entropy of $p_i$ using a Decision Tree method. This approach is equivalent to a greedy search of $n$ layers using the entropy of $p_i\ \forall i \in [1, n]$ as the heuristic.
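The entropy property used above can be checked numerically. The following sketch computes Shannon entropy in nats for a uniform and a skewed distribution over $k = 4$ outcomes; the two distributions are made-up examples.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


k = 4
uniform = [1 / k] * k                 # every outcome has probability 1/k
skewed = [0.85, 0.05, 0.05, 0.05]     # one outcome dominates

h_uniform = entropy(uniform)          # equals log(k), the maximum
h_skewed = entropy(skewed)            # strictly smaller than log(k)
```

The uniform case attains the maximum $\log(k)$, and any departure from uniformity lowers the entropy, which is what makes entropy usable as a ranking score for features.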

We trained a Decision Tree based on entropy as

the decision criterion on our original training data. In

a Decision Tree, the feature entropy decreases from

root to leaves; the features at the top have high en-

tropy while the leaves have an entropy of zero. We

use the entropy computed by a Decision Tree to sort

our input features from highest to lowest entropy. The

model is provided with features using this order dur-

ing training.
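A simplified sketch of sorting features from highest to lowest entropy is given below. It uses the empirical marginal entropy of each feature over a few toy records as a stand-in for the entropy values taken from a trained Decision Tree, so it should be read as an approximation of the idea rather than the procedure used in the paper. The feature names, records, and function names are hypothetical.

```python
import math
from collections import Counter


def empirical_entropy(values):
    """Shannon entropy (in nats) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def order_features(records, feature_names):
    """Return feature names sorted from highest to lowest entropy."""
    scores = {
        name: empirical_entropy([rec[name] for rec in records])
        for name in feature_names
    }
    return sorted(feature_names, key=lambda name: scores[name], reverse=True)


# Toy minority class records (made-up values).
toy_records = [
    {"age_bin": "30-39", "gender": "MALE", "country": "US"},
    {"age_bin": "40-49", "gender": "FEMALE", "country": "US"},
    {"age_bin": "50-59", "gender": "MALE", "country": "US"},
    {"age_bin": "20-29", "gender": "MALE", "country": "US"},
]
ordering = order_features(toy_records, ["country", "gender", "age_bin"])
```

In this toy example the feature with four distinct values comes first and the constant feature comes last, mirroring the highest-to-lowest ordering described above.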

3.2 Building the Corpus and Input Data

We convert the minority class instances into a corpus

to train the SIGRNN. An instance R consists of X

features: categorical and numerical features. Each

value V in a feature is considered to be a word in

the vocabulary. To uniquely identify a feature based on its value, the words are assigned tokens. For instance, if one of our features is GENDER, with two categories (1 for MALE and 2 for FEMALE), two tokens are generated for GENDER: GENDER MALE and GENDER FEMALE. After these tokens are generated, we use them to convert instances in the feature space to sentences in the language space.

The ﬁnal corpus is composed of sentences con-

taining X words (tokens) where each word repre-

sents a feature value and the sentence represents

an instance. The corpus generated consists of sen-

tences corresponding to the minority class instances

in the training dataset. Next, we build a vocabulary

to uniquely map each word in our training dataset

into embeddings and convert our input sentences into

word vectors. These word vectors are fed into the

SIGRNN for training.
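The conversion from instances to sentences and then to integer indices can be sketched as follows. The underscore used to join a feature name and its value, the helper names, and the toy records are assumptions made for illustration; the paper does not specify the exact token format.

```python
def record_to_sentence(record, feature_order):
    """Turn one instance into a list of FEATURE_VALUE tokens."""
    return [f"{name}_{record[name]}" for name in feature_order]


def build_vocab(sentences):
    """Assign an integer id to every distinct token, in first-seen order."""
    vocab = {}
    for sentence in sentences:
        for token in sentence:
            vocab.setdefault(token, len(vocab))
    return vocab


feature_order = ["AGE", "GENDER"]  # hypothetical ordering of two features
minority = [                       # made-up minority class instances
    {"AGE": 39, "GENDER": "MALE"},
    {"AGE": 52, "GENDER": "FEMALE"},
    {"AGE": 39, "GENDER": "FEMALE"},
]

corpus = [record_to_sentence(r, feature_order) for r in minority]
vocab = build_vocab(corpus)
# Each sentence becomes a list of token ids that an embedding layer can consume.
encoded = [[vocab[token] for token in sentence] for sentence in corpus]
```

Prefixing each value with its feature name keeps identical values of different features apart in the vocabulary, so every token identifies both the feature and its value.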


3.3 Model Training and Instance Generation

The SIGRNN is composed of an encoder-decoder model using Long Short-Term Memory (LSTM) cells to predict the next feature based on the observed features. The encoder is essentially an embedding look-up table for the input word vectors; it converts the inputs from a high-dimensional representation, caused by the large vocabulary size, to a reduced representation. A sequence-to-sequence decoder is used on top of the encoder output and is composed of multiple LSTM layers. We experimented with different numbers of LSTM cells in different layers and different hyper-parameters to fine-tune our model; the experiments presented here used an encoder layer, two layers of 512 or 1,024 LSTM cells, and a decoder, with a mini-batch size of 8 or 32 depending on the dataset size. A fully connected layer with softmax activation is used on the decoder output to get the probability of output words. The model architecture is shown in Figure 1. The embedding size was set equal to the number of tokens in the minority class of the training set. The models were trained for 15 epochs. Once the models were trained, sequences of tokens were generated using the network. Finally, the generated tokens were converted back from the language space to the feature space, and the training datasets were augmented with the generated instances.
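The generation loop and the conversion back to the feature space can be sketched without a neural network. In the sketch below, a table of next-token counts conditioned only on the previous token stands in for the trained LSTM's softmax output, which in SIGRNN is conditioned on all past tokens; the token format, helper names, and toy corpus are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def fit_next_token_counts(sentences):
    """Count how often each token follows the previous one.
    This table is only a stand-in for the trained network's softmax output."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        prev = "<s>"  # start-of-sentence marker
        for token in sentence:
            counts[prev][token] += 1
            prev = token
    return counts


def generate_sentence(counts, length, rng):
    """Sample tokens one at a time, each conditioned on the previous token."""
    sentence, prev = [], "<s>"
    for _ in range(length):
        options = counts[prev]
        token = rng.choices(list(options), weights=list(options.values()))[0]
        sentence.append(token)
        prev = token
    return sentence


def sentence_to_record(sentence):
    """Convert FEATURE_VALUE tokens back to a {feature: value} record."""
    return dict(token.split("_", 1) for token in sentence)


corpus = [  # made-up minority class sentences
    ["AGE_39", "GENDER_MALE"],
    ["AGE_52", "GENDER_FEMALE"],
    ["AGE_39", "GENDER_FEMALE"],
]
counts = fit_next_token_counts(corpus)
rng = random.Random(0)
synthetic = [
    sentence_to_record(generate_sentence(counts, 2, rng)) for _ in range(5)
]
```

Values come back as strings here; a real pipeline would also need to cast numerical features to their original types before augmenting the training set.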

3.4 Evaluation Approach

We trained Decision Tree models to evaluate and

compare the impact of data augmentation using the

SIGRNN against SMOTE, ADASYN, and SMOTE-

NC. The models are trained and selected through a

10-fold cross-validation; each experiment is run with

10 different random seeds. The results are an aver-

age over the 10 runs. This comparison is carried out

to evaluate the data generation and not the machine

learning algorithm itself. We use Decision Trees as the baseline model throughout the evaluation of datasets generated by SMOTE, ADASYN, SMOTE-NC, and our proposed recurrent neural network approach.

The network is coded using PyTorch (Paszke et al., 2017) and is trained using an Nvidia GPU (GTX Titan X). To build the Decision Tree models, the H2O platform (Candel et al., 2016) was used, and to balance using SMOTE, ADASYN, and SMOTE-NC we used the implementation of (Lemaître et al., 2017).

4 RESULTS

In this section, we evaluate the effectiveness of our proposed approach by analyzing two factors. First, we compare the quality of the generated data with the original minority class data. Second, we evaluate SIGRNN against SMOTE and ADASYN by comparing Decision Tree performance across different performance metrics. We also compare SIGRNN and SMOTE-NC on datasets that contain nominal features.

4.1 Performance Metrics

Decision Trees were used as the baseline model in all

experiments. The built models were evaluated on the

following performance metrics:

1. Accuracy: It is the fraction of correctly classified examples in the test set.

$$\text{Accuracy} = \frac{\text{correct predictions}}{\text{total number of predictions}} \tag{3}$$

2. Area under the Receiver Operating Characteristic Curve: The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.

3. F1-score: It is a measure of a test's accuracy. It considers both precision and recall of the test to compute the score. The F1 score is the harmonic mean of the precision and recall.

$$F_1 = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{4}$$

4. Geometric Mean of Class Accuracy: It combines the positive class accuracy (PA) and the negative class accuracy (NA) (Kubat et al., 1997a).

$$\text{Gmean} = \sqrt{\text{PA} \times \text{NA}} \tag{5}$$
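Equations (3) to (5) can be computed directly from the four cells of a binary confusion matrix, as in the sketch below. The counts are made up for illustration, and the area under the ROC curve is left out because it requires ranked prediction scores rather than a single confusion matrix.

```python
import math


def metrics(tp, fp, tn, fn):
    """Accuracy, F1 score, and Gmean from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                 # positive class accuracy (PA)
    negative_accuracy = tn / (tn + fp)      # negative class accuracy (NA)
    f1 = 2 * precision * recall / (precision + recall)
    gmean = math.sqrt(recall * negative_accuracy)
    return {"accuracy": accuracy, "f1": f1, "gmean": gmean}


# Made-up counts: 40 positive and 60 negative test examples.
scores = metrics(tp=30, fp=10, tn=50, fn=10)
```

Gmean drops toward zero whenever either class is classified poorly, which is why it is more informative than accuracy on imbalanced test sets.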

4.2 Datasets

We present results for three different datasets, all of which are publicly available (Dua and Graff, 2017). Table 1 describes the datasets used in our experiments.

1. SATIMAGE: This database consists of the multi-spectral values of pixels in 3 by 3 neighbourhoods in a satellite image, and the classification associated with the central pixel in each neighbourhood. The goal is to classify the pixel, given the multi-spectral values. To generate an imbalanced dataset, all classes were collapsed except for class 4.


[Figure 1 diagram. Blocks shown: Input, Encoder (Embedding), RNN (LSTM), RNN (LSTM), Decoder (Linear); 0.5 dropout is marked at three points.]

Figure 1: An overview of the recurrent neural network architecture used to generate synthetic minority class instances.

2. HABERMAN: This dataset contains breast can-

cer cases from the Billings Hospital at the Uni-

versity of Chicago. The two classes present in

the dataset indicate survival of 5 years of patients

who had undergone surgery for breast cancer be-

tween 1958 and 1970. The data consists of four

attributes: age of patient at time of operation, year

of operation, number of positive axillary nodes

detected, and survival status.

3. ADULT: The Adult dataset consists of Census In-

come information and is used to predict if an in-

dividual’s income is greater than $50K/yr. The

information was extracted from the 1994 census

bureau database. It contains information of work-

ing adults between the ages of 16 and 100. The

dataset contains 48,842 records, each record con-

taining ﬁve numerical and eight categorical fea-

tures.

Table 1: Description of the datasets used in our experiments.

Name      Types    Train Maj/Min  Test Maj/Min
Satimage  Num      4k/415         1.7k/211
Haberman  Num      179/65         46/16
Adult     Num/Cat  24.7k/7.8k     12.4k/3.8k

4.3 Quality of Generated Data

We assess the quality of the generated instances by comparing their distribution with that of the original minority class instances. We took two features from the ADULT dataset and generated histograms. The blue histograms are for SMOTE-NC, red for SIGRNN, and black for the original data. In Figure 2, the original minority class and the generated minority class instances are overlaid to compare the distributions of the two features. For both features, SIGRNN generates data that spans the whole set of bins, while SMOTE-NC struggles to generate values in the under-represented bins, i.e., the bins on the two tails of the distribution.

4.4 Performance Improvement

We compared the performance improvements by training Decision Tree models on the augmented datasets. For this, we trained the Decision Tree models on training datasets created using different amounts of synthetic instances (from 100% to 900% depending on the dataset size). We compare the performance of models trained on data augmented using the recurrent neural network to the same models trained on data augmented using SMOTE, ADASYN, and SMOTE-NC.
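The effect of a percentage increment on the class sizes can be sketched as follows. The sketch assumes that an increment of N% adds N/100 times the original number of minority instances, which is the usual SMOTE convention but is not stated explicitly in the text; the function names are illustrative.

```python
def synthetic_count(n_minority, percent):
    """Number of synthetic instances for a given percentage increment,
    assuming the percentage is relative to the original minority size."""
    return n_minority * percent // 100


def augmented_sizes(n_majority, n_minority, percents):
    """Return (percent, new minority size, minority/majority ratio) tuples."""
    rows = []
    for percent in percents:
        new_minority = n_minority + synthetic_count(n_minority, percent)
        rows.append((percent, new_minority, new_minority / n_majority))
    return rows


# Haberman training split from Table 1: 179 majority, 65 minority instances.
haberman = augmented_sizes(179, 65, [0, 100])
```

Under this assumption, a 100% increment doubles the Haberman minority class from 65 to 130 instances, and a 900% increment on the 415 Satimage minority instances adds 3,735 synthetic instances.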

We performed a 10-fold cross-validation with hyper-parameter tuning. The best model from the 10-fold cross-validation was selected. The final model was then used to generate performance metrics on the test set. This process was repeated at every increment in the minority class. First, Table 2 shows the comparison of SIGRNN and SMOTE-NC on the Adult dataset. SIGRNN performs well on datasets containing nominal data. We expected a performance gap between SIGRNN and SMOTE-NC, since SMOTE-NC selects the value occurring most often in the k-nearest neighbors for nominal values, while SIGRNN produces a value based on the sequence of prior features while generating the instance. Second, in Tables 3 and 4 we compare performance on the two datasets consisting of only continuous features over multiple performance metrics. On the Gmean metric, our proposed method matches or outperforms SMOTE and ADASYN in most cases. Area under the ROC curve also shows improvement over the other algorithms at most increments. In the case of Accuracy and F1 score, SIGRNN either matches or slightly trails SMOTE and ADASYN. SIGRNN's performance is comparable to that of SMOTE even in cases where the training minority class is small.

[Figure 2 panels: ADULT histogram of hours per week; ADULT histogram of age.]

Figure 2: Histograms showing the distribution of the actual minority class vs. the synthetically generated instances for two features from the ADULT dataset. The blue histograms are for SMOTE-NC, red for SIGRNN, and black for the original data. The overlay is used to compare the distributions of the two features. For both features, SIGRNN generates data that spans the whole set of bins, while SMOTE-NC struggles to generate values in the under-represented bins.

Table 2: The Adult dataset consists of a mixture of continuous and nominal features. Adult-1 is the original form of the dataset, while Adult-2 and Adult-3 are created by removing continuous and nominal features, respectively. Features were removed to demonstrate the behaviour of SMOTE-NC (S-NC) compared to SIGRNN in handling nominal features. Where * is present, the variance is ±0.01.

Adult-1
%    Method  Acc    AUC    F1     Gmean
0    -       0.83   0.81*  0.89   0.75*
100  SIGRNN  0.81   0.82   0.87   0.78*
     S-NC    0.81   0.82   0.87   0.78
200  SIGRNN  0.8    0.83   0.86   0.79
     S-NC    0.81   0.82   0.87   0.79
300  SIGRNN  0.8    0.83   0.86   0.79
     S-NC    0.8    0.82   0.86   0.79

Adult-2
%    Method  Acc    AUC    F1     Gmean
0    -       0.8    0.77   0.87   0.68*
100  SIGRNN  0.79   0.79*  0.86   0.71*
     S-NC    0.78   0.78   0.85   0.72
200  SIGRNN  0.77   0.79   0.84   0.72
     S-NC    0.75*  0.78   0.82*  0.73
300  SIGRNN  0.77   0.8    0.84   0.72
     S-NC    0.75*  0.78   0.82*  0.74

Adult-3
%    Method  Acc    AUC    F1     Gmean
0    -       0.82   0.8    0.88   0.74
100  SIGRNN  0.81   0.82   0.87   0.76
     S-NC    0.8    0.8    0.86   0.77
200  SIGRNN  0.79   0.82   0.86   0.78
     S-NC    0.79   0.81   0.85   0.78
300  SIGRNN  0.79   0.82*  0.85   0.78
     S-NC    0.78   0.81   0.85   0.78

5 CONCLUSION AND FUTURE WORK

We formalize a method to handle imbalance in datasets utilizing a language model approach by converting a dataset to a corpus and then applying a sequence-to-sequence generative neural network to generate new sentences in the corpus. The generated corpus is then converted back to the original feature space. The transformation from feature space to corpus and back again to feature space produces promising results for tackling imbalanced datasets. We evaluated this method using multiple datasets of different sizes and feature types. We demonstrated that the approach works well compared to SMOTE, ADASYN, and SMOTE-NC. Although we handle numerical attributes in our proposed implementation, this approach can be improved by adopting a branched recurrent neural network where each data type is handled by a branch, to avoid converting numerical values to tokens before training the SIGRNN. We plan to experiment with adversarial training in the future and to train on both classes to generate better minority class instances.

ACKNOWLEDGMENT

This work is supported in part by the U.S. Department

of Energy award numbers DE-SC0014330 and DE-

SC0019358 and National Institute of Standards and

Technology award number 70NANB19H005.


Table 3: Results for the decision tree on the Haberman dataset. Each result presented is the average of 10 different runs with the associated standard deviation. We can observe that SIGRNN performs well on the AUC ROC and Gmean metrics in earlier increments. SIGRNN performs as well as the other balancing methods on the F1 metric.

Haberman
%    Method  Acc         AUC         F1          Gmean
0    -       0.69 ±0.05  0.61 ±0.05  0.79 ±0.04  0.57 ±0.06
100  SIGRNN  0.65 ±0.03  0.59 ±0.04  0.76 ±0.03  0.56 ±0.06
     SMOTE   0.65 ±0.04  0.58 ±0.04  0.76 ±0.03  0.55 ±0.05
     ADASYN  0.65 ±0.06  0.58 ±0.07  0.75 ±0.05  0.55 ±0.08

Table 4: Results for the decision tree on the Satimage dataset. Each result presented is the average of 10 different runs with the associated standard deviation. We can observe that SIGRNN performs well on the AUC ROC and Gmean metrics in earlier increments. SIGRNN performs as well as the other balancing methods on the F1 metric.

Satimage
%    Method  Acc        AUC        F1         Gmean
0    -       0.9±0.01   0.74±0.01  0.95±0.0   0.71±0.02
100  SIGRNN  0.9±0.01   0.77±0.02  0.94±0.0   0.75±0.03
     SMOTE   0.9±0.01   0.77±0.01  0.94±0.0   0.75±0.02
     ADASYN  0.9±0.01   0.76±0.02  0.94±0.0   0.74±0.03
200  SIGRNN  0.89±0.01  0.78±0.02  0.94±0.0   0.76±0.02
     SMOTE   0.9±0.01   0.77±0.02  0.94±0.0   0.76±0.02
     ADASYN  0.9±0.01   0.77±0.01  0.95±0.0   0.75±0.01
300  SIGRNN  0.89±0.01  0.79±0.01  0.94±0.0   0.79±0.02
     SMOTE   0.89±0.01  0.78±0.02  0.94±0.0   0.77±0.02
     ADASYN  0.89±0.01  0.77±0.02  0.94±0.0   0.76±0.03
400  SIGRNN  0.88±0.01  0.79±0.01  0.93±0.0   0.79±0.01
     SMOTE   0.9±0.01   0.79±0.01  0.94±0.0   0.78±0.02
     ADASYN  0.89±0.01  0.78±0.01  0.94±0.0   0.77±0.02
500  SIGRNN  0.88±0.01  0.8±0.02   0.93±0.0   0.8±0.02
     SMOTE   0.89±0.01  0.78±0.01  0.94±0.0   0.79±0.01
     ADASYN  0.88±0.0   0.8±0.01   0.93±0.0   0.79±0.01
600  SIGRNN  0.88±0.01  0.8±0.02   0.93±0.01  0.8±0.01
     SMOTE   0.89±0.01  0.79±0.01  0.94±0.0   0.8±0.02
     ADASYN  0.87±0.01  0.8±0.01   0.93±0.01  0.79±0.01
700  SIGRNN  0.88±0.01  0.8±0.01   0.93±0.0   0.81±0.01
     SMOTE   0.88±0.01  0.8±0.01   0.93±0.0   0.8±0.01
     ADASYN  0.88±0.01  0.81±0.02  0.93±0.0   0.8±0.02
800  SIGRNN  0.88±0.0   0.81±0.01  0.93±0.0   0.81±0.01
     SMOTE   0.88±0.01  0.8±0.01   0.93±0.0   0.8±0.01
     ADASYN  0.88±0.01  0.82±0.01  0.93±0.0   0.83±0.01
900  SIGRNN  0.87±0.01  0.8±0.02   0.93±0.0   0.81±0.02
     SMOTE   0.88±0.01  0.81±0.01  0.93±0.0   0.81±0.01
     ADASYN  0.88±0.01  0.83±0.01  0.93±0.0   0.84±0.01

REFERENCES

Beck, J. R. and Shultz, E. K. (1986). The use of rela-

tive operating characteristic (roc) curves in test perfor-

mance evaluation. Archives of pathology & laboratory

medicine, 110(1):13–20.

Bengio, Y., Ducharme, R., Vincent, P., and Jauvin, C.

(2003). A neural probabilistic language model. Jour-

nal of machine learning research, 3(Feb):1137–1155.

Candel, A., Parmar, V., LeDell, E., and Arora, A. (2016).

H2o.ai. H2O. ai Inc.

Chan, P. K. and Stolfo, S. J. (1998). Toward scalable learn-

ing with non-uniform class and cost distributions: A

case study in credit card fraud detection. In KDD, vol-

ume 98, pages 164–168.

Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer,

W. P. (2002). Smote: synthetic minority over-

sampling technique. Journal of artiﬁcial intelligence


research, 16:321–357.

Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D.,

Bougares, F., Schwenk, H., and Bengio, Y. (2014).

Learning phrase representations using rnn encoder-

decoder for statistical machine translation. arXiv

preprint arXiv:1406.1078.

Drummond, C., Holte, R. C., et al. (2003). C4. 5, class

imbalance, and cost sensitivity: why under-sampling

beats over-sampling. In Workshop on learning from

imbalanced datasets II, volume 11, pages 1–8. Cite-

seer.

Dua, D. and Graff, C. (2017). UCI machine learning repos-

itory.

Estabrooks, A., Jo, T., and Japkowicz, N. (2004). A multi-

ple resampling method for learning from imbalanced

data sets. Computational intelligence, 20(1):18–36.

Fawcett, T. and Provost, F. (1997). Adaptive fraud de-

tection. Data mining and knowledge discovery,

1(3):291–316.

Han, H., Wang, W.-Y., and Mao, B.-H. (2005). Borderline-

smote: a new over-sampling method in imbalanced

data sets learning. In International conference on in-

telligent computing, pages 878–887. Springer.

He, H., Bai, Y., Garcia, E. A., and Li, S. (2008). Adasyn:

Adaptive synthetic sampling approach for imbal-

anced learning. In Neural Networks, 2008. IJCNN

2008.(IEEE World Congress on Computational In-

telligence). IEEE International Joint Conference on,

pages 1322–1328. IEEE.

Japkowicz, N. et al. (2000). Learning from imbalanced

data sets: a comparison of various strategies. In AAAI

workshop on learning from imbalanced data sets, vol-

ume 68, pages 10–15. Menlo Park, CA.

Japkowicz, N. and Stephen, S. (2002). The class imbalance

problem: A systematic study. Intelligent data analy-

sis, 6(5):429–449.

Karpathy, A. and Fei-Fei, L. (2015). Deep visual-semantic

alignments for generating image descriptions. In Pro-

ceedings of the IEEE conference on computer vision

and pattern recognition, pages 3128–3137.

Kubat, M., Holte, R., and Matwin, S. (1997a). Learning

when negative examples abound. In European Confer-

ence on Machine Learning, pages 146–153. Springer.

Kubat, M., Matwin, S., et al. (1997b). Addressing the curse

of imbalanced training sets: one-sided selection. In

ICML, volume 97, pages 179–186. Nashville, USA.

Lemaître, G., Nogueira, F., and Aridas, C. K. (2017).

Imbalanced-learn: A python toolbox to tackle the

curse of imbalanced datasets in machine learning.

Journal of Machine Learning Research, 18(17):1–5.

Maciejewski, T. and Stefanowski, J. (2011). Local neigh-

bourhood extension of smote for mining imbalanced

data. In 2011 IEEE Symposium on Computational In-

telligence and Data Mining (CIDM), pages 104–111.

IEEE.

Maloof, M. A. (2003). Learning when data sets are imbal-

anced and when costs are unequal and unknown. In

ICML-2003 workshop on learning from imbalanced

data sets II, volume 2, pages 2–1.

Nickerson, A., Japkowicz, N., and Milios, E. E. (2001). Us-

ing unsupervised learning to guide resampling in im-

balanced data sets. In AISTATS.

Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E.,

DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., and

Lerer, A. (2017). Automatic differentiation in pytorch.

xyz.

Ramentol, E., Caballero, Y., Bello, R., and Herrera, F.

(2012). Smote-rsb*: a hybrid preprocessing approach

based on oversampling and undersampling for high

imbalanced data-sets using smote and rough sets the-

ory. Knowledge and information systems, 33(2):245–

265.

Sutskever, I., Vinyals, O., and Le, Q. V. (2014). Se-

quence to sequence learning with neural networks. In

Advances in neural information processing systems,

pages 3104–3112.

Visa, S. and Ralescu, A. (2005). Issues in mining imbal-

anced data sets-a review paper. In Proceedings of

the sixteen midwest artiﬁcial intelligence and cogni-

tive science conference, volume 2005, pages 67–73.

sn.

Yap, B. W., Rani, K. A., Rahman, H. A. A., Fong, S.,

Khairudin, Z., and Abdullah, N. N. (2014). An appli-

cation of oversampling, undersampling, bagging and

boosting in handling imbalanced datasets. In Pro-

ceedings of the ﬁrst international conference on ad-

vanced data and information engineering (DaEng-

2013), pages 13–22. Springer.
