chine) (Vapnik, 1998), (Scholkopf and Smola, 2002), tree classifiers and more recent algorithms such as Optimum Path Forest (Papa and Falcao, 2010), (Papa et al., 2007) as base classifiers.
Performance evaluation on a test dataset shows very good results for the selection of suspicious profiles. Moreover, field evaluation of fraud detection using our automatic system shows results similar to the manual experts' method.
The paper is organized as follows. Section 2 describes general aspects of the class imbalance problem, Section 3 describes the different strategies proposed, Section 4 presents the results obtained, and, finally, Section 5 concludes the work.
2 THE CLASS IMBALANCE PROBLEM
When working on the fraud detection problem, one cannot assume that the number of people who commit fraud is the same as the number who do not; usually there are fewer examples from the class that commits fraud. This situation is known as the class imbalance problem, and it is particularly important in real-world applications where it is costly to misclassify examples from the minority class. In these cases, standard classifiers tend to be overwhelmed by the majority class and to ignore the minority class, hence obtaining suboptimal classification performance. To confront this type of problem, we decided to use three different strategies at different levels: changing the class distribution by resampling, manipulating the classifiers, and acting on the ensemble of them.
The first consists mainly of resampling techniques such as under-sampling the majority class or over-sampling the minority one. Random under-sampling aims at balancing the data set through random removal of majority class examples. The major problem of this technique is that it can discard potentially important data for the classification process. On the other hand, the simplest over-sampling method is to increase the size of the minority class by random replication of its samples. The main drawback of over-sampling is the likelihood of over-fitting, since it makes exact copies of the minority class instances.
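The two baseline resampling techniques above can be sketched as follows; this is a minimal illustration assuming the examples of each class are held in plain Python lists (the function names are ours, not from the paper):

```python
import random

def random_undersample(majority, minority, seed=0):
    """Balance the data set by randomly discarding majority-class
    examples until both classes have the same size."""
    rng = random.Random(seed)
    kept = rng.sample(majority, len(minority))
    return kept + minority

def random_oversample(majority, minority, seed=0):
    """Balance the data set by randomly replicating minority-class
    examples (exact copies, hence the over-fitting risk)."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra
```

Note that under-sampling shrinks the training set (possibly losing informative majority examples), while over-sampling only adds duplicates of existing minority instances.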
To address these shortcomings of resampling techniques, different proposals tackle the imbalance problem by adapting existing algorithms to the special characteristics of imbalanced data sets. One approach is one-class classifiers, which try to describe one class of objects (the target class) and distinguish it from all other objects (outliers). In this paper, the performance of One-Class SVM, an adaptation of the popular SVM algorithm, will be analyzed. Another technique is cost-sensitive learning, where the cost of a particular kind of error can differ from that of others, for example by assigning a high cost to mislabeling a sample from the minority class.
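The cost-sensitive idea can be illustrated with a simple decision rule: instead of thresholding a classifier's estimated fraud probability at 0.5, label a sample by comparing the expected costs of the two error types. The cost values below are hypothetical, chosen only for illustration:

```python
def cost_sensitive_label(p_fraud, cost_fn=10.0, cost_fp=1.0):
    """Return 1 (fraud) when the expected cost of missing a fraud
    (p_fraud * cost_fn) exceeds the expected cost of a false alarm
    ((1 - p_fraud) * cost_fp); otherwise return 0."""
    return 1 if p_fraud * cost_fn > (1 - p_fraud) * cost_fp else 0
```

With a false-negative cost ten times the false-positive cost, the effective decision threshold drops from 0.5 to roughly 0.09, so even moderately suspicious profiles are flagged for inspection.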
Another problem which arises when working with imbalanced classes is that the most widely used metrics for measuring the performance of learning systems, such as accuracy and error rate, are not appropriate because they do not take misclassification costs into account and are strongly biased in favor of the majority class. In the past few years, several new metrics which measure the classification performance on the majority and minority classes independently, hence taking the class imbalance into account, have been proposed (Manning et al., 2009).
• Recall_p = TP / (TP + FN)
• Recall_n = TN / (TN + FP)
• Precision = TP / (TP + FP)
• F_value = ((1 + β²) · Recall_p · Precision) / (β² · Recall_p + Precision)
Table 1: Confusion matrix.

                  Labeled positive      Labeled negative
Actual positive   TP (True Positive)    FN (False Negative)
Actual negative   FP (False Positive)   TN (True Negative)
Recall_p is the percentage of correctly classified positive instances, in this case, the fraud samples. Precision is defined as the proportion of instances labeled as positive that are actually positive. The combination of these two measurements, the F-value, is their weighted harmonic mean, weighted by the parameter β. Depending on the value of β we can prioritize Recall or Precision. For example, if we have few resources to perform inspections, it can be useful to prioritize Precision, so that the set of samples labeled as positive has a high density of true positives.
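The metrics above follow directly from the counts in Table 1; as a brief sketch (the function name is ours), they can be computed as:

```python
def imbalance_metrics(tp, fn, fp, tn, beta=1.0):
    """Compute Recall_p, Recall_n, Precision and F_value from the
    confusion-matrix counts of Table 1."""
    recall_p = tp / (tp + fn)          # fraction of frauds detected
    recall_n = tn / (tn + fp)          # fraction of non-frauds kept clean
    precision = tp / (tp + fp)         # density of true frauds among alarms
    f_value = ((1 + beta**2) * recall_p * precision) / (beta**2 * recall_p + precision)
    return recall_p, recall_n, precision, f_value
```

Setting β < 1 weights the F-value toward Precision (useful when inspection resources are scarce), while β > 1 weights it toward Recall_p.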
3 STRATEGY PROPOSED
The system presented consists basically of three modules: Pre-Processing and Normalization, Feature Selection and Extraction and, finally, Classification. Figure 1 shows the system configuration. The system input corresponds to the last three years of the monthly consumption curve of each customer, here
ICPRAM 2012 - International Conference on Pattern Recognition Applications and Methods