Unbalanced Data Classification in Fraud Detection by Introducing a Multidimensional Space Analysis

Roberto Saia

Department of Mathematics and Computer Science

University of Cagliari, Via Ospedale 72 - 09124 Cagliari, Italy

Keywords:

Business Intelligence, Fraud Detection, Data Imbalance, Pattern Mining, Metrics.

Abstract:

The problem of frauds is becoming increasingly important in this E-commerce age, where an enormous num-

ber of ﬁnancial transactions are carried out by using electronic instruments of payment such as credit cards.

In this scenario it is not possible to adopt human-driven solutions due to the huge number of involved opera-

tions. The only approach is therefore to adopt automatic solutions able to discern the legitimate transactions

from the fraudulent ones. For this reason, today the development of techniques capable of carrying out this

task efﬁciently represents a very active research ﬁeld that involves a large number of researchers around the

world. Unfortunately, this is not an easy task, since the deﬁnition of effective fraud detection approaches is

made difﬁcult by a series of well-known problems, the most important of them being the non-balanced class

distribution of data that leads towards a signiﬁcant reduction of the machine learning approaches performance.

Such limitation is addressed by the approach proposed in this paper, which exploits three different metrics of

similarity in order to deﬁne a three-dimensional space of evaluation. Its main objective is a better charac-

terization of the ﬁnancial transactions in terms of the two possible target classes (legitimate or fraudulent),

facing the information asymmetry that gives rise to the problem described above. A series of experiments conducted on real-world data with different sizes and imbalance levels demonstrates the effectiveness of the proposed approach with respect to state-of-the-art solutions.

1 INTRODUCTION

Many studies, such as those conducted by Euromonitor International, indicate that the growth of E-commerce attracts fraudsters, as shown in Figure 1, which reports the total fraud levels in the Europe, Middle East and Africa (EMEA) area.

Given the economic relevance of fraud events, research into effective fraud detection approaches is increasingly crucial in order to reduce the economic losses as much as possible. Unfortunately, the development of these approaches has to face some problems, the most important of which is the non-balanced distribution of data Japkowicz and Stephen (2002) that characterizes the information usually available for the definition of fraud detection models.

Other problems, such as data scarcity Assis et al. (2010); Ahmed et al. (2016), the non-adaptability of the detection models Sorournejad et al. (2016), data heterogeneity Chatterjee and Segev (1991); Che et al. (2013), or the cold-start issue Zhu et al. (2008); Donmez et al. (2007), contribute to making the development of such approaches more difficult.

http://www.euromonitor.com/

The literature offers us a number of techniques

aimed to detect the fraudulent transactions in a ﬁ-

nancial data ﬂow. Some examples are those based

on the: Data Mining techniques to generate rules

on the basis of fraud patterns Lek et al. (2001); Ar-

tiﬁcial Intelligence techniques to detect anomalies

in the data Hoffman and Tessendorf (2005); Neu-

ral Networks techniques to design predictive mod-

els Gopinathan et al. (1998); Signature-based tech-

niques able to model the legitimate data Edge and

Sampaio (2009); Fuzzy Logic techniques that ex-

ploit the fuzzy analysis to perform fraud detection

tasks Lenard and Alam (2005); Decision Tree tech-

niques aimed to reduce the misclassiﬁcations Sahin

et al. (2013); Machine Learning techniques able to

generate predictions on the basis of multiple mod-

els Whiting et al. (2012); Zhang et al. (2011); Genetic

Programming techniques that exploit an Evolutionary Computation approach in order to detect frauds Assis et al. (2010); Statistical Inference techniques able to detect frauds by adopting a Bayesian model Hooi et al. (2016).

Saia, R.

Unbalanced Data Classification in Fraud Detection by Introducing a Multidimensional Space Analysis.

DOI: 10.5220/0006663000290040

In Proceedings of the 3rd International Conference on Internet of Things, Big Data and Security (IoTBDS 2018), pages 29-40

ISBN: 978-989-758-296-7

Figure 1: Total Fraud Level in EMEA (x-axis: Years, 2006 to 2016; y-axis: Millions of euros, ticks from 1,250 to 1,750).

One limitation shared by all these techniques is the strategy they adopt to define the evaluation model, which is usually based on a single criterion applied to the previous transactions collected by the fraud detection system. This way of proceeding leads to misclassifications, since the available data usually do not contain enough information about all the transaction classes, due to the high level of imbalance that characterizes them.

In several previous works Saia et al. (2015); Saia and Carta (2017a); Saia (2017); Saia and Carta (2017b) we studied the advantages and disadvantages related to the adoption of proactive fraud detection approaches as a possible solution to mitigate the aforementioned problems.

The main intuition on which this paper relies is

to perform the data analysis in a three-dimensional

space, which is given by three different metrics of

similarity. The objective we want to achieve is a better

characterization (with respect to the state-of-the-art

approaches) of each transaction in one of the two pos-

sible classes of destination (i.e., legitimate or fraudu-

lent).

The scientiﬁc contributions given by this paper are

as follows:

(i) formalization of three similarity metrics aimed

to compare different aspects of two transactions,

i.e., Transactions Global Similarity, Features

Local Similarity, and Features Global Similar-

ity metrics;

(ii) deﬁnition of a three-dimensional space given by

the aforementioned three metrics of similarity,

which allows us to well characterize each trans-

action with respect to the other ones;

(iii) formulation of an algorithm able to classify each

new transaction as legitimate or fraudulent by

performing its evaluation in the previously de-

ﬁned three-dimensional space.

The paper is organized into the following sec-

tions: Section 2 introduces the background and re-

lated work; Section 3 provides a formal notation

and deﬁnes the faced problem; Section 4 describes

the proposed approach implementation; Section 5

gives details on the experimental environment, on the

adopted datasets and metrics, as well as on the used

strategy and the competitor approach, concluding by discussing the experimental results; Section 6 provides

some concluding remarks and points to some further

directions for research.

2 BACKGROUND AND RELATED WORK

This section introduces the fraud detection scenario

by starting with the description of strategies and ap-

proaches commonly used in this ﬁeld, together with

the most important open problems. It continues by

exposing the idea that stands behind the proposed ap-

proach, concluding with a description of the state-of-

the-art competitor used to evaluate its performance.

2.1 Strategies and Approaches

A fraud detection system can operate by using two

different strategies Phua et al. (2010), supervised or

unsupervised:

• in the case of the supervised strategy, it takes into

account all the previous transactions (i.e., legiti-

mate and fraudulent) in the process of deﬁnition

of the evaluation model. Such a strategy needs a number of examples of both the legitimate and the fraudulent cases, and its capability is limited to the detection of patterns that were present in the data used to train the evaluation model;

• the unsupervised strategy instead operates by

comparing the values of the features that com-

pose the transaction to evaluate to those present

in the legitimate cases previously collected by the

system. This strategy is often ineffective since

many fraudulent transactions do not have signiﬁ-

cant variations in their feature values, with regard

to the legitimate ones. For this reason the develop-

ment of fraud detection approaches based on the

unsupervised strategy is not an easy task Gold-

stein and Uchida (2016).

Regardless of the adopted strategy, a fraud detec-

tion system can instead follow a static, updating, or

forgetting operative approach:

• the static approach Pozzolo et al. (2014) operates

by dividing the data into blocks of equal size and

IoTBDS 2018 - 3rd International Conference on Internet of Things, Big Data and Security

the evaluation model is deﬁned by taking into ac-

count a certain number of initial and contiguous

blocks;

• the updating approach Wang et al. (2003) oper-

ates by updating the evaluation model at each new

block by using a deﬁned number of latest and con-

tiguous blocks;

• the forgetting approach Gao et al. (2007) oper-

ates by updating the evaluation model when a new

block appears, by taking into account the legiti-

mate transactions in the last two blocks and all the

fraudulent transactions present in all the blocks.

The evaluation models deﬁned by adopting the

aforementioned operative approaches can be used as

they are or they can be joined together in order to de-

ﬁne a more complex evaluation model.

However, each of these approaches has drawbacks: the static approach is ineffective in modeling the behavior of the users, the updating approach is ineffective when working with small amounts of data, and the forgetting approach is characterized by an excessive computational complexity.
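The difference between the three operative approaches lies mainly in which blocks feed the evaluation model. The following Python sketch illustrates this selection only; the `Transaction` and `Block` types, the function names, and the `n_initial` and `n_latest` parameters are hypothetical names introduced for illustration and do not come from the cited works.

```python
from typing import List, Tuple

# Hypothetical representation: (feature values, class label).
Transaction = Tuple[List[float], str]  # label is "legitimate" or "fraudulent"
Block = List[Transaction]


def static_training(blocks: List[Block], n_initial: int) -> Block:
    """Static approach: use a fixed number of initial, contiguous blocks."""
    return [t for block in blocks[:n_initial] for t in block]


def updating_training(blocks: List[Block], n_latest: int) -> Block:
    """Updating approach: use the n_latest most recent contiguous blocks.

    Assumes n_latest >= 1 (a value of 0 would select every block).
    """
    return [t for block in blocks[-n_latest:] for t in block]


def forgetting_training(blocks: List[Block]) -> Block:
    """Forgetting approach: legitimate cases from the last two blocks,
    fraudulent cases from all the blocks seen so far."""
    recent_legit = [t for block in blocks[-2:] for t in block
                    if t[1] == "legitimate"]
    all_fraud = [t for block in blocks for t in block
                 if t[1] == "fraudulent"]
    return recent_legit + all_fraud
```

Each function would be called again whenever a new block arrives (except in the static case), and the returned transactions would then be used to train the evaluation model.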

2.2 Open Problems

This section reports the most common problems re-

lated to the fraud detection processes.

2.2.1 Data Scarcity

Frauds represent the biggest problem that affects the

E-commerce area, a problem worsened by the scarcity

of real-world datasets available for the research com-

munity Assis et al. (2010); Ahmed et al. (2016),

which are essential for the development of new fraud

detection techniques. This is a well-known problem

related to the restrictive policies commonly adopted

by those working in this ﬁeld, ﬁnancial operators that

for competitive or legal reasons do not want to release

information about their business and, above all, about

the frauds that they have suffered. It should be added that such information is not released even in anonymized form, since even in this form it may reveal potential vulnerabilities.

2.2.2 Model Non-adaptability

Another problem, which affects both the supervised and unsupervised approaches, is related to the non-adaptability of the detection models. This means that

the evaluation models do not lead toward good per-

formance when the transactions to evaluate are char-

acterized by unknown patterns (with regard to those

used to deﬁne the evaluation model) Sorournejad et al.

(2016).

2.2.3 Data Heterogeneity

The data heterogeneity problem is formally deﬁned as

the incompatibility between similar features resulting

in the same data being represented differently in dif-

ferent datasets Chatterjee and Segev (1991); Che et al.

(2013), as it happens in the data involved in the fraud

detection processes.

2.2.4 Data Imbalance

Although the problems outlined above are also im-

portant, the crucial problem that has to be faced in

this ﬁeld is the data imbalance. It is given by the

composition of the data available for the evaluation

model training, which is usually characterized by a

small number of fraudulent transactions and a large

number of legitimate ones.

This adversely affects the performance of the

canonical approaches of classiﬁcation Japkowicz and

Stephen (2002); Brown and Mues (2012); He and

Garcia (2009), where such problem is usually faced

by performing a preliminary balance of data Vinciotti

and Hand (2003). It is performed by duplicating some

of the transactions that belong to the less numerous

class (over-sampling strategy) or by removing some

of the transactions that belong to the more numerous

class (under-sampling strategy). The effectiveness of these balancing strategies is analyzed and discussed in Marqués et al. (2013); Crone and Finlay (2012).
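As a minimal illustration of the two balancing strategies, the sketch below duplicates randomly chosen items of the less numerous class (over-sampling) or keeps a random subset of the more numerous class (under-sampling). Balancing the two classes to exactly the same size is an assumption made here for simplicity, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def over_sample(minority: Sequence[T], majority: Sequence[T],
                rng: random.Random) -> Tuple[List[T], List[T]]:
    """Duplicate random minority items until both classes have equal size.

    Assumes len(majority) >= len(minority) and a non-empty minority class.
    """
    missing = len(majority) - len(minority)
    extra = [rng.choice(minority) for _ in range(missing)]
    return list(minority) + extra, list(majority)


def under_sample(minority: Sequence[T], majority: Sequence[T],
                 rng: random.Random) -> Tuple[List[T], List[T]]:
    """Keep a random subset of the majority class as large as the minority."""
    kept = rng.sample(list(majority), len(minority))
    return list(minority), kept
```

Over-sampling keeps all the available information at the cost of repeated examples, while under-sampling discards part of the legitimate transactions.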

2.2.5 Cold-start

Another problem directly related to the data imbal-

ance is the cold-start one. It happens when the data

available for the deﬁnition of the evaluation model do

not contain enough information on all the classes of

data. This prevents the deﬁnition of an effective eval-

uation model, since the available information does not

represent all the possible classes of destinations (i.e.,

in our case, legitimate and fraudulent) Attenberg and

Provost (2010).

2.3 Proposed Approach

The proposed Multidimensional Similarity Space

(MSS) approach compares the transactions in a three-

dimensional space given by three different metrics of

similarity. The objective is to achieve a better char-

acterization of each transaction in the context of one

of the two possible classiﬁcations (i.e., legitimate or

fraudulent).

Figure 2: Three-dimensional Similarity Space (each of the three axes ranges from 0 to 1; the plotted point has coordinates (0.5, 0.5, 0.5)).

Such metrics, described in detail later, allow us to evaluate different aspects of the transactions, i.e., the Transactions Global Similarity (TGS), the Features Local Similarity (FLS), and the Features Global Similarity (FGS).

They have been used to deﬁne, respectively, the X,

Y, and Z dimensions of our three-dimensional space,

where the similarity between two transactions is rep-

resented as a point placed at the X, Y, and Z coordi-

nates. This is shown in Figure 2, where the multi-

dimensional similarity between two transactions has

generated a point at the (X=0.5, Y=0.5, Z=0.5) coor-

dinates.

2.4 Competitor Approach

The state-of-the-art competitor we chose to evaluate

the performance of the proposed approach is Ran-

dom Forests Breiman (2001), since it outperforms the

other ones Brown and Mues (2012); Bhattacharyya

et al. (2011) in the fraud detection field, as experimentally verified in Section 5.5.

Briefly, it works by growing many classification trees and classifying a new transaction (represented as the vector of its features) by passing it down each of the trees in the forest. Each tree provides a classification (a vote) and the final classification of the transaction is the one having the most votes among all the trees in the forest.

3 PRELIMINARIES

This section formalizes the notation used in this paper

and the problem faced by our approach.

3.1 Formal Notation

Given a set of classified transactions $T = \{t_1, t_2, \ldots, t_N\}$, we denote as $T_+$ the subset of legitimate ones (then $T_+ \subseteq T$), and as $T_-$ the subset of fraudulent ones (then $T_- \subseteq T$).

Each transaction $t \in T$ is composed of a set of features $V = \{v_1, v_2, \ldots, v_M\}$ and each transaction can belong to only one class $c \in C$, where $C = \{legitimate, fraudulent\}$.

We also denote a set of unclassified transactions $\hat{T} = \{\hat{t}_1, \hat{t}_2, \ldots, \hat{t}_U\}$.

The aforementioned notation is for convenience

summarized in Table 1.

Table 1: Formal Notation.

Notation                                                       Description
$T = \{t_1, t_2, \ldots, t_N\}$                                Set of classified transactions
$T_+$, with $T_+ \subseteq T$                                  Subset of legitimate transactions
$T_-$, with $T_- \subseteq T$                                  Subset of fraudulent transactions
$V = \{v_1, v_2, \ldots, v_M\}$                                Set of transaction features
$\hat{T} = \{\hat{t}_1, \hat{t}_2, \ldots, \hat{t}_U\}$        Set of unclassified transactions
$C = \{legitimate, fraudulent\}$                               Set of possible classifications

3.2 Problem Deﬁnition

Initially, we denote as $\Phi$ the process of classification performed by our approach, which is aimed at classifying an unevaluated transaction $\hat{t} \in \hat{T}$ as legitimate or fraudulent.

Subsequently, we define a function $Classificator(\hat{t}, \Phi)$ that returns a boolean value $\beta$ indicating the correctness of the classification made by $\Phi$ for the transaction $\hat{t}$ (0=misclassification, 1=correct classification).

Finally, we formalize our problem as the maximization of the sum of the values returned by the $Classificator$ function, as shown in Equation 1.

$$\max_{0 \le \beta \le |\hat{T}|} \; \beta = \sum_{u=1}^{|\hat{T}|} Classificator(\hat{t}_u, \Phi) \tag{1}$$

4 PROPOSED APPROACH

Our approach has been implemented by following the

three steps summarized below and detailed later:

1. Metrics Deﬁnition: deﬁnition of three metrics

aimed to compare two transactions in terms of dif-

ferent similarity aspects, after we deﬁne the nature

of the data to be evaluated;


2. Criteria Formalization: formalization of crite-

ria used to evaluate a new transaction in a three-

dimensional space given by the three metrics of

similarity previously deﬁned;

3. Algorithm Formulation: formulation of an al-

gorithm based on our Multidimensional Similarity

Space (MSS) approach, able to classify each new

transaction as legitimate or fraudulent.

4.1 Metrics Deﬁnition

This section starts by deﬁning the nature of the data

vectors taken into account during the evaluation pro-

cess, continuing by formalizing the three metrics in-

volved in such process. As introduced in Section 2.3,

these metrics give rise to the three-dimensional space used to evaluate the similarity between two transactions, as shown in Figure 2. They represent, respectively, the X, Y, and Z dimensions of this space (i.e., X=TGS, Y=FLS, and Z=FGS).

4.1.1 Data Vectors

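Equation 2 shows the matrix given by a series of transactions, which in this case are those in the set $T$ (then $|T| = N$). Each row is the vector of data (i.e., the values of the features in the set $V$) that represents a transaction in our Multidimensional Similarity Space; for instance, the first row $(v_{1,1}, v_{1,2}, \ldots, v_{1,M})$ represents the first transaction.

$$T = \begin{pmatrix} v_{1,1} & v_{1,2} & \ldots & v_{1,M} \\ v_{2,1} & v_{2,2} & \ldots & v_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ v_{N,1} & v_{N,2} & \ldots & v_{N,M} \end{pmatrix} \tag{2}$$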

4.1.2 Transactions Global Similarity Metric

The first metric used in our approach is the Transactions Global Similarity (TGS). It is not a novel metric, since it coincides with the well-known cosine similarity metric, which is used to measure the global similarity between two transaction vectors $V^{(1)}$ and $V^{(2)}$ (with size larger than zero). More formally, given two transaction vectors $V^{(1)}$ and $V^{(2)}$, it is calculated as shown in Equation 3. We normalized the result in the range [0, 1], where 0 indicates two completely different vectors and 1 two equal vectors.

$$TGS(V^{(1)}, V^{(2)}) = \frac{V^{(1)} \cdot V^{(2)}}{\|V^{(1)}\| \cdot \|V^{(2)}\|} \tag{3}$$
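A minimal Python sketch of the TGS computation follows. The cosine part is standard; how the result is normalized to the range [0, 1] is not detailed in the text, so the linear rescaling `(cos + 1) / 2` used here is only an assumption, and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Plain cosine similarity between two vectors of the same size."""
    if len(v1) != len(v2) or len(v1) == 0:
        raise ValueError("vectors must be non-empty and of equal size")
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    # Clamp to [-1, 1] to absorb floating-point rounding.
    return max(-1.0, min(1.0, dot / (norm1 * norm2)))


def tgs(v1: Sequence[float], v2: Sequence[float]) -> float:
    """TGS sketch: cosine similarity rescaled to [0, 1].

    The rescaling (cos + 1) / 2 is an assumption, not the paper's stated method.
    """
    return (cosine_similarity(v1, v2) + 1.0) / 2.0
```

With this rescaling, two vectors pointing in opposite directions obtain 0, two orthogonal vectors obtain 0.5, and two vectors with the same direction obtain 1.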

4.1.3 Features Local Similarity Metric

The Features Local Similarity (FLS) metric has been designed in the context of the proposed approach in order to measure the similarity between transactions in terms of the weighted sequence of their features. It relies on the consideration that similar transactions are characterized by similar weighted sequences of features. This means that, if we sort their features on the basis of their values, the obtained sequences of their original indexes will be similar in terms of the TGS metric (i.e., cosine similarity). More formally, given two transactions $t^{(1)}$ and $t^{(2)}$, we calculate the FLS as shown in Equation 4, where $V^{(1)}$ and $V^{(2)}$ are the transaction vectors to compare and the $idx$ function returns the sorted $V$ in terms of former element indexes (i.e., the indexes of the $V$ elements before sorting).

$$FLS(V^{(1)}, V^{(2)}) = TGS\left(idx(V^{(1)}), idx(V^{(2)})\right) \quad \text{with} \quad V = \{|v_1| \le |v_2| \le \ldots \le |v_M|\} \tag{4}$$
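The FLS computation can be sketched as follows. The sketch assumes that features are sorted by absolute value (as the ordering in Equation 4 suggests), that indexes are 1-based, and that ties keep their original order; it applies the plain cosine to the index sequences, whose entries are positive integers, so the value already lies in (0, 1]. Whether a further normalization is applied at this point is not stated in the text. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def idx(v: Sequence[float]) -> List[int]:
    """Original 1-based positions of the features after sorting them
    by absolute value (stable sort, so ties keep their original order)."""
    ordered = sorted(enumerate(v), key=lambda pair: abs(pair[1]))
    return [position + 1 for position, _ in ordered]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain cosine similarity between two non-zero vectors of equal size."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))


def fls(v1: Sequence[float], v2: Sequence[float]) -> float:
    """FLS sketch: cosine similarity between the two index sequences."""
    if len(v1) != len(v2) or len(v1) == 0:
        raise ValueError("vectors must be non-empty and of equal size")
    return cosine(idx(v1), idx(v2))
```

For example, `idx([0.5, -3.0, 1.0])` yields `[1, 3, 2]`, because the first feature has the smallest absolute value and the second the largest.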

4.1.4 Features Global Similarity Metric

The Features Global Similarity (FGS) is another metric defined in the context of the proposed approach. Its aim is the evaluation of the global difference between two transactions in terms of their feature values, measured between corresponding features of the two transactions. It operates by following the same criterion of the RMSE metric, but in our metric the obtained result has been normalized in the range [0, 1]. More formally, given two transactions $t^{(1)}$ and $t^{(2)}$, the FGS is calculated by considering the corresponding vectors $V^{(1)}$ and $V^{(2)}$, as shown in Equation 5, where $\max(RMSE)$ is the maximum value assumed by RMSE in the context of all the comparisons between $V^{(1)}$ and all other vectors corresponding to all the transactions in the set $T$.

$$FGS(V^{(1)}, V^{(2)}) = 1 - \frac{RMSE}{\max(RMSE)} \quad \text{with} \quad RMSE = \sqrt{\frac{1}{M} \sum_{m=1}^{M} \left(v_m^{(1)} - v_m^{(2)}\right)^2} \tag{5}$$
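A minimal Python sketch of the FGS computation is given below. It assumes the standard RMSE definition (square root of the mean squared difference) and takes the vectors of the set T as an explicit argument in order to obtain the maximum RMSE; the function names and the handling of the degenerate case where every RMSE is zero are assumptions.

```python
import math
from typing import Sequence


def rmse(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Root mean squared error between corresponding features."""
    if len(v1) != len(v2) or len(v1) == 0:
        raise ValueError("vectors must be non-empty and of equal size")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)) / len(v1))


def fgs(v1: Sequence[float], v2: Sequence[float],
        all_vectors: Sequence[Sequence[float]]) -> float:
    """FGS sketch: 1 - RMSE / max(RMSE).

    max(RMSE) is taken over the comparisons between v1 and every vector in
    all_vectors (the transactions in T). The result stays in [0, 1] when v2
    is one of those vectors.
    """
    max_rmse = max(rmse(v1, other) for other in all_vectors)
    if max_rmse == 0.0:
        # Assumption: all vectors coincide with v1, so similarity is maximal.
        return 1.0
    return 1.0 - rmse(v1, v2) / max_rmse
```

The vector farthest from `v1` obtains 0 and a vector identical to `v1` obtains 1, which makes the metric directly comparable with the other two dimensions.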

4.2 Criteria Formalization

A new transaction $\hat{t} \in \hat{T}$ is classified as legitimate or fraudulent on the basis of a comparison process between it and all the transactions in the set $T$. Such process is performed by using a threefold criterion of similarity evaluation based on the three metrics previously described in Section 4.1 and an $r$ value experimentally defined in Section 5.4.2.

RMSE: Root Mean Squared Error

Figure 3: Evaluation Space (axes: x = TGS, y = FLS, z = FGS; the plotted points are legitimate cases and fraudulent cases).

More in detail, each new transaction $\hat{t} \in \hat{T}$ is classified on the basis of the following three criteria:

(i) we define a center $c$ in our three-dimensional space, as shown in Figure 3, by using as coordinates X, Y, and Z, respectively, $\max(TGS) - r$, $\max(FLS) - r$, and $\max(FGS) - r$, all of them calculated between the transaction $\hat{t}$ to evaluate and all the transactions in the set $T$;

(ii) the classification of the transaction $\hat{t}$ depends on the nature (legitimate or fraudulent) of the transactions in the set $T$ bounded by the sphere of radius $r$ and center $c$;

(iii) a new transaction is classified as legitimate if the number of legitimate transactions in $T_+$ bounded by this sphere is greater than that of the fraudulent ones in $T_-$, otherwise it is classified as fraudulent.

By way of example, Figure 3 shows a case where the evaluated transaction $\hat{t}$ has been classified as fraudulent, because the number of fraudulent transactions in $T_-$ bounded by the sphere of radius $r$ and center $c$ is greater than that of the legitimate ones in $T_+$.

It should be noted that this evaluation process adopts a prudential criterion, since the cases with an equal number of legitimate and fraudulent transactions bounded by the sphere lead to a classification of the transaction $\hat{t}$ as fraudulent.

4.3 Algorithm Formulation

The classification Algorithm 1 takes as input the set of previous transactions $T$, an unevaluated transaction $\hat{t} \in \hat{T}$, and the radius value $r$. It returns as output a result value that provides the classification given to the transaction $\hat{t}$ (i.e., a boolean value, with true=legitimate and false=fraudulent).

It should be observed that when we refer to transactions in the $\hat{T}$ and $T$ sets, we refer to their respective vectors composed of the values of the features (i.e., set $V$).

Algorithm 1: Transaction classification.

Input: $T$ = Previous transactions, $\hat{t}$ = Unevaluated transaction, $r$ = Radius
Output: result = Classification of the transaction $\hat{t}$
Initialization: lclass ← 0, fclass ← 0

 1: procedure CLASSIFICATION($T$, $\hat{t}$, $r$)
 2:     cx ← (getMaxTGS($T$, $\hat{t}$) − r)
 3:     cy ← (getMaxFLS($T$, $\hat{t}$) − r)
 4:     cz ← (getMaxFGS($T$, $\hat{t}$) − r)
 5:     for each t in $T$ do
 6:         s1 ← getTGS($\hat{t}$, t)
 7:         s2 ← getFLS($\hat{t}$, t)
 8:         s3 ← getFGS($\hat{t}$, t)
 9:         if (s1 ≥ (cx − r) ∧ s1 ≤ (cx + r)) ∧
10:            (s2 ≥ (cy − r) ∧ s2 ≤ (cy + r)) ∧
11:            (s3 ≥ (cz − r) ∧ s3 ≤ (cz + r)) then
12:             if getClass(t) == legitimate then
13:                 lclass ← lclass + 1
14:             else
15:                 fclass ← fclass + 1
16:             end if
17:         end if
18:     end for
19:     if lclass > fclass then
20:         result ← true
21:     else
22:         result ← false
23:     end if
24:     return result
25: end procedure

The classification process is performed through Algorithm 1. It starts by calculating the maximum value of TGS, FLS, and FGS between the transaction $\hat{t}$ and all the transactions in $T$, defining the cx, cy, and cz coordinates of the center used in our evaluation process (steps from 2 to 4).

From step 5 to 18 it calculates TGS, FLS, and FGS between each transaction $t \in T$ and the transaction $\hat{t}$ under evaluation.

If the obtained values are, respectively, within the cx ± r, cy ± r, and cz ± r bounds, in the steps from 12 to 16 it increases lclass (if the instance t is classified as legitimate) or fclass (if the instance t is classified as fraudulent) by one unit.

At the end of the previous process, in the steps from 19 to 23 the transaction $\hat{t}$ is classified as legitimate (true value) if the value of lclass is greater than fclass, otherwise the transaction is classified as fraudulent (false value).

The algorithm returns the classification at step 24 through the boolean value result.
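The steps of Algorithm 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the original Java implementation: the similarity metrics are passed in as callables (hypothetical parameter `metrics`), the two counters are explicitly initialized to zero, and the membership test uses the per-axis bounds written in steps 9 to 11 of the algorithm rather than a Euclidean distance from the center.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical representation: (feature vector, True = legitimate).
Labeled = Tuple[Vector, bool]
Metric = Callable[[Vector, Vector], float]


def classify(previous: List[Labeled], t_hat: Vector, r: float,
             metrics: Sequence[Metric]) -> bool:
    """Return True (legitimate) or False (fraudulent) for t_hat.

    previous must be non-empty; metrics holds the similarity functions
    (TGS, FLS, and FGS in the paper), each returning a value in [0, 1].
    """
    # Similarity of t_hat to every previous transaction, one value per metric.
    sims = [[m(t_hat, vector) for m in metrics] for vector, _ in previous]

    # Steps 2-4: each center coordinate is the maximum similarity minus r.
    centers = [max(s[k] for s in sims) - r for k in range(len(metrics))]

    # Steps 5-18: count the classes of the transactions inside the bounds.
    l_class = 0
    f_class = 0
    for s, (_, is_legitimate) in zip(sims, previous):
        if all(c - r <= value <= c + r for value, c in zip(s, centers)):
            if is_legitimate:
                l_class += 1
            else:
                f_class += 1

    # Steps 19-24: ties are resolved prudentially as fraudulent.
    return l_class > f_class
```

A metric that needs additional context, such as the maximum RMSE used by FGS, can be wrapped in a closure before being passed in `metrics`.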


5 EXPERIMENTS

This section provides information on the development

environment, on the adopted real-world dataset, as

well as on the evaluation metrics, the followed strat-

egy, and the state-of-the-art approach used as com-

petitor, reporting and discussing the experimental re-

sults at the end.

5.1 Environment

Our approach has been developed in Java, using the Waikato Environment for Knowledge Analysis (WEKA) library to implement the competitor state-of-the-art approaches.

5.2 DataSet

This section describes the real-world dataset used for

the experiments, together with the criteria used to per-

form this operation.

5.2.1 Description

The adopted dataset is composed of a series of credit card transactions made by European cardholders. It contains the transactions made in two days of September 2013, i.e., 492 fraudulent transactions and 284,807 legitimate ones, and it represents a highly unbalanced dataset Pozzolo et al. (2015), considering that the fraudulent transactions are only about 0.17% of the total.

All dataset features are provided in an anonymous

form for privacy reasons, except the Amount and Time

ones. The ﬁrst one indicates the total amount of

the transaction, while the second one the number of

seconds elapsed between it and the ﬁrst transaction

stored in the dataset. We chose not to use the Time

information in order to operate without any reference

to the original transaction sequence.

5.2.2 Criteria

By keeping the number of fraudulent transactions

ﬁxed (i.e., all of them), we create several subsets with

10000, 20000, . . . ,240000 legitimate transactions, in

order to reproduce several real-world scenarios with

different levels of data imbalance. Each dataset

has been randomly shufﬂed before its use and all

the experiments have been performed by following

the k-fold cross-validation criterion described in Sec-

tion 5.4.

http://www.cs.waikato.ac.nz/ml/weka/

https://www.kaggle.com/dalpozz/creditcardfraud

The characteristics of each dataset are reported in Table 2, where the size indicates the number of legitimate transactions and the data imbalance is expressed as the ratio between the number of fraudulent transactions (492) and the number of legitimate ones.

Table 2: Datasets.

Dataset  Fraudulent      Dataset  Fraudulent      Dataset  Fraudulent
size     cases (ratio)   size     cases (ratio)   size     cases (ratio)
10K      0.04920         90K      0.00547         170K     0.00289
20K      0.02460         100K     0.00492         180K     0.00273
30K      0.01640         110K     0.00447         190K     0.00259
40K      0.01230         120K     0.00410         200K     0.00246
50K      0.00984         130K     0.00378         210K     0.00234
60K      0.00820         140K     0.00351         220K     0.00224
70K      0.00703         150K     0.00328         230K     0.00214
80K      0.00615         160K     0.00308         240K     0.00205

5.3 Metrics

This section introduces and explains the metrics

adopted to evaluate the performance of our approach

and that of its competitor.

5.3.1 Speciﬁcity

The Specificity metric, also known as True Negative Rate (TNR), is mainly driven by the number of transactions correctly classified as fraudulent. More formally, it is calculated as shown in Equation 6, where $\hat{T}$, $TN$, and $FP$ are, respectively, the set of new transactions to classify, the number of transactions correctly classified as fraudulent, and the number of fraudulent transactions erroneously classified as legitimate.

$$Specificity(\hat{T}) = \frac{TN}{TN + FP} \tag{6}$$
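As a small worked illustration of Equation 6, the sketch below computes the Specificity from two lists of boolean labels, using the same encoding as Algorithm 1 (true = legitimate, false = fraudulent). The function name and the list-based interface are assumptions made for the example.

```python
from typing import Sequence


def specificity(actual: Sequence[bool], predicted: Sequence[bool]) -> float:
    """TN / (TN + FP), with fraudulent (False) as the negative class."""
    # TN: fraudulent transactions correctly classified as fraudulent.
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    # FP: fraudulent transactions erroneously classified as legitimate.
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    if tn + fp == 0:
        raise ValueError("no fraudulent transactions in the actual labels")
    return tn / (tn + fp)
```

With three fraudulent transactions, two of which are detected, the function returns 2/3 regardless of how the legitimate transactions were classified.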

5.3.2 F-score

The F-score metric represents the harmonic mean of the precision and recall metrics. It is widely used to evaluate the performance of binary classifiers when they work with unbalanced datasets Pozzolo et al. (2015). Its result is in the range [0, 1], where 1 denotes the best performance. More formally, it is calculated as shown in Equation 7, where the set $T^{(P)}$ contains the predicted classifications and the set $T^{(R)}$ contains the actual classifications of them.

$$F\text{-}score(T^{(P)}, T^{(R)}) = 2 \cdot \frac{precision(T^{(P)}, T^{(R)}) \cdot recall(T^{(P)}, T^{(R)})}{precision(T^{(P)}, T^{(R)}) + recall(T^{(P)}, T^{(R)})}$$

with

$$precision(T^{(P)}, T^{(R)}) = \frac{|T^{(R)} \cap T^{(P)}|}{|T^{(P)}|}, \quad recall(T^{(P)}, T^{(R)}) = \frac{|T^{(R)} \cap T^{(P)}|}{|T^{(R)}|} \tag{7}$$
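The F-score can be sketched in Python as follows. The sketch uses the usual per-class reading of precision and recall, where a `positive` class is chosen by the caller; this reading, the function name, and the convention of returning 0 when precision or recall is undefined or zero are assumptions for illustration.

```python
from typing import Hashable, Sequence


def f_score(actual: Sequence[Hashable], predicted: Sequence[Hashable],
            positive: Hashable) -> float:
    """Harmonic mean of precision and recall for the chosen positive class."""
    # Correctly predicted positives (intersection of predicted and actual).
    tp = sum(1 for a, p in zip(actual, predicted)
             if a == positive and p == positive)
    predicted_positive = sum(1 for p in predicted if p == positive)
    actual_positive = sum(1 for a in actual if a == positive)
    if tp == 0 or predicted_positive == 0 or actual_positive == 0:
        return 0.0
    precision = tp / predicted_positive
    recall = tp / actual_positive
    return 2.0 * precision * recall / (precision + recall)
```

Because the harmonic mean is dominated by the smaller of the two values, a classifier cannot obtain a high F-score by maximizing only precision or only recall.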


5.3.3 Area Under ROC Curve

The Area Under the Receiver Operating Characteristic curve (AUC) is a metric used to evaluate the performance of a classification model Powers (2011); Faraggi and Reiser (2002). More formally, given the subset of the previous legitimate transactions $T_+$ and that of the previous fraudulent ones $T_-$, it works as shown in Equation 8, where $\Psi$ denotes all the possible comparisons between the transactions in the subsets $T_+$ and $T_-$. The result is in the range [0, 1] (where 1 indicates the best performance) and it is obtained by averaging all these comparisons.

$$\Psi(t_+, t_-) = \begin{cases} 1, & \text{if } t_+ > t_- \\ 0.5, & \text{if } t_+ = t_- \\ 0, & \text{if } t_+ < t_- \end{cases} \qquad AUC = \frac{1}{|T_+| \cdot |T_-|} \sum_{1}^{|T_+|} \sum_{1}^{|T_-|} \Psi(t_+, t_-) \tag{8}$$
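Equation 8 can be read as an average over all the pairs formed by one legitimate and one fraudulent transaction. The sketch below assumes that each transaction is represented by a numeric score assigned by the evaluation model (the text does not say which quantity is compared), and the function name is hypothetical.

```python
from typing import Sequence


def auc(legit_scores: Sequence[float], fraud_scores: Sequence[float]) -> float:
    """Pairwise AUC: average of Psi over all (legitimate, fraudulent) pairs."""
    if not legit_scores or not fraud_scores:
        raise ValueError("both subsets must be non-empty")
    total = 0.0
    for s_pos in legit_scores:
        for s_neg in fraud_scores:
            if s_pos > s_neg:
                total += 1.0   # pair ranked correctly
            elif s_pos == s_neg:
                total += 0.5   # tie
            # a pair ranked incorrectly contributes 0
    return total / (len(legit_scores) * len(fraud_scores))
```

The double loop costs a number of comparisons equal to the product of the two subset sizes, which is acceptable for a sketch but would normally be replaced by a rank-based computation on large datasets.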

5.4 Strategy

This section gives some details about the criterion

adopted to evaluate our approach, deﬁning also the

optimal value of the sphere radius r .

5.4.1 Cross-validation

All the performed experiments have been conducted

by adopting the k-fold cross-validation criterion, with

k=10. The dataset has been divided into k subsets, and each of them has been used in turn as the test set, while the other k-1 subsets have been used as the training set, considering as final result the average of all the obtained results.

This was done to improve the reliability of the obtained results, since this criterion reduces the impact of data dependency. The original dataset has been divided into k subsets by using an R script, and the obtained training and test sets have been used to evaluate both our approach and its competitor RF.
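The k-fold criterion described above can be sketched in Python as follows (the original split was produced with an R script, which is not reproduced here). The function name, the `seed` parameter, and the round-robin assignment of the shuffled indexes to the folds are assumptions made for illustration.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int, seed: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training indexes, test indexes) for each of the k folds."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    indexes = list(range(n))
    random.Random(seed).shuffle(indexes)          # shuffle before splitting
    folds = [indexes[i::k] for i in range(k)]     # k nearly equal subsets
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test
```

Each transaction appears in exactly one test set, and the final result would be the average of the k values obtained on the test sets.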

The experimental results have been analyzed by

using the independent-samples two-tailed Student's t-

tests (p < 0.05), in order to verify the existence of a

statistical signiﬁcance between them.

5.4.2 Sphere Radius Deﬁnition

The Algorithm 1 previously formalized in Section 4.3

needs the deﬁnition of the radius r value, since its per-

formance depends on it.

We obtained it by performing a series of experi-

ments where we tested a wide range of possible values

in the context of the set T , by adopting during this op-

eration the k-fold cross-validation criterion described

in Section 5.4.1.

https://www.r-project.org/

The results indicate 0.026 as the optimal value of

r, since it leads towards the best performance in terms

of Speciﬁcity, F-score, and AUC metrics.

5.5 Competitor

We use Random Forests as the competitor approach, because the literature indicates it as the best-performing one for binary classification tasks with unbalanced data Brown and Mues (2012); Bhattacharyya

et al. (2011). In any case, we have nevertheless car-

ried out a preliminary study by involving ten differ-

ent state-of-the-art approaches designed to perform

binary classiﬁcation tasks, i.e., Naive Bayes, Logit

Boost, Logistic Regression, Stochastic Gradient De-

scent, Multilayer Perceptron, Voted Perceptron, Ran-

dom Tree, K-nearest, Decision Tree, and Random

Forests.

The results shown in Table 3 confirm Random Forests as the best-performing approach in terms of AUC, a metric that evaluates the overall performance of the evaluation model (Sobehart and Keenan, 2001).

Table 3: Competitors Performance.

Approach                 AUC      Approach            AUC
Naive Bayes              0.789    Logit Boost         0.796
Logistic Regression      0.794    SGD                 0.747
Multilayer Perceptron    0.792    Voted Perceptron    0.713
Random Tree              0.751    K-nearest           0.764
Decision Tree            0.761    Random Forests      0.799
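For readers unfamiliar with the AUC values compared in Table 3, the metric can be computed from classifier scores as the probability that a randomly chosen positive instance is ranked above a randomly chosen negative one. The sketch below uses this pairwise formulation; it is a generic illustration, not the procedure used to produce the table, and the scores are invented.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win; this equals the area under the ROC curve.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores, invented for illustration only.
positives = [0.9, 0.8, 0.7, 0.4]
negatives = [0.6, 0.5, 0.3, 0.2, 0.1]
example_auc = auc(positives, negatives)  # 18 of 20 pairs are ordered correctly
```

The pairwise loop is quadratic in the number of instances; on large datasets a rank-based computation gives the same value more efficiently.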

5.5.1 Parameter Tuning

Although Random Forests usually performs well even without a preliminary tuning process, we preferred to carry out this activity in order to maximize its performance.

Considering that, with respect to the WEKA default parameters, we observed significant variations of the Random Forests performance only when varying the number of randomly chosen attributes, we tuned only this parameter.

This activity involved both the training and test sets in order to counter the overfitting problem, adopting the cross-validation criterion described in Section 5.4.1. The results indicate 8 as the optimal number of randomly chosen attributes.

IoTBDS 2018 - 3rd International Conference on Internet of Things, Big Data and Security

5.6 Results

The experimental results are presented and discussed

in this section, initially through a brief description and

then with a more in-depth analysis.

5.6.1 Overview

A first analysis of the results reported in Figure 4 gives rise to the following general considerations:

• Figure 4.a shows that, in comparison to its competitor RF, the proposed MSS approach consistently maintains good performance in terms of Specificity. This indicates its capability to detect fraudulent transactions, regardless of the number of transactions involved in the definition of the evaluation model and of the level of data imbalance;

• Figure 4.b shows that the proposed MSS approach consistently maintains good performance in terms of F-score, unlike its competitor RF. This indicates its capability to reach a good balance between Precision and Recall, regardless of the size of the data and their level of imbalance;

• Figure 4.c shows that the proposed MSS approach also reaches and maintains good performance in terms of AUC, compared with its competitor RF. This indicates the capability of its evaluation model to work well with different data configurations, in terms of their size and level of imbalance.
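The Specificity and F-score values discussed in these considerations derive from the confusion matrix of a binary classifier. The following sketch shows the standard definitions, assuming the F1 form of the F-score and assuming, as the reading of Specificity given above suggests, that legitimate transactions are the positive class and fraudulent ones the negative class. The labels and predictions are invented for illustration.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, tn, fn) for a binary task, given the positive label."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def specificity(tp, fp, tn, fn):
    """True negative rate: share of actual negatives classified as negative."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def f_score(tp, fp, tn, fn):
    """Harmonic mean of Precision and Recall (the F1 form of the F-score)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 8 legitimate and 2 fraudulent transactions.
y_true = ["legitimate"] * 8 + ["fraudulent"] * 2
y_pred = ["legitimate"] * 7 + ["fraudulent"] + ["fraudulent", "legitimate"]

counts = confusion_counts(y_true, y_pred, positive="legitimate")
example_specificity = specificity(*counts)  # 1 of 2 frauds recognized
example_f_score = f_score(*counts)
```

With this class assignment, a low Specificity means that frauds are being accepted as legitimate, which is why the metric is informative even when the fraudulent class is very small.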

5.6.2 Discussion

An in-depth analysis of the results introduced in Section 5.6.1 has given rise to the following observations:

• the first observation concerns the capability of our MSS approach to keep its performance constant, regardless of the size of the data and their level of imbalance. This mainly depends on its operative strategy, which better characterizes the transactions through a multidimensional space of evaluation that is less influenced by the size and the level of data imbalance;

• the second observation is closely related to the first one, because the constancy of the MSS performance holds for all the metrics taken into account (i.e., Specificity, F-score, and AUC). This represents an additional confirmation of the MSS capability to better characterize the transactions in our multidimensional space of evaluation based on three different similarity metrics;

[Figure 4: Specificity, F-score, and AUC Performance. Panels (a) Specificity, (b) F-score, and (c) AUC, each plotted against Dataset size (×1000) on a vertical axis from 0.60 to 1.00.]

[Figure 5: Average Performance. Specificity, F-score, and AUC values by approach; bar labels recovered from the figure: 0.79, 0.81, 0.89, 0.9, 0.89, 0.91.]

• the results in terms of the Specificity metric, shown in Figure 4.a, indicate a better capability of the proposed MSS approach (with respect to its competitor RF) to operate in different real-world scenarios. This shows its ability to correctly classify new fraudulent transactions, regardless of the number of instances available to build its evaluation model and of their level of imbalance. It should be emphasized that this aspect is crucial in a real-world scenario, where the capability to detect fraudulent transactions represents the primary objective of any fraud detection system;

• the results in terms of the F-score metric, shown in Figure 4.b, give us important information about our MSS approach as regards its combined performance in terms of the Precision and Recall metrics. They indicate the MSS capability to properly classify new transactions with regard to both the number of all classifications made and the number of them that should have been made;

• another observation is related to the AUC results. This metric evaluates the performance of a binary classifier, and the results shown in Figure 4.c indicate the effectiveness of the MSS model of evaluation, compared to that of its competitor RF. In fact, it yields good and constant performance that is not influenced by the size and degree of imbalance of the data;

• the average performance reported in Figure 5 shows that our MSS approach outperforms its competitor RF in terms of all three metrics taken into account. It follows that its adoption in real-world applications can reduce the losses related to the fraudulent use of credit cards more effectively than its state-of-the-art competitors.

6 CONCLUSIONS AND FUTURE WORK

Fraud detection approaches now play a crucial role for many financial operators, since they allow them to reduce the losses related to the fraudulent use of electronic instruments of payment, first of all credit cards.

This occurs because, unlike in the past, the enormous number of financial transactions carried out in the E-commerce area by using such instruments of payment no longer allows the use of manual approaches based on human intervention.

In this context, however, it should be observed that the development of effective fraud detection approaches is not a simple task due to several well-known problems, first of all the data imbalance in the information available to define their evaluation models.

The Multidimensional Similarity Space approach proposed in this paper faces this problem by analyzing the transactions in a three-dimensional space, which is defined in terms of three different metrics of similarity. Its objective is a better characterization of each transaction in one of the two possible destination classes (i.e., legitimate or fraudulent).

The experimental results show that our approach outperforms its state-of-the-art competitor in several real-world scenarios, which reproduce different sizes and degrees of data imbalance.

Considering that credit card fraud detection represents only one of the possible contexts where our approach can operate, future work will be oriented to experimenting with it in other scenarios characterized by a high degree of data imbalance. Another interesting direction would be to experiment with additional metrics of similarity in order to improve the effectiveness of our classification approach.

ACKNOWLEDGEMENTS

This research is partially funded by Regione Sardegna under project Next generation Open Mobile Apps Development (NOMAD), Pacchetti Integrati di Agevolazione (PIA) - Industria Artigianato e Servizi - Annualità 2013.

REFERENCES

Ahmed, M., Mahmood, A. N., and Islam, M. R. (2016).

A survey of anomaly detection techniques in ﬁnan-

cial domain. Future Generation Computer Systems,

55:278–288.

Assis, C., Pereira, A. M., de Arruda Pereira, M., and Car-

rano, E. G. (2010). Using genetic programming to

detect fraud in electronic transactions. In Prazeres,

C. V. S., Sampaio, P. N. M., Santanchè, A., Santos, C.

A. S., and Goularte, R., editors, A Comprehensive Sur-

vey of Data Mining-based Fraud Detection Research,

volume abs/1009.6119, pages 337–340.

Attenberg, J. and Provost, F. J. (2010). Inactive learn-

ing?: difﬁculties employing active learning in prac-

tice. SIGKDD Explorations, 12(2):36–41.

Bhattacharyya, S., Jha, S., Tharakunnel, K. K., and West-

land, J. C. (2011). Data mining for credit card fraud:

A comparative study. Decision Support Systems,

50(3):602–613.

Breiman, L. (2001). Random forests. Machine Learning,

45(1):5–32.

Brown, I. and Mues, C. (2012). An experimental compari-

son of classiﬁcation algorithms for imbalanced credit

scoring data sets. Expert Syst. Appl., 39(3):3446–

3453.

Chatterjee, A. and Segev, A. (1991). Data manipulation

in heterogeneous databases. ACM SIGMOD Record,

20(4):64–68.

Che, D., Safran, M. S., and Peng, Z. (2013). From big

data to big data mining: Challenges, issues, and op-

portunities. In Hong, B., Meng, X., Chen, L., Wini-

warter, W., and Song, W., editors, Database Sys-

tems for Advanced Applications - 18th International

Conference, DASFAA 2013, International Workshops:

BDMA, SNSM, SeCoP, Wuhan, China, April 22-25,

2013. Proceedings, volume 7827 of Lecture Notes in

Computer Science, pages 1–15. Springer.

Crone, S. F. and Finlay, S. (2012). Instance sampling in

credit scoring: An empirical study of sample size

and balancing. International Journal of Forecasting,

28(1):224–238.

Donmez, P., Carbonell, J. G., and Bennett, P. N. (2007).

Dual strategy active learning. In ECML, volume 4701

of Lecture Notes in Computer Science, pages 116–

127. Springer.

Edge, M. E. and Sampaio, P. R. F. (2009). A survey of

signature based methods for ﬁnancial fraud detection.

Computers & Security, 28(6):381–394.

Faraggi, D. and Reiser, B. (2002). Estimation of the area un-

der the roc curve. Statistics in medicine, 21(20):3093–

3106.

Gao, J., Fan, W., Han, J., and Yu, P. S. (2007). A general

framework for mining concept-drifting data streams

with skewed distributions. In Proceedings of the Sev-

enth SIAM International Conference on Data Min-

ing, April 26-28, 2007, Minneapolis, Minnesota, USA,

pages 3–14. SIAM.

Goldstein, M. and Uchida, S. (2016). A comparative eval-

uation of unsupervised anomaly detection algorithms

for multivariate data. PloS one, 11(4):e0152173.

Gopinathan, K. M., Biafore, L. S., Ferguson, W. M.,

Lazarus, M. A., Pathria, A. K., and Jost, A. (1998).

Fraud detection using predictive modeling. US Patent

5,819,226.

He, H. and Garcia, E. A. (2009). Learning from imbalanced

data. IEEE Trans. Knowl. Data Eng., 21(9):1263–

1284.

Hoffman, A. J. and Tessendorf, R. E. (2005). Artiﬁcial in-

telligence based fraud agent to identify supply chain

irregularities. In Hamza, M. H., editor, IASTED In-

ternational Conference on Artiﬁcial Intelligence and

Applications, part of the 23rd Multi-Conference on

Applied Informatics, Innsbruck, Austria, February 14-

16, 2005, pages 743–750. IASTED/ACTA Press.

Hooi, B., Shah, N., Beutel, A., Günnemann, S., Akoglu,

L., Kumar, M., Makhija, D., and Faloutsos, C. (2016).

BIRDNEST: bayesian inference for ratings-fraud de-

tection. In Venkatasubramanian, S. C. and Jr., W. M.,

editors, Proceedings of the 2016 SIAM International

Conference on Data Mining, Miami, Florida, USA,

May 5-7, 2016, pages 495–503. SIAM.

Japkowicz, N. and Stephen, S. (2002). The class imbal-

ance problem: A systematic study. Intell. Data Anal.,

6(5):429–449.

Lek, M., Anandarajah, B., Cerpa, N., and Jamieson, R.

(2001). Data mining prototype for detecting e-

commerce fraud. In Smithson, S., Gricar, J., Pod-

logar, M., and Avgerinou, S., editors, Proceedings of

the 9th European Conference on Information Systems,

Global Co-operation in the New Millennium, ECIS

2001, Bled, Slovenia, June 27-29, 2001, pages 160–

165.

Lenard, M. J. and Alam, P. (2005). Application of fuzzy

logic fraud detection. In Khosrow-Pour, M., editor,

Encyclopedia of Information Science and Technology

(5 Volumes), pages 135–139. Idea Group.

Marqués, A. I., García, V., and Sánchez, J. S. (2013). On the

suitability of resampling techniques for the class im-

balance problem in credit scoring. JORS, 64(7):1060–

1070.

Phua, C., Lee, V. C. S., Smith-Miles, K., and Gayler, R. W.

(2010). A comprehensive survey of data mining-based

fraud detection research. CoRR, abs/1009.6119.

Powers, D. M. (2011). Evaluation: from precision, recall

and f-measure to roc, informedness, markedness and

correlation.

Pozzolo, A. D., Caelen, O., Borgne, Y. L., Waterschoot, S.,

and Bontempi, G. (2014). Learned lessons in credit

card fraud detection from a practitioner perspective.

Expert Syst. Appl., 41(10):4915–4928.

Pozzolo, A. D., Caelen, O., Johnson, R. A., and Bontempi,

G. (2015). Calibrating probability with undersampling

for unbalanced classiﬁcation. In IEEE Symposium Se-

ries on Computational Intelligence, SSCI 2015, Cape

Town, South Africa, December 7-10, 2015, pages

159–166. IEEE.

Sahin, Y., Bulkan, S., and Duman, E. (2013). A cost-

sensitive decision tree approach for fraud detection.

Expert Syst. Appl., 40(15):5916–5923.

Saia, R. (2017). A discrete wavelet transform approach to

fraud detection. In Yan, Z., Molva, R., Mazurczyk,

W., and Kantola, R., editors, Network and System

Security - 11th International Conference, NSS 2017,

Helsinki, Finland, August 21-23, 2017, Proceedings,

volume 10394 of Lecture Notes in Computer Science,

pages 464–474. Springer.

Saia, R., Boratto, L., and Carta, S. (2015). Multiple be-

havioral models: A divide and conquer strategy to

fraud detection in ﬁnancial data streams. In Fred, A.

L. N., Dietz, J. L. G., Aveiro, D., Liu, K., and Fil-

ipe, J., editors, KDIR 2015 - Proceedings of the Inter-

national Conference on Knowledge Discovery and In-

formation Retrieval, part of the 7th International Joint

Conference on Knowledge Discovery, Knowledge En-

gineering and Knowledge Management (IC3K 2015),

Volume 1, Lisbon, Portugal, November 12-14, 2015,

pages 496–503. SciTePress.

Saia, R. and Carta, S. (2017a). Evaluating credit card trans-

actions in the frequency domain for a proactive fraud

detection approach. In Samarati, P., Obaidat, M. S.,

and Cabello, E., editors, Proceedings of the 14th Inter-

national Joint Conference on e-Business and Telecom-

munications (ICETE 2017) - Volume 4: SECRYPT,

Madrid, Spain, July 24-26, 2017., pages 335–342.

SciTePress.

Saia, R. and Carta, S. (2017b). A frequency-domain-based

pattern mining for credit card fraud detection. In Ramachandran, M., Muñoz, V. M., Kantere, V., Wills,

G., Walters, R. J., and Chang, V., editors, Proceed-

ings of the 2nd International Conference on Inter-

net of Things, Big Data and Security, IoTBDS 2017,

Porto, Portugal, April 24-26, 2017, pages 386–391.

SciTePress.

Sobehart, J. and Keenan, S. (2001). Measuring default ac-

curately. Risk Magazine.

Sorournejad, S., Zojaji, Z., Atani, R. E., and Monadjemi,

A. H. (2016). A survey of credit card fraud detection

techniques: Data and technique oriented perspective.

CoRR, abs/1611.06439.

Vinciotti, V. and Hand, D. J. (2003). Scorecard construc-

tion with unbalanced class sizes. Journal of Iranian

Statistical Society, 2(2):189–205.

Wang, H., Fan, W., Yu, P. S., and Han, J. (2003). Mining

concept-drifting data streams using ensemble classi-

ﬁers. In Getoor, L., Senator, T. E., Domingos, P. M.,

and Faloutsos, C., editors, Proceedings of the Ninth

ACM SIGKDD International Conference on Knowl-

edge Discovery and Data Mining, Washington, DC,

USA, August 24 - 27, 2003, pages 226–235. ACM.

Whiting, D. G., Hansen, J. V., McDonald, J. B., Albrecht,

C. C., and Albrecht, W. S. (2012). Machine learning

methods for detecting patterns of management fraud.

Computational Intelligence, 28(4):505–527.

Zhang, L., Yang, J., Chu, W., and Tseng, B. L. (2011).

A machine-learned proactive moderation system for

auction fraud detection. In Macdonald, C., Ounis, I.,

and Ruthven, I., editors, Proceedings of the 20th ACM

Conference on Information and Knowledge Manage-

ment, CIKM 2011, Glasgow, United Kingdom, Octo-

ber 24-28, 2011, pages 2501–2504. ACM.

Zhu, J., Wang, H., Yao, T., and Tsou, B. K. (2008). Active

learning with sampling by uncertainty and density for

word sense disambiguation and text classiﬁcation. In

Scott, D. and Uszkoreit, H., editors, COLING 2008,

22nd International Conference on Computational Lin-

guistics, Proceedings of the Conference, 18-22 August

2008, Manchester, UK, pages 1137–1144.
