Discovering Internal Fraud Models in a Stream of Banking Transactions

Fabien Vilar¹, Marc Le Goc¹, Philippe Bouche² and Pierre-Yves Rolland¹

¹ Laboratory for Sciences of Information and Systems (LSIS), UMR CNRS 7296, Marseille, France

² TOM4, Pelissanne, France

Keywords: Data Mining, Knowledge Engineering, Online and Real-Time Fraud Detection, Fraud Modeling.

Abstract:

Internal fraud in the banking industry represents a huge cost, and the problem is particularly difficult to solve because swindlers are very imaginative, so fraud schemata evolve continuously. Fraud detection systems must therefore keep learning from new fraud schemata, which makes them difficult to design. This paper proposes a new theoretical and practical approach to detect internal fraud and to model fraud schemata. The approach is based on a particular method of abstraction that reduces the complexity of the problem from $O(n^2)$ to $O(n)$, which makes it possible to implement it in a Java program that detects and models frauds online and in real time on a simple professional personal computer. The results of this program are presented through its application to a real-world fraud provided by a worldwide French bank.

1 INTRODUCTION

Over the last three decades, theoretical research in econometrics and financial analysis has developed a stochastic approach to the analysis and modeling of financial data that is now widely used in the banking and financial industries (Fliess et al., 2011). During this period, the exponential development of the information systems of banks and financial companies has allowed Data Mining and Machine Learning algorithms to be used to define new services, notably to help solve the problem of internal fraud detection in the banking industry. But this problem is particularly difficult to solve because swindlers are very imaginative, so the models of fraud evolve continuously and fraud detection systems must learn from the new fraud schemata. This paper proposes a new theoretical and practical approach to detect internal frauds and to model fraud schemata thanks to a reduction of the complexity of the problem from $O(n^2)$ to $O(n)$. The proposed solution makes it possible to detect and model internal frauds online, in real time, with a simple professional personal computer able to handle more than 4 billion transactions a day. The next section describes the works related to the proposed approach. Section 3 introduces the main concepts of the Timed Observations Theory (TOT) (Le Goc, 2006) that are required for online fraud detection and modeling from bank transactions. This framework has been implemented in the TOM4FFS program (Timed Observations Mining for Fraud Fighting System), and its results on the detection and modeling of a real-world, and particularly complex, example of an internal fraud schema concerning a worldwide French bank are provided in Section 4. Section 5 concludes the paper.

2 RELATED WORKS

To solve the general fraud detection problem, many statistical and machine learning techniques have been developed, such as neural networks (Fanning and Cogger, 1998; Green and Choi, 1997), genetic algorithms (Hoogs et al., 2007), Bayesian networks and decision trees (Kirkos et al., 2007; Phua et al., 2004), K-nearest neighbours (Kotsiantis et al., 2006), logistic regression (Altman et al., 1994) and even rule-based fuzzy reasoning systems (Deshmukh and Talluru, 1998). They have been used with more or less success in operation (see for example (Roddick and Spiliopoulou, 2002) for a complete state of the art about these techniques). Technically speaking, the main difficulties with these approaches are (i) the representativity and completeness of the data set and (ii) the high computational power that most learning algorithms require to learn and to detect fraud schemata from the huge number of transactions that a bank handles each day. As a consequence, most of these solutions cannot be used online and in real time at low cost.

Vilar, F., Goc, M., Bouche, P. and Rolland, P.
Discovering Internal Fraud Models in a Stream of Banking Transactions.
In Proceedings of the 7th International Joint Conference on Computational Intelligence (IJCCI 2015) - Volume 1: ECTA, pages 346-351
ISBN: 978-989-758-157-1
Copyright © 2015 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved

On the other hand, the

paper (Fliess et al., 2011) highlights a conceptual problem that is inherent to all statistics-based approaches: the effect of sampling on the data. Usually, the quantity of raw data is so huge that stochastic models are used to aggregate the raw data into a minimal set of samples, with a reduced set of dimensions that maximizes the compactness of the learning space so that learning algorithms can work efficiently. This usual method insidiously introduces, often without the analysts realizing it, the problem of the loss of causality in the data. Having very small effects in most cases, this problem can be (and is) generally neglected. Nevertheless, it is significant when the timestamps of the raw data are relevant to the calculations: in that case, biases are introduced into both the structure and the parameter values of the learned models. And it becomes crucial when dealing with online data flows that must be analyzed in real time.

Since most of the proposed Data Mining algorithms have been designed for non-timed data, some extensions have been proposed for timed data, for example AprioriAll (Agrawal and Srikant, 1995) or Winepi/Minepi (Mannila et al., 1995; Mannila et al., 1997). But, according to (Mannila, 2002) or (Han and Kamber, 2006), these algorithms present two main drawbacks: the number of discovered models increases in a nonlinear way with (i) the a priori setting of the values of the algorithm parameters and (ii) the threshold values of the decision criteria, even when only a very small fraction of these models is interesting. Few papers deal with the problem of fraud detection in online banking (Phua et al., 2010; Jans et al., 2009; Wei et al., 2012). Most of them concern external fraud detection in application domains far from the banking industry. To solve these problems, the TOT defines a new Knowledge Discovery in Databases (KDD) approach, specifically designed to learn directly from timed data, without any parameter. The theoretical bases of this KDD approach, implemented in the TOM4FFS program, are described in the next sections.

3 INTRODUCTION TO THE TOT

The TOT defines a dynamic process as an arbitrarily constituted set $X(t) = \{x_1(t), ..., x_n(t)\}$ of timed functions $x_i(t)$ of continuous time $t$. The set $X(t)$ of functions implicitly defines a set $X = \{x_1, x_2, ..., x_n\}$ of $n$ variable names denoted $x_i$ for simplicity. A dynamic process $X(t)$ is said to be observed by a program $\Theta$ when the latter aims at writing timed messages describing the modifications over time of the functions $x_i(t)$ of $X(t)$. The aim of the TOT is precisely to model observed processes defined as a couple $(X(t), \Theta(X, \Delta))$. To this aim, the TOT defines a timed observation to provide a meaning to a timed message:

Definition 1. Timed Observation

Let $X(t) = \{x_i(t)\}_{i=1...n}$ be a set of time functions describing the evolution of a process that is observed by a program $\Theta$; let $\Gamma = \{t_k\}_{t_k \in \mathbb{R}}$ be a set of arbitrary time instants at which $\Theta$ observes the functions; let $\theta(x_\theta, \delta_\theta, t_\theta)$ be a predicate implicitly determined by $\Theta$; and let $\Delta = \{\delta_j\}$ be a set of constant values.

A timed observation $(\delta_j, t_k) \in \Delta \times \Gamma$ made on the time function $x_i(t)$ is the assignation of values $x_i$, $\delta_j$ and $t_k$ to the predicate $\theta(x_\theta, \delta_\theta, t_\theta)$ such that $\theta(x_i, \delta_j, t_k)$.

The TOT notion of observation class makes the link between a variable $x_i$ and a constant $\delta_j$.

Definition 2. Observation Class

Let $X(t) = \{x_i(t)\}_{i=1...n}$ be a set of time functions that are observed by an abstract program $\Theta(X, \Delta)$ where $\Delta = \{\delta_j\}_{j=1...m}$ is the set of all the constants the abstract program can use and $X = \{x_i\}_{i=1...n}$ is the set of variable names corresponding to $X(t)$.

$\forall i \in [1, n]$, $\forall j \in [1, m]$ and $\forall k \in \mathbb{N}$, an observation class $O_k = \{..., (x_i, \delta_j), ...\}$ is a subset of $X \times \Delta$.

A timed observation $(\delta_j, t_k)$ can always be considered as an occurrence $O_j(t_k)$ of an observation class $O_j = \{(x_i, \delta_j)\}$. In practice, the temporal functions $x_i(t)$ describing the evolution of the process are piecewise functions (see Section 4).
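As a minimal illustration of Definitions 1 and 2, the sketch below represents a timed observation and an observation class as plain data structures. The names `TimedObservation`, `ObservationClass` and `occurrences_of` are hypothetical and are not taken from the TOT or from TOM4FFS; the sketch also assumes the singleton case $O_j = \{(x_i, \delta_j)\}$, where a class is identified by a single (variable, constant) pair.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ObservationClass:
    """Singleton observation class O = {(x_i, delta_j)} (assumed form)."""
    variable: str   # variable name x_i
    constant: int   # constant delta_j


@dataclass(frozen=True)
class TimedObservation:
    """Timed observation (delta_j, t_k) made on the function named `variable`."""
    variable: str
    constant: int
    time: float


def occurrences_of(cls: ObservationClass,
                   observations: List[TimedObservation]) -> List[TimedObservation]:
    """Return the timed observations that are occurrences of the given class."""
    return [o for o in observations
            if o.variable == cls.variable and o.constant == cls.constant]


# Hypothetical data: three timed observations made on the same variable x1.
observations = [
    TimedObservation("x1", 5, 0.0),
    TimedObservation("x1", 7, 1.5),
    TimedObservation("x1", 5, 3.0),
]
o_5 = ObservationClass("x1", 5)
matching_times = [o.time for o in occurrences_of(o_5, observations)]
```

Here `matching_times` holds the instants at which the class associated with the constant 5 occurs.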

Definition 3. Concrete Unary Observer

A concrete unary observer $\Theta^0(\{x_i\}, \mathbb{Z})$ is a program observing a piecewise function and implementing equation 1 to produce timed observations of the form $O_i(t_{k_i}) = (x_i(t_{k_i}) - x_i(t_{k_i-1}), t_{k_i})$:

$$\forall t_{k_i}, t_{k_i-1}, \quad x_i(t_{k_i-1}) \neq x_i(t_{k_i}) \Rightarrow \mathrm{write}(O_i(t_{k_i})) \qquad (1)$$

This generates a sequence $\omega_i = \{..., O_i(t_{k_i}), ...\}$ of occurrences $O_i(t_{k_i})$ of the observation class $O_i$.
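To make equation 1 concrete, here is a minimal sketch of a unary observer, assuming the piecewise function is available as a time-ordered list of `(t, value)` samples. The function name `unary_observer` and the sample data are illustrative assumptions, not part of TOM4FFS.

```python
from typing import Iterable, Iterator, Tuple


def unary_observer(samples: Iterable[Tuple[float, int]]) -> Iterator[Tuple[int, float]]:
    """Yield a timed observation (x(t_k) - x(t_{k-1}), t_k) at each value change."""
    previous = None
    for t, value in samples:
        # Equation 1: an observation is written only when the value changes.
        if previous is not None and value != previous:
            yield (value - previous, t)
        previous = value


# Hypothetical piecewise-constant function sampled at five instants.
samples = [(0, 100), (1, 100), (2, 70), (3, 70), (4, 120)]
observations = list(unary_observer(samples))
```

With these samples, the observer writes one observation for the drop from 100 to 70 and one for the rise from 70 to 120; the unchanged samples produce nothing.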

A temporal binary relation is the representation of a sequential relation between two observation classes $O_i$ and $O_j$:

Definition 4. Temporal Binary Relation

A temporal binary relation $r_{ij}(O_i, O_j, [\tau^-_{ij}, \tau^+_{ij}])$, $\tau^-_{ij} \in \mathbb{R}$, $\tau^+_{ij} \in \mathbb{R}$, is an oriented relation between two observation classes $O_i$ and $O_j$ that is time-constrained by the interval $[\tau^-_{ij}, \tau^+_{ij}]$.

The temporal constraint $[\tau^-_{ij}, \tau^+_{ij}]$ is the time interval in which the timed observation $O_j(t_j) = (\delta_j, t_j)$ of the output observation class $O_j$ must be observed after the timed observation $O_i(t_i) = (\delta_i, t_i)$ of the input observation class $O_i$: $t_j - t_i \in [\tau^-_{ij}, \tau^+_{ij}]$. In that case, the temporal binary relation $r_{ij}(O_i, O_j, [\tau^-_{ij}, \tau^+_{ij}])$ is said to be observed and is denoted $r_{ij}(O_i(t_i), O_j(t_j), [\tau^-_{ij}, \tau^+_{ij}])$.

A program $\Theta_i(O_i, O_j)$ designed to recognize a temporal binary relation of the form $r_{ij}(O_i, O_j, [\tau^-_{ij}, \tau^+_{ij}])$ is called an abstract binary observer (ABO).

Definition 5. Abstract Binary Observer

Any program $\Theta_i(O_i, O_j)$ implementing equation 2 is an Abstract Binary Observer.

$$\forall t_{k_i} \in \Gamma_i, \forall t_{k_j} \in \Gamma_j, t_{k_j} \geq t_{k_i}, \quad \exists O_i(t_{k_i}) \in \omega_i \wedge \exists O_j(t_{k_j}) \in \omega_j \wedge (t_{k_j} - t_{k_i}) \in [\tau^-_{ij}, \tau^+_{ij}] \Rightarrow \mathrm{write}(O_{ij}(t_{k_j})) \qquad (2)$$

$O_{ij}(t_{k_j})$ is a timed observation and an occurrence of an abstract observation class $O_{ij} = \{(x_{ij}, \delta_{ij})\}$ linking an abstract variable $x_{ij}$ with the abstract binary constant $\delta_{ij} \equiv (\delta_i, \delta_j)$.
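Equation 2 can be illustrated with a short sketch, assuming that the sequences $\omega_i$ and $\omega_j$ are given as lists of occurrence times expressed in the same unit as the time constraint. The function name `abstract_binary_observer` is hypothetical, and the exhaustive double loop is chosen only for readability; it is not a description of how TOM4FFS evaluates the relation.

```python
from typing import List, Tuple


def abstract_binary_observer(omega_i: List[float],
                             omega_j: List[float],
                             tau_minus: float,
                             tau_plus: float) -> List[Tuple[float, float]]:
    """Return the pairs (t_ki, t_kj) for which equation 2 writes O_ij(t_kj)."""
    written = []
    for t_i in omega_i:
        for t_j in omega_j:
            # The output occurrence must not precede the input occurrence
            # and the delay must fall within [tau_minus, tau_plus].
            if t_j >= t_i and tau_minus <= t_j - t_i <= tau_plus:
                written.append((t_i, t_j))
    return written


# Hypothetical occurrence times (in days) for two observation classes.
pairs = abstract_binary_observer([0, 10], [5, 35], tau_minus=0, tau_plus=30)
```

The pair (0, 35) is rejected because its delay of 35 exceeds the upper bound, and (10, 5) is rejected because the output occurrence precedes the input one.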

4 APPLICATION

The aim of this section is to present the application of the TOM4FFS program to detect and to model schemata of potential internal fraud. An internal fraud is a particular sequence of non-compliant transactions whose aim is to move money from client accounts to accounts of a dishonest bank manager. The role of TOM4FFS is to detect and to model, online and in real time, a schema of potential internal fraud from a continuous flow of transactions. The detected transactions remain potentially fraudulent until their non-compliance has been confirmed. The fraud schema of this example is based on a set of pairs $(a_{k_i}(t_{k_i}), a_{k_j}(t_{k_j}))$ of transactions where the first transaction $a_{k_i}(t_{k_i})$ debits an amount $a_{k_i}$ of money from an account of a customer before the second transaction $a_{k_j}(t_{k_j})$ credits the same amount to one of the accounts of the bank administrator. The problem is then to find the minimal set of transaction pairs $(a_{k_i}(t_{k_i}), a_{k_j}(t_{k_j}))$ satisfying the following constraints: (i) from a customer account which is not an account of the manager, (ii) to an account of the manager, (iii) where $a_{k_i} = -a_{k_j}$ and (iv) $t_{k_i} \leq t_{k_j}$. Given a database of $n$ transactions, the complexity of this problem is $O(n^2)$: when $n$ is counted in millions (i.e. $10^6$), the number of pairs to evaluate is counted in millions of millions (i.e. $10^{12}$), making the problem difficult for humans as well as for computers. As an illustration, the internal fraud schema studied in this section was detected by a client and required 6 months of analysis by the bank's expert in internal fraud. It

should be noted that this expert considers the studied example as particularly complicated. According to the TOT, a bank transaction $a_{k_i}(t_{k_i})$ is a timed observation $(x_i(t_{k_i}) - x_i(t_{k_i-1}), t_{k_i})$ associated with a bank account $x_i(t)$ of a particular customer. In this application, since it will be seen that the cents can be neglected in a first step, $x_i(t)$ is considered as a piecewise constant time function defined over $\mathbb{Z}$ (cf. figure 2 for an illustration). The constant $(x_i(t_{k_i}) - x_i(t_{k_i-1}))$ is the amount of money that has been moved on the account $x_i(t)$ at time $t_{k_i}$: it is an integer number of cents, which is positive when the account $x_i(t)$ is credited and negative otherwise. For example, when considering the transaction (1|8|24|1169|-189.64, 2009/09/29 15:45:25), the symbol "|" splits the sequence of characters 1|8|24|1169|-189.64 into different items: the customer number 1, the account number 8, the type of the transaction 24, the index of the transaction 1169 and the amount of the transaction -189.64. The sequence of characters 2009/09/29 15:45:25 being the timestamp, the transaction is then represented by the timed observation (−189.64, 2009/09/29 15:45:25). The other items (the customer number, the account number, the type of the transaction and the index of the transaction) are then considered as attributes of the timed observation. These attributes are defined with an ontology describing the customers, the accounts and the types of transactions with frames, similarly to the Manchester OWL syntax. As a consequence, a timed observation (−189.64, 2009/09/29 15:45:25) is at once an instance of a customer frame, an account frame and a transaction-type frame. In the rest of this section, a fraud schema is represented by a set of binary relations between customer accounts. The

problem with this representation of the transactions is that the number of constants $\delta$ is a priori infinite (i.e. equal to $\aleph_0$, the cardinal of $\mathbb{Z}$). Recalling that an observation class $O_i = \{(x_i, \delta_i)\}$ is a singleton associating a constant $\delta_i$ with one and only one variable name $x_i$ (cf. definition 2), the number of observation classes is also infinite. As a consequence, the set of timed binary relations that are required to recognize all the possible pairs of transactions $(a_{k_i}(t_{k_i}), a_{k_j}(t_{k_j}))$ from (i) a customer account to (ii) an account of the manager where (iii) $a_{k_i} = -a_{k_j}$ and (iv) $t_{k_i} \leq t_{k_j}$ is also infinite. Even when the cents are dropped, the constants $\delta_i \equiv (x_i(t_{k_i}) - x_i(t_{k_i-1}))$ of these timed observations can be any integer of $\mathbb{Z}$. So an infinite set of timed binary relations is required to constitute the pairs of transactions satisfying the required logical constraints.

ECTA 2015 - 7th International Conference on Evolutionary Computation Theory and Applications

To solve this problem, the idea is to define a compact representation of the amounts, inspired by Benford's Law (Benford, 1938), also called the First-Digit Law. An amount $m = (x_i(t_{k_i}) - x_i(t_{k_i-1}))$ of a transaction can be represented as a signed sum of powers of 10: $\forall z \in \mathbb{Z}, z = s(z) \cdot \sum_{k=0}^{n} a_k \cdot 10^k$. In this representation, (i) $s(z)$ is the sign function of $z$, i.e. $s : \mathbb{Z} \to \{-1, 1\}$, $z \mapsto -1$ if $z < 0$, $1$ otherwise, (ii) $n$ is the highest power of 10 of $z$ ($n \geq 0$), and (iii) $a_k$ is the digit defining the value of the coefficient of the $k$th power of 10, the leading digit $a_n$ belonging to $D = \{1, 2, ..., 8, 9\}$. With the First-Digit Law in mind, the advantage of this representation is that, when using only the digits of $D$, the following classification function can represent the set of transaction amounts with a much smaller set $O = \{O_i\}$ of observation classes $O_i$:

Definition 6. Classification function

The classification function $\mu$ maps any $z \in \mathbb{Z}$ to a particular $\mu(z)$ of the set $M = \{..., -21, ..., -11, -9, -8, ..., -2, -1, 0, 1, 2, ..., 8, 9, 11, ..., 21, ...\}$:

$$\mu : \mathbb{Z} \to M, \quad z = s(z) \cdot \sum_{k=0}^{n} a_k \cdot 10^k \mapsto \mu(z) = s(z) \cdot (10 \cdot n + a_n)$$
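A direct transcription of Definition 6 may help to see which amounts share a class. This is a minimal sketch: the function name `mu` mirrors the symbol $\mu$, the decimal string of $|z|$ is used to read $n$ and $a_n$, and mapping $z = 0$ to the class 0 is an assumption based on the presence of 0 in the set $M$.

```python
def mu(z: int) -> int:
    """Classify an integer amount by its sign, leading digit and highest power of 10."""
    if z == 0:
        return 0  # assumption: the zero amount maps to the class 0 of M
    sign = -1 if z < 0 else 1
    digits = str(abs(z))
    n = len(digits) - 1      # highest power of 10 of z
    a_n = int(digits[0])     # leading digit, always in {1, ..., 9}
    return sign * (10 * n + a_n)


# Amounts with the same sign, leading digit and magnitude share a class.
classes = [mu(z) for z in (-17000, 17000, 17999, 9, 10, -189)]
```

For instance, 17000 and 17999 fall in the same class, while values such as 10 or 20 never appear as classes because the leading digit is never 0.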

The classification function $\mu(z)$ only uses the first digit of $z$ and $n$, its highest power of 10. In practice, the TOM4FFS program is set with a maximum value $n_{max}$ of the power of 10 to create a finite set $O$ of $(20 \cdot n_{max} + 1)$ observation classes to analyze transactions whose classes $\mu(z)$ are contained in the range $[-(10 \cdot n_{max} + 1); 10 \cdot n_{max} + 1]$. For example, with $n_{max} = 9$, the highest digit of $D$, TOM4FFS creates 181 observation classes $O = \{O_{-91}, O_{-90}, ..., O_{-1}, O_0, O_{+1}, ..., O_{+90}, O_{+91}\}$ to take into account amounts contained in the range $]-10^9; 10^9[$ (i.e. a billion). This interval is largely sufficient for the purpose of internal fraud detection. With these 181 observation classes, TOM4FFS creates (i) a unique concrete unary observer $\Theta_i(\{\phi_i\}, \mathbb{Z})$ implementing the classification function $\mu$ of definition 6, where $\phi_i$ is the name of an abstract time function $\phi_i(t)$ that transforms the transaction amounts $s(z) \cdot \sum_{k=0}^{n} a_k \cdot 10^k$ into occurrences $O_k(t_k)$, and (ii) a network of 181 ABOs, each of them being specified with a timed binary relation of the form $r_{ij}(O_i, O_j, [\tau^-_{ij}, \tau^+_{ij}])$ where $O_i$ and $O_j$ correspond to opposite values of the $\mu$ classification function. So, since the $\mu$ classification function is only concerned with $10 \cdot n + a_n$, the default structure of the ABO (cf. equation 2) is independent of the customer, the account and the type of transaction: it only checks the classes and the time constraint $[\tau^-_{ij}, \tau^+_{ij}]$. The ABOs are then set with (i) a predicate that evaluates the constraint of equality of the true values of the transaction amounts (i.e. the cents are taken into account) and (ii) two simple propositions that check whether the transactions concern a customer (and not the manager) for constraint (i) and the manager only for constraint (ii). The studied database $\Omega$

is composed of 1492 transactions (cf. figure 1) between the bank administrator (ID CLI = 1003) and his 3 clients (ID CLI ∈ {1001, 1002, 1004}). This concerns 30 banking accounts (column ID CPTE) and 40 transaction types (column ID TYP EVT) numbered from 3001 to 3040. Client 1001 owns 8 accounts: 2006, 2008, 2009, 2012, 2013, 2014, 2015 and 2022. Client 1002 owns 5 accounts: 2005, 2011, 2019, 2023 and 2027. The bank administrator owns 10 accounts: 2001, 2002, 2003, 2004, 2024, 2026, 2028, 2029, 2030 and 2031. Client 1004 owns 7 accounts: 2007, 2010, 2016, 2017, 2018, 2020 and 2021. These transactions cover a period of more than one year, from 2009/01/02 to 2010/03/12, and involve amounts of money from −445 200 € to 460 614.36 €. The aim is to find, among these transactions, those that are potentially fraudulent. The set of constants $\Delta$ is the set $M$ of definition 6, so that $O$ is the set of observation classes. Let $\Gamma = \{t_k, t_k \in \Omega\}$ be the set of time instants contained in the database $\Omega$. Fraud detection and modeling are presented here by observing three different processes: the account, client and transaction-type processes. Each process $X(t) = \{x_i(t), i = 1...n_X\}$ is made of $n_X$ piecewise timed functions. So each unary observer $\Theta^0_i(\{x_i\}, \mathbb{Z})$ implementing equation 1 generates a sequence of timed observations $\omega_i = \{O_i(t_{k_i}) = (\delta_i, t_{k_i}), \delta_i \in \Delta, t_{k_i} \in \Gamma\}$. This creates a network of 181 ABOs of the form $r_{ij}(O_i, O_j, [\tau^-_{ij}, \tau^+_{ij}])$.

The time constraints of each ABO are arbitrarily set to $\tau^-_{ij} = 0$ day and $\tau^+_{ij} = 30$ days so that only pairs of transactions $(O_i(t_{k_i}), O_j(t_{k_j}))$ where $t_{k_j} - t_{k_i} \in [0, 30]$ are considered by the ABO. To observe the account process, let us consider the process $X(t) = \{x_{2001}(t), ..., x_{2031}(t)\}$ made of 30 piecewise timed functions $x_i(t)$ defined on $\mathbb{Z}$ corresponding to the 30 banking accounts ID CPTE ∈ {2001, ..., 2031} contained in the database $\Omega$. Let us take the case of account number 2007, whose piecewise timed function $x_{2007}(t)$ is represented in figure 2. The unary observer $\Theta^0_{2007}(\{x_{2007}\}, \mathbb{Z})$ observes this function. At time $t_0$ = 2009/03/16 00:00:00, the value of $x_{2007}(t)$ changes by −17000, which is mapped by the classification function (see definition 6) to −41. So the unary observer creates a timed observation $O_{-41}$(2009/03/16 00:00:00) = (−41, 2009/03/16 00:00:00), which is a representation of the transaction $a_1$ = (1004|2007|3004|192|-17000.00, 2009/03/16 00:00:00) (see the first line in bold of figure 1). Finally, it adds it to the sequence of timed observations $\omega_{2007}$ = {..., $O_{-41}$(2009/03/16 00:00:00), ...}.

Figure 1: Extract of the Banking Transactions Data Base.

Figure 2: Piecewise functions for accounts 2001 and 2007.

In parallel, the unary observer $\Theta^0_{2001}(\{x_{2001}\}, \mathbb{Z})$ does the same job, observing the piecewise function $x_{2001}(t)$ corresponding to the account 2001 of the manager. It thus generates a sequence of timed observations $\omega_{2001}$. Each timed observation of $\omega_{2001}$ is a representation of a transaction in database $\Omega$. In

particular, $O_{+41}$(2009/03/17 20:00:00) represents the transaction $a_2$ = (1003|2001|3039|1675|17000.00, 2009/03/17 20:00:00) (see the second bold line of figure 1). As a consequence, the ABO implementing the relation $r_{-41,41}(O_{-41}, O_{+41}, [0, 30])$ is activated when it receives the timed observation $O_{+41}$(2009/03/17 20:00:00) after the observation $O_{-41}$(2009/03/16 00:00:00), the time constraint [0, 30] being satisfied. The ABO then writes the binary timed observation $O_{-41,41}$(2009/03/17 20:00:00), denoting that the corresponding transactions $a_1$ and $a_2$ are potentially fraudulent. Doing so with the 1492 successive transactions of the studied database, the network of 181 ABOs of the TOM4FFS program produces the 8 binary timed observations of figure 3.

Figure 3: Binary Timed Observations.

The schema of these potentially fraudulent transactions is represented in figure 4. In this figure, the triangles represent the observation classes of the manager, the two other shapes representing those of the two other customers (1001 and 1004). The labels represent the total of the moved money in the form of an interval $[m_i, m_j]$: for example, the label [−17 000 €, 17 000 €] of the relation between the accounts 2007 and 2001 means that −17 000 € have been moved from the account 2007 and 17 000 € have been moved to the account 2001.

Figure 4: Fraud scheme for accounts.

The fraud schema of figure 4 has been validated by the internal fraud expert of the French bank: the manager stole a total of 104 000 € from two customers, 1004 (49 000 €) and 1001 (55 000 €). Now let us observe the client process and consider the process $X(t) = \{x_{1001}(t), ..., x_{1004}(t)\}$ made of 4 piecewise timed functions $x_i(t)$ defined on $\mathbb{Z}$ corresponding to the 3 client ids (ID CLI ∈ {1001, 1002, 1004}) and the bank administrator id (ID CLI = 1003) contained in the database $\Omega$. The same methodology as previously presented is applied and leads to the fraud scheme shown in figure 5.

Figure 5: Fraud scheme for clients.

This confirms the fact that the manager has stolen a total of 104 000 € from two clients: 49 000 € from client 1004 and 55 000 € from client 1001. Finally, to observe the transaction-type process, let us consider the process $X(t) = \{x_{3001}(t), ..., x_{3040}(t)\}$ made of 40 piecewise timed functions $x_i(t)$ defined on $\mathbb{Z}$ corresponding to the 40 transaction type ids contained in the database $\Omega$. Figure 6 shows which types of transactions are involved in the fraud scheme. The manager uses five types of transactions among the 40 to steal from his clients.
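The worked example on accounts 2007 and 2001 can be replayed with a short end-to-end sketch. It is not the TOM4FFS implementation: the names `parse`, `mu`, `detect`, `MANAGER_ID` and `WINDOW` are hypothetical, and indexing the pending customer debits by their class is only one assumed way to avoid comparing every pair of transactions. The record layout and the two transactions are the ones quoted in this section; the 30-day window is the one chosen for the ABOs.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from decimal import Decimal

MANAGER_ID = "1003"          # bank administrator of the studied database
WINDOW = timedelta(days=30)  # time constraint [0, 30] days


def mu(amount: Decimal) -> int:
    """Class of an amount: sign * (10 * n + leading digit), cents dropped."""
    units = abs(int(amount))
    if units == 0:
        return 0
    digits = str(units)
    cls = 10 * (len(digits) - 1) + int(digits[0])
    return -cls if amount < 0 else cls


def parse(record: str, stamp: str) -> dict:
    """Split 'client|account|type|index|amount' and attach the timestamp."""
    client, account, _kind, _index, amount = record.split("|")
    return {"client": client, "account": account, "amount": Decimal(amount),
            "time": datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")}


def detect(transactions: list) -> list:
    """Pair each manager credit with earlier customer debits of the opposite class."""
    pending = defaultdict(list)  # class -> customer debits seen so far
    pairs = []
    for tx in sorted(transactions, key=lambda t: t["time"]):
        cls = mu(tx["amount"])
        if tx["client"] != MANAGER_ID and cls < 0:
            pending[cls].append(tx)
        elif tx["client"] == MANAGER_ID and cls > 0:
            for debit in pending[-cls]:
                # Exact amounts (cents included) and the time window are checked.
                if (debit["amount"] == -tx["amount"]
                        and tx["time"] - debit["time"] <= WINDOW):
                    pairs.append((debit["account"], tx["account"], tx["amount"]))
    return pairs


transactions = [
    parse("1004|2007|3004|192|-17000.00", "2009/03/16 00:00:00"),
    parse("1003|2001|3039|1675|17000.00", "2009/03/17 20:00:00"),
]
pairs = detect(transactions)
```

Each transaction is classified once, and a manager credit is compared only with the pending debits of the opposite class, so the number of comparisons stays small as long as each class holds few pending debits.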


Figure 6: Fraud scheme for transactions types.

5 CONCLUSION

Over the last three decades, Data Mining and Machine Learning algorithms have been used to tackle the delicate problem of fraud detection in the banking industry. These algorithms pose three main problems: (i) the strong reluctance of bankers to supply a third party with a set of real-world transactions, for confidentiality reasons, (ii) the problem of the representativity of the data set and, to a lesser extent, its completeness, and (iii) the huge number of transactions that must be analyzed to detect potential frauds. This paper presents an operational program, called TOM4FFS, to solve these three problems. The main advantages of this method are (i) to be purely syntactic, which guarantees strict confidentiality, and (ii) to reduce the complexity of the fraud detection problem from $O(n^2)$ to $O(n)$. The TOM4FFS program is thus able to handle more than 4 billion transactions a day, online and in real time, with a standard personal computer. This paper describes the TOM4FFS program and its application to a real-world fraud example from a worldwide French bank. Our current work is concerned with the extension of the approach to more complex fraud schemata and its application to the general problem of compliance in banking and other industries.

REFERENCES

Agrawal, R. and Srikant, R. (1995). Mining sequential patterns. Proceedings of the 11th International Conference on Data Engineering (ICDE95), pages 3–14.

Altman, E., Marco, G., and Varetto, F. (1994). Corporate distress diagnosis: Comparisons using linear discriminant analysis and neural networks (the Italian experience). Journal of Banking & Finance, 18(3):505–529.

Benford, F. (1938). The law of anomalous numbers. Proceedings of the American Philosophical Society.

Deshmukh, A. and Talluru, L. (1998). A rule-based fuzzy reasoning system for assessing the risk of management fraud. International Journal of Intelligent Systems in Accounting, Finance & Management, 74:223–241.

Fanning, K. and Cogger, K. (1998). Neural network detection of management fraud using published financial data. International Journal of Intelligent Systems in Accounting, Finance & Management, 7:21–41.

Fliess, M., Join, C., and Hatt, F. (2011). Is a probabilistic modeling really useful in financial engineering? In Conférence Méditerranéenne sur L'Ingénierie Sûre des Systèmes Complexes.

Green, B. and Choi, J. (1997). Assessing the risk of management fraud through neural network technology. Auditing, 161:14–28.

Han, J. and Kamber, M. (2006). Data Mining. Concepts and Techniques. Morgan Kaufmann.

Hoogs, B. et al. (2007). A genetic algorithm approach to detecting temporal patterns indicative of financial statement fraud. Intelligent Systems in Accounting, Finance and Management, 15:41–56.

Jans, M., Lybaert, N., and Vanhoof, K. (2009). A framework for internal fraud risk reduction at IT integrating business processes: the IFR framework. The International Journal of Digital Accounting Research, 9:1–29.

Kirkos, E. et al. (2007). Data mining techniques for the detection of fraudulent financial statements. Expert Systems with Applications.

Kotsiantis, S. et al. (2006). Forecasting fraudulent financial statements using data mining. International Journal of Computation Intelligence, 3:104–100.

Le Goc, M. (2006). Notion d'observation pour le diagnostic des processus dynamiques: Application à Sachem et à la découverte de connaissances temporelles. HDR, Aix-Marseille University, Faculté des Sciences et Techniques de Saint Jérôme.

Mannila, H. (2002). Local and global methods in data mining: Basic techniques and open problems. 29th International Colloquium on Automata, Languages and Programming.

Mannila, H., Toivonen, H., and Verkamo, A. I. (1995). Discovering frequent episodes in sequences. In Fayyad, U. M. and Uthurusamy, R., editors, Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD-95), Montreal, Canada. AAAI Press.

Mannila, H., Toivonen, H., and Verkamo, A. I. (1997). Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1(3):259–289.

Phua, C. et al. (2010). A comprehensive survey of data mining-based fraud detection research. arXiv preprint arXiv:1009.6119.

Phua, C., Alahakoon, D., and Lee, V. (2004). Minority report in fraud detection: classification of skewed data. ACM SIGKDD Explorations Newsletter, 6(1):50–59.

Roddick, F. J. and Spiliopoulou, M. (2002). A survey of temporal knowledge discovery paradigms and methods. IEEE Transactions on Knowledge and Data Engineering, (14):750–767.

Wei, W. et al. (2012). Effective detection of sophisticated online banking fraud on extremely imbalanced data. World Wide Web: Internet and Web Information Systems, 16:449–475.
