Discovering Internal Fraud Models in a Stream of Banking Transactions
Fabien Vilar¹, Marc Le Goc¹, Philippe Bouche² and Pierre-Yves Rolland¹
¹ Laboratory for Sciences of Information and Systems (LSIS), UMR CNRS 7296, Marseille, France
² TOM4, Pelissanne, France
Keywords:
Data Mining, Knowledge Engineering, Online and Real Time Fraud Detection, Fraud Modeling.
Abstract:
Internal fraud in the banking industry represents a huge cost, and the problem is particularly difficult to solve because swindlers are very imaginative, so that fraud schemata evolve continuously. Fraud detection systems must therefore learn from continuously new fraud schemata, which makes them difficult to design. This paper proposes a new theoretical and practical approach to detect internal frauds and to model fraud schemata. The approach is based on a particular method of abstraction that reduces the complexity of the problem from $O(n^2)$ to $O(n)$, which makes it possible to implement it in a Java program that detects and models frauds online and in real time on a simple professional personal computer. The results of this program are presented through its application to a real-world fraud provided by a worldwide French bank.
1 INTRODUCTION
Over the last three decades, theoretical research in econometrics and financial analysis has developed a stochastic approach to the analysis and modeling of financial data that is now widely used in the banking and financial industries (Fliess et al., 2011). During this period, the exponential development of the information systems of banks and financial companies has allowed Data Mining and Machine Learning algorithms to be used to define new services, notably to help solve the problem of internal fraud detection in the banking industry. But this problem is particularly difficult to solve because swindlers are very imaginative: the models of fraud evolve continuously, so that fraud detection systems must learn from the new fraud schemata. This paper proposes a new theoretical and practical approach to detect internal frauds and to model fraud schemata thanks to a reduction of the complexity of the problem from $O(n^2)$ to $O(n)$. The proposed solution makes it possible to detect and model internal frauds online, in real time, with a simple professional personal computer, which is able to handle more than 4 billion transactions a day.

The next section describes the works related to the proposed approach. Section 3 introduces the main concepts of the Timed Observations Theory (TOT; Le Goc, 2006) that are required for online fraud detection and modeling from bank transactions. This framework has been implemented in the TOM4FFS program (Timed Observations Mining for Fraud Fighting System); its results on the detection and modeling of a real-world, and particularly complex, example of an internal fraud schema concerning a worldwide French bank are provided in Section 4. Section 5 concludes the paper.
2 RELATED WORKS
To solve the general fraud detection problem, many statistical and machine learning techniques have been developed, such as neural networks (Fanning and Cogger, 1998; Green and Choi, 1997), genetic algorithms (Hoogs et al., 2007), Bayesian networks and decision trees (Kirkos et al., 2007; Phua et al., 2004), k-nearest neighbours (Kotsiantis et al., 2006), logistic regression (Altman et al., 1994) and even rule-based fuzzy reasoning systems (Deshmukh and Talluru, 1998). They have been used with more or less success in operation (see for example (Roddick and Spiliopoulou, 2002) for a complete state of the art about these techniques). Technically speaking, the main difficulties with these approaches concern (i) the representativity and completeness of the data set and (ii) the high computational power that most learning algorithms require to learn and to detect fraud schemata from the huge amount of transactions that a bank handles each day. As a consequence, most of the solutions cannot be used online, in real time, at low cost.

Vilar, F., Le Goc, M., Bouche, P. and Rolland, P. Discovering Internal Fraud Models in a Stream of Banking Transactions. In Proceedings of the 7th International Joint Conference on Computational Intelligence (IJCCI 2015) - Volume 1: ECTA, pages 346-351. ISBN: 978-989-758-157-1. Copyright © 2015 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved.

On the other hand, Fliess et al. (2011) highlight a conceptual problem that is inherent to all statistics-based approaches: the effect of sampling on the data.
Usually, the quantity of raw data is so huge that stochastic models are used to aggregate the raw data into a minimal set of samples, with a reduced set of dimensions that maximizes the compactness of the learning space so that learning algorithms can work efficiently. This usual method introduces, in a devious way (often without the analysts realizing it), the problem of the loss of causality in the data. Having very small effects in most cases, this problem can be (and is) generally neglected. Nevertheless, it is significant when the timestamps of the raw data are relevant to the calculations: in that case, biases are introduced into both the structure and the parameter values of the learning models. And it becomes crucial when dealing with online data flows that must be analyzed in real time.

Since most of the proposed Data Mining algorithms have been designed for non-timed data, some extensions have been proposed to this aim, for example AprioriAll (Agrawal and Srikant, 1995) or Winepi/Minepi (Mannila et al., 1995; Mannila et al., 1997). But, according to (Mannila, 2002) or (Han and Kamber, 2006), these algorithms present two main drawbacks: the number of discovered models increases in a nonlinear way with (i) the a priori setting of the values of the algorithm parameters and (ii) the threshold values of the decision criteria, even when only a very small fraction of these models are interesting. There are few papers dealing with the problem of fraud detection in online banking (Phua et al., 2010; Jans et al., 2009; Wei et al., 2012). Most of them concern external fraud detection in application domains far from the banking industry. To solve these problems, the TOT defines a new Knowledge Discovery in Databases (KDD) approach, specifically designed to learn directly from timed data, without any parameter. The theoretical bases of this KDD approach, implemented in the TOM4FFS program, are described in the next sections.
3 INTRODUCTION TO THE TOT
The TOT defines a dynamic process as an arbitrarily constituted set $X(t) = \{x_1(t), \ldots, x_n(t)\}$ of timed functions $x_i(t)$ of continuous time $t$. The set $X(t)$ of functions implicitly defines a set $X = \{x_1, x_2, \ldots, x_n\}$ of $n$ variable names, denoted $x_i$ for simplicity. A dynamic process $X(t)$ is said to be observed by a program $\Theta$ when the latter aims at writing timed messages describing the modifications over time of the functions $x_i(t)$ of $X(t)$. The aim of the TOT is precisely to model observed processes, defined as a couple $(X(t), \Theta(X, \Delta))$. To this aim, the TOT defines a timed observation to provide a meaning to a timed message:
Definition 1. Timed Observation
Let $X(t) = \{x_i(t)\}_{i=1 \ldots n}$ be a set of time functions describing the evolution of a process that is observed by a program $\Theta$; let $\Gamma = \{t_k\}_{t_k \in \mathbb{R}}$ be a set of arbitrary time instants at which $\Theta$ observes the functions; let $\theta(x_\theta, \delta_\theta, t_\theta)$ be a predicate implicitly determined by $\Theta$; and let $\Delta = \{\delta_j\}$ be a set of constant values.
A timed observation $(\delta_j, t_k) \in \Delta \times \Gamma$ made on the time function $x_i(t)$ is the assignation of the values $x_i$, $\delta_j$ and $t_k$ to the predicate $\theta(x_\theta, \delta_\theta, t_\theta)$ such that $\theta(x_i, \delta_j, t_k)$.
The TOT notion of observation class makes the link between a variable $x_i$ and a constant $\delta_j$.
Definition 2. Observation Class
Let $X(t) = \{x_i(t)\}_{i=1 \ldots n}$ be a set of time functions that are observed by an abstract program $\Theta(X, \Delta)$, where $\Delta = \{\delta_j\}_{j=1 \ldots m}$ is the set of all the constants the abstract program can use and $X = \{x_i\}_{i=1 \ldots n}$ is the set of variable names corresponding to $X(t)$.
$\forall i \in [1, n]$, $\forall j \in [1, m]$ and $\forall k \in \mathbb{N}$, an observation class $O_k = \{\ldots, (x_i, \delta_j), \ldots\}$ is a subset of $X \times \Delta$.
A timed observation $(\delta_j, t_k)$ can always be considered as an occurrence $O_j(t_k)$ of an observation class $O_j = \{(x_i, \delta_j)\}$. In practice, the temporal functions $x_i(t)$ describing the evolution of the process are piecewise functions (see Section 4).
Definition 3. Concrete Unary Observer
A concrete unary observer $\Theta'(\{x_i\}, \mathbb{Z})$ is a program observing a piecewise function and implementing equation (1) to produce timed observations of the form $O_i(t_{k_i}) = (x_i(t_{k_i}) - x_i(t_{k_i - 1}), t_{k_i})$:
$$\forall t_{k_i}, t_{k_i - 1}, \quad x_i(t_{k_i - 1}) \neq x_i(t_{k_i}) \Rightarrow write(O_i(t_{k_i})) \qquad (1)$$
This generates a sequence $\omega_i = \{\ldots, O_i(t_{k_i}), \ldots\}$ of occurrences $O_i(t_{k_i})$ of the observation class $O_i$.
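As an illustration of Definition 3, the following Python sketch shows one way a concrete unary observer could be written. The function name `unary_observer`, the representation of the piecewise function as a time-ordered list of `(timestamp, value)` samples and the sample balance are assumptions made for this example; they are not taken from the TOM4FFS program.

```python
from typing import Iterable, Iterator, Tuple

Sample = Tuple[str, int]       # (timestamp, value of x_i at that time)
Observation = Tuple[int, str]  # (delta, timestamp), i.e. a timed observation


def unary_observer(samples: Iterable[Sample]) -> Iterator[Observation]:
    """Yield (x_i(t_k) - x_i(t_{k-1}), t_k) each time the observed value changes."""
    previous = None
    for timestamp, value in samples:
        if previous is not None and value != previous:
            # Equation (1): a change of value triggers write(O_i(t_k)).
            yield (value - previous, timestamp)
        previous = value


# Hypothetical balance of one account, sampled at successive instants.
balance = [("t0", 20000), ("t1", 20000), ("t2", 3000), ("t3", 3500)]
omega = list(unary_observer(balance))
print(omega)  # [(-17000, 't2'), (500, 't3')]
```

Only the instants at which the value changes produce an observation, so the sequence $\omega_i$ is never longer than the input.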
A temporal binary relation is the representation of a sequential relation between two observation classes $O_i$ and $O_j$:
Definition 4. Temporal Binary Relation
A temporal binary relation $r_{ij}(O_i, O_j, [\tau^-_{ij}, \tau^+_{ij}])$, $\tau^-_{ij} \in \mathbb{R}$, $\tau^+_{ij} \in \mathbb{R}$, is an oriented relation between two observation classes $O_i$ and $O_j$ that is time-constrained by the interval $[\tau^-_{ij}, \tau^+_{ij}]$.
The temporal constraint $[\tau^-_{ij}, \tau^+_{ij}]$ is the time interval in which to observe the timed observation $O_j(t_j) = (\delta_j, t_j)$ of the output observation class $O_j$ after the timed observation $O_i(t_i) = (\delta_i, t_i)$ of the input observation class $O_i$: $t_j - t_i \in [\tau^-_{ij}, \tau^+_{ij}]$. In that case, the temporal binary relation $r_{ij}(O_i, O_j, [\tau^-_{ij}, \tau^+_{ij}])$ is said to be observed and is denoted $r_{ij}(O_i(t_i), O_j(t_j), [\tau^-_{ij}, \tau^+_{ij}])$.
A program $\Theta_i(O_i, O_j)$ designed to recognize a temporal binary relation of the form $r_{ij}(O_i, O_j, [\tau^-_{ij}, \tau^+_{ij}])$ is called an abstract binary observer (ABO).
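Definition 5. Abstract Binary Observer
Any program $\Theta_i(O_i, O_j)$ implementing equation (2) is an Abstract Binary Observer.
$$\forall t_{k_i} \in \Gamma_i, \forall t_{k_j} \in \Gamma_j, \quad t_{k_j} \geq t_{k_i} \wedge O_i(t_{k_i}) \in \omega_i \wedge O_j(t_{k_j}) \in \omega_j \wedge (t_{k_j} - t_{k_i}) \in [\tau^-_{ij}, \tau^+_{ij}] \Rightarrow write(O_{ij}(t_{k_j})) \qquad (2)$$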
$O_{ij}(t_{k_j})$ is a timed observation and an occurrence of an abstract observation class $O_{ij} = \{(x_{ij}, \delta_{ij})\}$ linking an abstract variable $x_{ij}$ with the abstract binary constant $\delta_{ij}$ associated with the pair $(\delta_i, \delta_j)$.
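To make Definition 5 more concrete, here is a minimal Python sketch of an abstract binary observer. The function name `abstract_binary_observer`, the encoding of an occurrence as a `(class label, time in days)` tuple and the sample sequences are assumptions chosen for the illustration. The nested loops are kept for readability and are not meant to reflect how TOM4FFS is implemented.

```python
from typing import Iterator, List, Tuple

TimedObs = Tuple[str, float]  # (class label, timestamp expressed in days)


def abstract_binary_observer(
    omega_i: List[TimedObs],
    omega_j: List[TimedObs],
    tau_minus: float,
    tau_plus: float,
) -> Iterator[Tuple[str, float]]:
    """Yield O_ij(t_kj) for each pair whose delay lies in [tau_minus, tau_plus]."""
    for label_i, t_i in omega_i:
        for label_j, t_j in omega_j:
            # Equation (2): t_kj - t_ki must fall inside the time constraint.
            # With tau_minus >= 0 this also ensures that t_kj is not before t_ki.
            if tau_minus <= t_j - t_i <= tau_plus:
                yield (f"{label_i},{label_j}", t_j)


# Hypothetical occurrences: one of class O_i at day 0, two of class O_j.
omega_i = [("O_i", 0.0)]
omega_j = [("O_j", 1.5), ("O_j", 45.0)]
pairs = list(abstract_binary_observer(omega_i, omega_j, 0.0, 30.0))
print(pairs)  # [('O_i,O_j', 1.5)]
```

The binary observation carries the timestamp of the output occurrence, as in $O_{ij}(t_{k_j})$.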
4 APPLICATION
The aim of this section is to present the application of the TOM4FFS program to detect and to model schemata of potential internal fraud. An internal fraud is a particular sequence of non-compliant transactions whose aim is to move money from clients' accounts to some accounts of a dishonest manager of a bank. The role of TOM4FFS is to detect and to model, online and in real time, a schema of potential internal fraud from a continuous flow of transactions. The detected transactions remain potentially fraudulent until their non-compliance has been confirmed.

The fraud schema of this example is based on a set of pairs $(a_{k_i}(t_{k_i}), a_{k_j}(t_{k_j}))$ of transactions where the first transaction $a_{k_i}(t_{k_i})$ debits an amount $a_{k_i}$ of money from an account of a customer before the second transaction $a_{k_j}(t_{k_j})$ credits the same amount to one of the accounts of the administrator of the bank. The problem is then to find the minimal set of transaction pairs $(a_{k_i}(t_{k_i}), a_{k_j}(t_{k_j}))$ satisfying the following constraints: (i) from a customer account which is not an account of the manager, (ii) to an account of the manager, (iii) where $a_{k_i} = a_{k_j}$ and (iv) $t_{k_i} \leq t_{k_j}$. It is clear that, given a database of $n$ transactions, the complexity of this problem is $O(n^2)$: when $n$ is counted in millions (i.e. $10^6$), the number of pairs to evaluate is counted in millions of millions (i.e. $10^{12}$), making the problem difficult for humans as well as for computers. As an illustration,
the internal fraud schema studied in this section was detected by a client and required six months of analysis by the bank's expert in internal fraud. It should be noted that the studied example is considered as particularly complicated by this expert.

According to the TOT, a bank transaction $a_{k_i}(t_{k_i})$ is a timed observation $(x_i(t_{k_i}) - x_i(t_{k_i - 1}), t_{k_i})$ associated with a bank account $x_i(t)$ of a particular customer. In this application, since it will be seen that the cents can be neglected in a first step, $x_i(t)$ is considered as a piecewise constant time function defined over $\mathbb{Z}$ (cf. figure 2 for an illustration). The constant $(x_i(t_{k_i}) - x_i(t_{k_i - 1}))$ is the amount of money that has been moved from the account $x_i(t)$ at time $t_{k_i}$: it is a signed integer number of cents, which is positive when the account $x_i(t)$ is credited and negative otherwise. For example, when considering the transaction (1|8|24|1169|-189.64, 2009/09/29 15:45:25), the symbol | structures the sequence of characters 1|8|24|1169|-189.64 into different items: the customer number 1, the account number 8, the type of the transaction 24, the index of the transaction 1169 and the amount of the transaction -189.64. The sequence of characters 2009/09/29 15:45:25 being the timestamp, the transaction is then represented with the timed observation (-189.64, 2009/09/29 15:45:25). The other items (the customer number, the account number, the type of the transaction and the index of the transaction) are then considered as attributes of the timed observations. These attributes are defined with an ontology describing the customers, the accounts and the types of transactions with frames, similarly to the Manchester OWL syntax. As a consequence, a timed observation (-189.64, 2009/09/29 15:45:25) is at once an instance of a customer frame, of an account frame and of a transaction type frame. In the rest of this section, a fraud schema is represented with a set of binary relations between customers' accounts.

The
problem with this representation of the transactions is that the number of constants $\delta$ is a priori infinite (i.e. equal to $\aleph_0$, the cardinal of $\mathbb{Z}$). Recalling that an observation class $O_i = \{(x_i, \delta_i)\}$ is a singleton associating a constant $\delta_i$ with one and only one variable name $x_i$ (cf. definition 2), the number of observation classes is also infinite. As a consequence, the set of timed binary relations required to recognize all the possible pairs of transactions $(a_{k_i}(t_{k_i}), a_{k_j}(t_{k_j}))$ from (i) a customer account to (ii) an account of the manager where (iii) $a_{k_i} = a_{k_j}$ and (iv) $t_{k_i} \leq t_{k_j}$ is also infinite. When forgetting the cents, the constants $\delta_i$ (i.e. the differences $x_i(t_{k_i}) - x_i(t_{k_i - 1})$) of each of these timed observations can be any integer of $\mathbb{Z}$. So, an infinite set of timed binary relations is required to constitute the pairs of transactions satisfying the required logical constraints.

To solve this problem, the idea is to define a compact representation of the amounts, inspired by Benford's Law (Benford, 1938), also called the First-Digit Law. An amount $m = (x_i(t_{k_i}) - x_i(t_{k_i - 1}))$ of a transaction can be represented with a signed sum of powers of 10: $\forall z \in \mathbb{Z}, z = s(z) \cdot \sum_{k=0}^{n} a_k \cdot 10^k$. In this representation, (i) $s(z)$ is the sign function of $z$, i.e. $s: \mathbb{Z} \to \{-1, 1\}$, $z \mapsto -1$ if $z < 0$, $1$ otherwise, (ii) $n$ is the highest power of 10 of $z$ ($n \geq 0$), and (iii) $a_k \in D = \{1, 2, \ldots, 8, 9\}$ is the digit defining the value of the coefficient of the $k$-th power of 10 in $z$. With the First-Digit Law in mind, the advantage of this representation is that, when using only the digits of $D$, it is possible to define the following classification function to represent the set of transaction amounts with a much smaller set $O = \{O_i\}$ of observation classes $O_i$:
Definition 6. Classification function
The classification function $\mu$ maps any $z \in \mathbb{Z}$ to a particular $\mu(z)$ of the set $M = \{\ldots, -21, \ldots, -11, -9, -8, \ldots, -2, -1, 0, 1, 2, \ldots, 8, 9, 11, \ldots, 21, \ldots\}$:
$$\mu: \mathbb{Z} \to M, \quad z = s(z) \cdot \sum_{k=0}^{n} a_k \cdot 10^k \mapsto \mu(z) = s(z) \cdot (10 \cdot n + a_n)$$
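The classification function of Definition 6 can be sketched in a few lines of Python. The function name `mu` and the use of the decimal string of $|z|$ to read off $n$ and $a_n$ are implementation choices made for this example only; amounts are assumed to be integers (cents dropped), and mapping $z = 0$ to the class 0 is an assumed convention consistent with the set $M$.

```python
def mu(z: int) -> int:
    """Return s(z) * (10 * n + a_n), with a_n the leading digit and n its power of 10."""
    if z == 0:
        return 0  # assumed convention: zero maps to the class 0 of M
    sign = -1 if z < 0 else 1
    digits = str(abs(z))
    n = len(digits) - 1   # highest power of 10 in |z|
    a_n = int(digits[0])  # first (most significant) digit
    return sign * (10 * n + a_n)


print(mu(17000))   # 41
print(mu(-17000))  # -41
print(mu(9), mu(10), mu(-189))  # 9 11 -21
```

Two amounts of opposite sign and equal magnitude always fall into opposite classes, which is what allows a debit and the matching credit to be paired by class before any exact comparison.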
The classification function $\mu(z)$ only uses the first digit of $z$ and $n$, its highest power of 10. In practice, the TOM4FFS program is set with a maximum value $n_{max}$ of the power of 10 to create a finite set $O$ of $(20 \cdot n_{max} + 1)$ observation classes to analyze transactions whose amounts are classified in the range $[-(10 \cdot n_{max} + 1); 10 \cdot n_{max} + 1]$. For example, with $n_{max} = 9$, the highest digit of $D$, TOM4FFS creates 181 observation classes $O = \{O_{-91}, O_{-90}, \ldots, O_{-1}, O_0, O_{+1}, \ldots, O_{+90}, O_{+91}\}$ to take into account amounts contained in the range $]-10^9; 10^9[$ (i.e. a billion). This interval is largely sufficient for the purpose of internal fraud detection.
With these 181 observation classes, TOM4FFS creates (i) a unique concrete unary observer $\Theta_i(\{\phi_i\}, \mathbb{Z})$ implementing the classification function $\mu$ of definition 6, where $\phi_i$ is the name of an abstract time function $\phi_i(t)$ that transforms the transaction amounts $s(z) \cdot \sum_{k=0}^{n} a_k \cdot 10^k$ into occurrences $O_k(t_k)$, and (ii) a network of 181 ABOs, each of them being specified with a timed binary relation of the form $r_{ij}(O_i, O_j, [\tau^-_{ij}, \tau^+_{ij}])$ where $O_i$ and $O_j$ correspond to opposite values of the classification function $\mu$. So, since the classification function $\mu$ is only concerned with $10 \cdot n + a_n$, the default structure of the ABO (cf. equation 2) is independent of the customer, the account and the type of transaction: it is only concerned with checking the classes and the time constraint $[\tau^-_{ij}, \tau^+_{ij}]$. The ABOs are then set with (i) a predicate that evaluates the constraint of equality of the true values of the transaction amounts (i.e. the cents are taken into account) and (ii) two simple propositions to check that the transactions concern a customer (and not the manager) for constraint (i) and the manager only for constraint (ii).

The studied database
is composed of 1492 transactions (cf. figure 1) between the bank administrator (ID CLI = 1003) and his 3 clients (ID CLI ∈ {1001, 1002, 1004}). This concerns 30 banking accounts (column ID CPTE) and 40 transaction types (column ID TYP EVT) numbered from 3001 to 3040. Client 1001 owns 8 accounts: 2006, 2008, 2009, 2012, 2013, 2014, 2015 and 2022. Client 1002 owns 5 accounts: 2005, 2011, 2019, 2023 and 2027. The bank administrator owns 10 accounts: 2001, 2002, 2003, 2004, 2024, 2026, 2028, 2029, 2030 and 2031. Client 1004 owns 7 accounts: 2007, 2010, 2016, 2017, 2018, 2020 and 2021. These transactions cover a period of more than one year, from 2009/01/02 to 2010/03/12, and involve amounts of money from -445 200 € to 460 614.36 €. The aim is to find, among these transactions, those that are potentially fraudulent.

The set of constants $\Delta$ is the set $M$ of definition 6, so that $O$ is the set of observation classes. Let $\Gamma = \{t_k, t_k \in \mathbb{R}\}$ be the set of time instants contained in the database. Fraud detection and modeling are presented here by observing three different processes: the accounts, clients and transaction types processes. Each process $X(t) = \{x_i(t), i = 1 \ldots n_X\}$ is made of $n_X$ piecewise timed functions. So each unary observer $\Theta'_i(\{x_i\}, \mathbb{Z})$ implementing equation (1) generates a sequence of timed observations $\omega_i = \{O_i(t_{k_i}) = (\delta_i, t_{k_i}), \delta_i \in \Delta, t_{k_i} \in \Gamma\}$. This creates a network of 181 ABOs of the form $r_{ij}(O_i, O_j, [\tau^-_{ij}, \tau^+_{ij}])$.
The time constraints of each ABO are arbitrarily set to $\tau^-_{ij} = 0$ day and $\tau^+_{ij} = 30$ days so that only pairs of transactions $(O_i(t_{k_i}), O_j(t_{k_j}))$ where $t_{k_j} - t_{k_i} \in [0, 30]$ are considered by the ABO. To observe the accounts process, let us consider the process $X(t) = \{x_{2001}(t), \ldots, x_{2031}(t)\}$ made of 30 piecewise timed functions $x_i(t)$ defined on $\mathbb{Z}$ corresponding to the 30 banking accounts ID CPTE ∈ {2001, ..., 2031} contained in the database.

Figure 1: Extract of the Banking Transactions Data Base.

Let us take the case of account number 2007, whose piecewise timed function $x_{2007}(t)$ is represented in figure 2. The unary observer $\Theta'_{2007}(\{x_{2007}\}, \mathbb{Z})$ observes this function. At
time $t_0$ = 2009/03/16 00:00:00, the value of $x_{2007}(t)$ changes by $-17000$, which is mapped by the classification function (see definition 6) to $-41$. So the unary observer creates a timed observation $O_{-41}$(2009/03/16 00:00:00) = ($-41$, 2009/03/16 00:00:00), which is a representation of the transaction $a_1$ = (1004|2007|3004|192|-17000.00, 2009/03/16 00:00:00) (see the first line in bold of figure 1). Finally, it adds it to the sequence of timed observations $\omega_{2007} = \{\ldots, O_{-41}$(2009/03/16 00:00:00)$, \ldots\}$. In parallel, the unary observer $\Theta'_{2001}(\{x_{2001}\}, \mathbb{Z})$ does the same job, observing the piecewise function $x_{2001}(t)$ corresponding to account 2001 of the manager. It thus generates a sequence of timed observations $\omega_{2001}$. Each timed observation of $\omega_{2001}$ is a representation of a transaction in the database. In particular, $O_{+41}$(2009/03/17 20:00:00) represents the transaction $a_2$ = (1003|2001|3039|1675|17000.00, 2009/03/17 20:00:00) (see the second bold line of figure 1).

Figure 2: Piecewise functions for accounts 2001 and 2007.

As a consequence, the ABO implementing the relation $r_{-41,+41}(O_{-41}, O_{+41}, [0, 30])$ will be activated when receiving the timed observation $O_{+41}$(2009/03/17 20:00:00) after the observation $O_{-41}$(2009/03/16 00:00:00), the time constraint $[0, 30]$ being satisfied. The ABO will then write the binary timed observation $O_{-41,+41}$(2009/03/17 20:00:00), denoting that the corresponding transactions $a_1$ and $a_2$ are potentially fraudulent. Doing so with the 1492 successive transactions of the studied database, the network of 181 ABOs of the TOM4FFS program produces the 8 binary timed observations of figure 3.

Figure 3: Binary Timed Observations.

The schema of these potentially fraudulent transactions is represented in figure 4. In this figure, the triangles represent the observation classes of the manager, the two other shapes representing those of two other customers
(1001 and 1004). The labels represent the total of the moved money in the form of an interval $[m_i, m_j]$: for example, the label [-17 000 €, 17 000 €] of the relation between the accounts 2007 and 2001 means that -17 000 € have been moved from the account 2007 and 17 000 € have been moved to the account 2001.

Figure 4: Fraud scheme for accounts.

The fraud schema of figure 4 has been validated by the internal fraud expert of the French bank: the manager stole a total of 104 000 € from two customers, 1004 (49 000 €) and 1001 (55 000 €).

Now let us observe the clients process and consider the process $X(t) = \{x_{1001}(t), \ldots, x_{1004}(t)\}$ made of 4 piecewise timed functions $x_i(t)$ defined on $\mathbb{Z}$ corresponding to the 3 client ids (ID CLI ∈ {1001, 1002, 1004}) and the bank administrator id (ID CLI = 1003) contained in the database. The same methodology as previously presented is applied and leads to the fraud scheme shown in figure 5.

Figure 5: Fraud scheme for clients.

This confirms the fact that the manager has stolen a total of 104 000 € from two clients: 49 000 € from client 1004 and 55 000 € from client 1001.

Finally, to observe the transaction types process, let us consider the process $X(t) = \{x_{3001}(t), \ldots, x_{3040}(t)\}$ made of 40 piecewise timed functions $x_i(t)$ defined on $\mathbb{Z}$ corresponding to the 40 transaction type ids contained in the database. Figure 6 shows which types of transactions are involved in the fraud scheme. The manager uses five types of transactions among the 40 to steal from his clients.

Figure 6: Fraud scheme for transactions types.
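The worked example of this section (transactions $a_1$ and $a_2$) can be replayed with a short Python sketch. It is only an illustration under stated assumptions, not the TOM4FFS implementation: the function names `mu` and `detect`, the tuple layout `(client, account, amount, timestamp)`, the dictionary of pending debits and the third transaction `other` are all made up for the example, and floats are used for amounts only for brevity. The point it illustrates is that each manager credit is compared only with the customer debits of the opposite class, instead of with every earlier transaction.

```python
from collections import defaultdict
from datetime import datetime, timedelta

MANAGER = 1003               # ID CLI of the bank administrator in the example
WINDOW = timedelta(days=30)  # time constraint [0, 30] days


def mu(z: int) -> int:
    """Classification function of Definition 6 (first digit and power of 10)."""
    if z == 0:
        return 0
    digits = str(abs(z))
    value = 10 * (len(digits) - 1) + int(digits[0])
    return -value if z < 0 else value


def detect(transactions):
    """Pair customer debits with later manager credits of the same amount."""
    pending = defaultdict(list)  # class of a debit -> customer debits seen so far
    found = []
    for client, account, amount, when in sorted(transactions, key=lambda t: t[3]):
        cls = mu(int(amount))  # the cents are ignored for the classification
        if client != MANAGER and amount < 0:
            pending[cls].append((client, account, amount, when))
        elif client == MANAGER and amount > 0:
            # Only debits of the opposite class are compared with this credit.
            for debit in pending[-cls]:
                same_amount = debit[2] == -amount  # exact check, cents included
                if same_amount and when - debit[3] <= WINDOW:
                    found.append((debit, (client, account, amount, when)))
    return found


a1 = (1004, 2007, -17000.00, datetime(2009, 3, 16, 0, 0, 0))
a2 = (1003, 2001, 17000.00, datetime(2009, 3, 17, 20, 0, 0))
other = (1001, 2006, -250.00, datetime(2009, 3, 16, 9, 0, 0))  # made-up transaction
pairs = detect([a2, other, a1])
print(len(pairs))  # 1
```

Here the debit $a_1$ falls in class $-41$ and the credit $a_2$ in class $+41$, so they are compared and paired, while the made-up debit of 250 falls in class $-22$ and is never examined against $a_2$.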
5 CONCLUSION
Over the last three decades, Data Mining and Machine Learning algorithms have been used to tackle the delicate problem of fraud detection in the banking industry. These algorithms pose three main problems: (i) the strong reluctance of bankers to supply a third party with a set of real-world transactions, for confidentiality reasons, (ii) the problem of the representativity of the data set and, to a lesser extent, its completeness, and (iii) the huge amount of transactions that must be analyzed to detect potential frauds. This paper presents an operational program, called TOM4FFS, to solve these three problems. The main advantages of this method are (i) to be purely syntactic, which guarantees strict confidentiality, and (ii) to reduce the complexity of the fraud detection problem from $O(n^2)$ to $O(n)$. The TOM4FFS program is then able to handle more than 4 billion transactions a day, online and in real time, with a standard personal computer. This paper describes the TOM4FFS program and its application to a real-world fraud example of a worldwide French bank. Our current work is concerned with the extension of the approach to more complex fraud schemata and its application to the general problem of compliance in the banking and other industries.
REFERENCES
Agrawal, R. and Srikant, R. (1995). Mining sequential pat-
terns. Proceedings of the 11th International Confer-
ence on Data Engineering (ICDE95), pages 3–14.
Altman, E., Marco, G., and Varetto, F. (1994). Corporate distress diagnosis: Comparisons using linear discriminant analysis and neural networks (the Italian experience). Journal of Banking & Finance, 18(3):505–529.
Benford, F. (1938). The law of anomalous numbers. Pro-
ceedings of the American Philosophical Society.
Deshmukh, A. and Talluru, L. (1998). A rule-based fuzzy
reasoning system for assessing the risk of manage-
ment fraud. International Journal of Intelligent Sys-
tems in Accounting, Finance & Management, 74:223–
241.
Fanning, K. and Cogger, K. (1998). Neural network detec-
tion of management fraud using published financial
data. International Journal of Intelligent Systems in
Accounting, Finance & Management, 7:21–41.
Fliess, M., Join, C., and Hatt, F. (2011). Is a probabilistic modeling really useful in financial engineering? In Conférence Méditerranéenne sur L'Ingénierie Sûre des Systèmes Complexes.
Green, B. and Choi, J. (1997). Assessing the risk of man-
agement fraud through neural network technology.
Auditing, 161:14–28.
Han, J. and Kamber, M. (2006). Data Mining. Concepts
and Techniques. Morgan Kaufmann.
Hoogs, B. et al. (2007). A genetic algorithm approach
to detecting temporal patterns indicative of financial
statement fraud. Intelligent Systems in Accounting, Fi-
nance and Management, 15:41–56.
Jans, M., Lybaert, N., and Vanhoof, K. (2009). A frame-
work for internal fraud risk reduction at it integrating
business processes: the ifr framework. The Interna-
tional Journal of Digital Accounting Research, 9:1–
29.
Kirkos, E. et al. (2007). Data mining techniques for the
detection of fraudulent financial statements. Expert
Systems with Applications.
Kotsiantis, S. et al. (2006). Forecasting fraudulent financial statements using data mining. International Journal of Computation Intelligence, 3:104–100.
Le Goc, M. (2006). Notion d'observation pour le diagnostic des processus dynamiques: Application à Sachem et à la découverte de connaissances temporelles. HDR, Aix-Marseille University, Faculté des Sciences et Techniques de Saint Jérôme.
Mannila, H. (2002). Local and global methods in data min-
ing: Basic techniques and open problems. 29th In-
ternational Colloquium on Automata, Languages and
Programming.
Mannila, H., Toivonen, H., and Verkamo, A. I. (1995). Dis-
covering frequent episodes in sequences. In Fayyad,
U. M. and Uthurusamy, R., editors, Proceedings of the
First International Conference on Knowledge Discov-
ery and Data Mining (KDD-95), Montreal, Canada.
AAAI Press.
Mannila, H., Toivonen, H., and Verkamo, A. I. (1997). Dis-
covery of frequent episodes in event sequences. Data
Mining and Knowledge Discovery, 1(3):259–289.
Phua, C. et al. (2010). A comprehensive survey of data
mining-based fraud detection research. arXiv preprint
arXiv:1009.6119.
Phua, C., Alahakoon, D., and Lee, V. (2004). Minority re-
port in fraud detection: classification of skewed data.
ACM SIGKDD Explorations Newsletter, 6(1):50–59.
Roddick, F. J. and Spiliopoulou, M. (2002). A survey of
temporal knowledge discovery paradigms and meth-
ods. IEEE Transactions on Knowledge and Data En-
gineering, (14):750–767.
Wei, W. et al. (2012). Effective detection of sophisticated
online banking fraud on extremely imbalanced data.
World Wide Web: Internet and Web Information Sys-
tems, 16:449–475.