given this measure is relevant for the target attribute
(e.g. the probability of default for a customer). In
practice it is unlikely that any data mining algorithm
would be successful in the case of very complex
calculations. It would be unacceptably suboptimal to
leave such calculations to “good fortune” instead of
doing them explicitly.
Therefore, there is no justification for skipping the
explicit generation of complex attributes (calculated
measures) that are known to be relevant.
3.3 Coding Schemes
The third case concerns coding schemes in the context
of observability versus unobservability of information
in a database.
This example is central to this paper because, in
contrast to the first two examples, coding schemes are
rarely recognized as a structure that hinders
information retrieval rather than making it easier.
3.3.1 What are Coding Schemes
The glossary definition of the term “coding scheme”
reads: a coding scheme is a set of rules that maps the
elements of one set, the coded set, onto the elements
of another set, the code element set (Institute for
Telecommunication Sciences, 2000). The term is
mostly used in telecommunications.
In databases, a coding scheme is an entity that is
used to group, or arrange into a hierarchy, instances
of some other entity. The elements (table rows) of a
coding scheme typically consist of the fields “code”
and “name”. The code is usually a sequence of numerals,
and the name is a text field describing the meaning of
the code.
For instance, take our previous example of the
customer attribute “customer code”, with values 1,
2, 3 and 4, where 1 and 2 denote resident customers
and 3 and 4 denote foreign ones. If we define a table
with four rows, holding the numbers one to four in
one column and descriptions, e.g. “Resident natural
persons”, “Resident legal persons”, “Foreign natural
persons” and “Foreign legal persons”, in a second
column, we can say we have defined a coding scheme.
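The four-row table described above can be sketched as follows. This is only a minimal illustration using Python with an in-memory SQLite database; the table and column names (customer_code_scheme, customer, customer_code) and the three sample customers are assumptions made for the example, not taken from any real system.

```python
import sqlite3

# Hypothetical schema: all table names, column names and sample
# customers below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_code_scheme (code INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO customer_code_scheme VALUES
        (1, 'Resident natural persons'),
        (2, 'Resident legal persons'),
        (3, 'Foreign natural persons'),
        (4, 'Foreign legal persons');
    CREATE TABLE customer (id INTEGER PRIMARY KEY, customer_code INTEGER);
    INSERT INTO customer VALUES (101, 1), (102, 4), (103, 2);
""")


def describe_customers(connection):
    """Resolve each customer's code to its description via the coding scheme."""
    rows = connection.execute(
        "SELECT c.id, s.name FROM customer AS c "
        "JOIN customer_code_scheme AS s ON s.code = c.customer_code "
        "ORDER BY c.id"
    )
    return rows.fetchall()


described = describe_customers(conn)
```

Without the join to the coding scheme table, the customer table alone exposes only the bare values 1 to 4, whose meaning has to be known from elsewhere.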
Often the hierarchy is already reflected in the way
the codes of a coding scheme are composed. In our
example it would be typical to have the codes 00, 01,
10 and 11 instead of 1, 2, 3 and 4. It could then be
defined that position one denotes residence (domestic
or foreign) and position two denotes whether the
customer is a legal or a natural person.
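Decoding such a positional code could look like the sketch below. The text does not state which digit stands for which value, so the assignment used here (0 for domestic and for natural person, 1 for foreign and for legal person) is an assumption chosen to match the order of the codes 1 to 4 in the previous example; the function name is likewise hypothetical.

```python
# Assumed convention (not stated in the text):
#   position one: 0 = domestic, 1 = foreign
#   position two: 0 = natural person, 1 = legal person
RESIDENCE = {"0": "domestic", "1": "foreign"}
PERSON_TYPE = {"0": "natural person", "1": "legal person"}


def decode_customer_code(code):
    """Split a two-position customer code into (residence, person type)."""
    if len(code) != 2 or code[0] not in RESIDENCE or code[1] not in PERSON_TYPE:
        raise ValueError("not a valid two-position customer code: %r" % (code,))
    return RESIDENCE[code[0]], PERSON_TYPE[code[1]]


decoded = {code: decode_customer_code(code) for code in ("00", "01", "10", "11")}
```

The decoding rules live in the program, not in the data: anyone querying the codes directly still has to know what each position means.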
3.3.2 Why Coding Scheme is Bad Modelling
In the definition of unobservability it was said that
the fact that the value 1 denotes a resident natural
person is unobservable, because whoever queries the
database needs to know “something else”: the meaning
of the codes.
It seems that by creating the coding scheme table
we have solved the problem, since the information is
now stored in the database. To be precise: if we define
two attributes, one named “residence”, with values
“Foreign” and “Domestic”, and another with values
“Legal person” and “Natural person”, we have done a
good job and have not introduced the coding scheme
problem. This is not the kind of coding scheme we
describe here as problematic.
The example with four values is a minimal one, used
only to introduce the idea. In reality coding schemes
grow to hundreds or thousands of rows. Then the name
of a category no longer says just “domestic” or
“foreign”. It can say something like: “Foreign natural
person customer that is under supervision of KS
department and also has relation with ABC bank in
France.” So the problem with coding schemes is that
the absence of information in the database gets
“solved” by modelling the information in a free text
field. That is slightly better than no information
(for instance, the needed piece of information is less
likely to get lost), but it is still quite inadequate,
for the reasons discussed in the subsection about free
text fields.
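The following sketch illustrates why a description held in a free text field is hard to query. The first description is the one quoted above; the codes and the other two descriptions are hypothetical, as is the function name.

```python
# Hypothetical excerpt of a large coding scheme. Only the first
# description comes from the text; the codes and the rest are invented.
SCHEME = {
    "0417": "Foreign natural person customer that is under supervision of "
            "KS department and also has relation with ABC bank in France.",
    "0418": "Non-resident legal person supervised by KS department.",
    "0419": "Domestic natural person customer.",
}


def codes_mentioning(scheme, keyword):
    """Return the codes whose free-text name contains the keyword."""
    keyword = keyword.lower()
    return sorted(code for code, name in scheme.items() if keyword in name.lower())


# A keyword search is the only way to ask for "all foreign customers".
foreign_codes = codes_mentioning(SCHEME, "foreign")
```

Here the search finds code 0417 but silently misses 0418, whose description says “Non-resident” instead of “Foreign”. With an explicit “residence” attribute the same question would be a simple equality test.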
3.3.3 The Grand Coding Scheme
We now arrive at the largest coding scheme that one
can encounter in a bank: the chart of accounts. A chart
of accounts is a list of all accounts tracked by an
accounting system. Wikipedia further states that a
chart of accounts “should be designed to capture
financial information to make good financial
decisions”. This definition gives a good idea of what
the chart of accounts used to be.
A chart of accounts is a hierarchical structure of
codes that group material value (either balances or
transactions). At the top level the grouping is usually
into assets, liabilities, equity, income and expenses.
Each of these categories breaks down further by the
company’s products or services, the types of
counterparties to which the product or service is
related, and numerous other attributes, such as
information about terms, currencies, adjustments and
risk provisions. The list is virtually endless. For any
classical accountant this is a must, a cornerstone.
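The hierarchical grouping can be sketched as a roll-up of balances by code prefix. The account codes, the balances and the assignment of leading digits to the five top-level categories are all hypothetical; real charts of accounts differ between institutions and jurisdictions.

```python
from collections import defaultdict

# Assumed numbering of the top-level categories (illustrative only).
TOP_LEVEL = {
    "1": "assets",
    "2": "liabilities",
    "3": "equity",
    "4": "income",
    "5": "expenses",
}

# Hypothetical account codes with their balances.
balances = {"1101": 250.0, "1205": 100.0, "2101": 300.0, "4001": 75.0}


def roll_up(account_balances, prefix_length=1):
    """Sum balances over all accounts sharing the same code prefix."""
    totals = defaultdict(float)
    for account_code, amount in account_balances.items():
        totals[account_code[:prefix_length]] += amount
    return dict(totals)


by_category = {TOP_LEVEL[prefix]: total for prefix, total in roll_up(balances).items()}
by_two_digits = roll_up(balances, prefix_length=2)
```

Each further position of the code refines the grouping, which is how attributes such as product, counterparty type or currency end up encoded inside the account number.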
ICSOFT 2008 - International Conference on Software and Data Technologies