
 
given this measure is relevant for the target attribute 
(e.g. the probability of default for a customer). In 
practice it is unlikely that any data mining algorithm
would discover very complex calculations on its own.
It would be unacceptably suboptimal to leave such
calculations to “good fortune” instead of doing them
explicitly.
Therefore, there is no case for skipping the
generation of complex attributes (calculated measures)
that are known to be relevant.
3.3 Coding Schemes 
The third case concerns coding schemes in the
context of observability versus unobservability of
information in a database.
This example is central to this paper because, in
contrast to the first two examples, coding schemes
are rarely recognized as a structure that hinders
information retrieval rather than making it easier.
3.3.1  What are Coding Schemes 
The dictionary definition of the term “coding scheme”
is: a coding scheme is a set of rules that maps the
elements of one set, the coded set, onto the elements
of another set, the code element set (Institute for
Telecommunication Sciences, 2000). The term is
mostly used in telecommunications.
In databases, a coding scheme is an entity used to
group, or arrange in a hierarchy, instances of some
other entity. The elements (table rows) of a coding
scheme typically consist of the fields “code” and
“name”. The code is usually a string of numerals, and
the name is a text field describing the meaning of
the code.
For instance, take our previous example of a
customer attribute “customer code” with values 1, 2,
3 and 4, where 1 and 2 are resident customers and 3
and 4 are foreign. If we define a table with four
rows, holding the numbers one to four in one column
and descriptions such as “Resident natural persons”,
“Resident legal persons”, “Foreign natural persons”
and “Foreign legal persons” in a second column, we
can say we have defined a coding scheme.
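
As a minimal sketch, the four-row coding scheme described
above could be written as a lookup table and joined to
customer records. The table and column names
(`customer_code_scheme`, `customer`, `code`, `name`) and the
two sample customers are assumptions made for illustration;
they do not come from any real system.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_code_scheme (
        code INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    INSERT INTO customer_code_scheme VALUES
        (1, 'Resident natural persons'),
        (2, 'Resident legal persons'),
        (3, 'Foreign natural persons'),
        (4, 'Foreign legal persons');

    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,
        customer_code INTEGER REFERENCES customer_code_scheme(code)
    );
    -- Two made-up customers, one resident and one foreign.
    INSERT INTO customer VALUES (101, 1), (102, 4);
""")

# The meaning of a code is only available through the "name" text.
rows = conn.execute("""
    SELECT c.customer_id, s.name
    FROM customer AS c
    JOIN customer_code_scheme AS s ON s.code = c.customer_code
    ORDER BY c.customer_id
""").fetchall()
print(rows)
```

The join returns the description of each customer's code, but
residence and person type are still only readable as text
inside `name`.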
Often the hierarchy is already reflected in the way
the codes are composed. In our example it would be
typical to have the codes 00, 01, 10 and 11 instead
of 1, 2, 3 and 4. Position one could then be defined
to indicate residence (domestic or foreign) and
position two whether the customer is a legal or a
natural person.
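
The positional convention can be sketched as a small decoder.
The digit meanings used here (0 = domestic, 1 = foreign in
position one; 0 = natural, 1 = legal in position two) are an
assumption inferred from the order of the four descriptions
above, and the function name `decode_customer_code` is
hypothetical.

```python
# Assumed digit meanings; the text does not fix them explicitly.
RESIDENCE = {"0": "Domestic", "1": "Foreign"}
PERSON_TYPE = {"0": "Natural person", "1": "Legal person"}


def decode_customer_code(code: str) -> dict:
    """Split a two-position code into its two underlying attributes."""
    if len(code) != 2 or code[0] not in RESIDENCE or code[1] not in PERSON_TYPE:
        raise ValueError(f"unknown customer code: {code!r}")
    return {
        "residence": RESIDENCE[code[0]],
        "person_type": PERSON_TYPE[code[1]],
    }


for code in ("00", "01", "10", "11"):
    print(code, decode_customer_code(code))
```

Anyone querying the data needs this decoding rule in addition
to the stored code, since the rule itself is not part of the
stored value.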
 
3.3.2  Why Coding Scheme is Bad Modelling 
In the definition of unobservability it was said that
it is unobservable that the value 1 means that the
customer is a resident natural person, because
whoever queries the database needs to know
“something else”: the meaning of the codes.
It seems that if we create the coding scheme table we
have solved the problem, since the information is now
stored in the database. To be precise: if we define
two attributes, one named “residence” with values
“Foreign” and “Domestic”, and another with values
“Legal person” and “Natural person”, we have done a
good job and have not introduced the coding scheme
problem. This is not the kind of coding scheme we are
depicting here as problematic.
The four-value example is a minimal example on which
the idea was constructed. In reality coding schemes
grow to hundreds or thousands of rows. Then the name
of a category no longer says “domestic” or “foreign”.
It can say something like: “Foreign natural person
customer that is under supervision of KS department
and also has relation with ABC bank in France.” So
the problem with coding schemes is that the lack of
information in the database gets “solved” by
modelling the information in a free text field. That
is slightly better than no information (for instance,
the needed piece of information is less likely to get
lost), but it is still quite inadequate, for the
reasons discussed in the subsection about free text
fields.
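
The difference can be illustrated with a small sketch. The
code numbers 417 and 418 and the second description are
invented for illustration, as is the assumption that
“Non-resident” means the same as “Foreign”; only the first
description is taken from the text above.

```python
# Hypothetical coding-scheme rows: the meaning lives only in free text.
scheme = {
    417: "Foreign natural person customer that is under supervision of KS "
         "department and also has relation with ABC bank in France.",
    418: "Non-resident legal person, supervised by KS department",  # invented
}

# Attempt 1: find foreign customers by searching the description text.
foreign_by_text = [
    code for code, name in scheme.items() if "foreign" in name.lower()
]
print(foreign_by_text)  # code 418 is missed: its text says "Non-resident"

# Attempt 2: the same facts held as explicit attributes.
structured = {
    417: {"residence": "Foreign", "person_type": "Natural person"},
    418: {"residence": "Foreign", "person_type": "Legal person"},
}
foreign_by_attribute = [
    code for code, attrs in structured.items() if attrs["residence"] == "Foreign"
]
print(foreign_by_attribute)
```

The text search depends on how each description happens to be
worded, while the attribute filter does not.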
3.3.3  The Grand Coding Scheme 
We now arrive at the largest coding scheme one can
encounter in a bank: the chart of accounts. A chart
of accounts is a list of all accounts tracked by an
accounting system. Wikipedia further states that a
chart of accounts “should be designed to capture
financial information to make good financial
decisions”. This definition gives a good idea of what
the chart of accounts used to be.
The chart of accounts is a hierarchical structure of
codes that group material value (either balances or
transactions). At the top level the grouping is
usually into assets, liabilities, equity, income and
expenses. Each of these categories breaks down
further by the company’s products or services, the
types of counterparties to which the product or
service relates, and numerous other attributes, such
as terms, currencies, adjustments and risk
provisions. The list is virtually endless. For any
classical accountant this is a must, a cornerstone.
ICSOFT 2008 - International Conference on Software and Data Technologies