ZISC Neural Network Base Indicator for Classification Complexity Estimation

Ivan Budnyk, Abdennasser Chebira and Kurosh Madani

Images, Signals and Intelligent Systems Laboratory (LISSI / EA 3956)

PARIS XII University, Senart-Fontainebleau Institute of Technology

Bat.A, Av. Pierre Point, F-77127 Lieusaint, France

Abstract. This paper presents a new approach for estimating task complexity using the IBM© Zero Instruction Set Computer (ZISC®). The goal is to build a neural tree structure following the “divide and rule” paradigm. The aim of this work is to define a complexity indicator function and to highlight its main features.

1 Introduction

In this paper, we present the key point of a modular neural tree structure used to solve classification problems. This modular tree structure, called Tree Divide To Simplify (T-DTS), is based on the divide and conquer paradigm [1]. Complexity reduction is the key point on which the presented modular approach acts. Complexity reduction is performed not only at the problem's solution level but also at the processing procedure's level. The main idea is to reduce the complexity by splitting a complex problem into a set of simpler problems: this leads to “multi-modeling”, where a set of simple models is used to sculpt a complex behavior. Thus, one of the foremost functions to be performed is complexity estimation. The complexity estimation approach we present in this paper is based on a neurocomputer [2]. Before describing the proposed approach, we present in the second section the T-DTS paradigm and then the IBM© Zero Instruction Set Computer (ZISC®) tool. In the third section we propose a new approach for complexity estimation. We validate our approach with an academic benchmark problem and study the properties of the proposed indicator function. The final section presents conclusions and further perspectives of the presented work.

2 Applied Systems

In a very large number of cases dealing with real-world problems and applications (system identification, industrial processes, manufacturing regulation, optimization, decision, pattern recognition, systems, plant safety, etc.), information is available as data stored in files (databases, etc.) [3]. Efficient data processing therefore becomes a chief condition for solving problems related to the above-mentioned areas.

Budnyk I., Chebira A. and Madani K. (2007).
ZISC Neural Network Base Indicator for Classification Complexity Estimation.
In Proceedings of the 3rd International Workshop on Artificial Neural Networks and Intelligent Information Processing, pages 38-47
DOI: 10.5220/0001635600380047
Copyright © SciTePress

One solution is to reduce model complexity by splitting a complex problem into a set of simpler problems: multi-modeling, where a set of simple models is used to sculpt a complex behavior ([4] and [5]). For this purpose a tree-like splitting process, based on complexity estimation, divides the problem's representative database into a set of sub-databases and constructs a specific model (dedicated processing module) for each of the obtained sub-databases. This leads to a modular tree-like processing architecture including several models.

2.1 Neural Tree Modular Approach

In order to deal with real-world problems, we have proposed a modular approach based on the divide and conquer paradigm ([1], [3]). In this approach, Tree Divide To Simplify or T-DTS, we divide a problem into sub-problems recursively and generate a neural tree computing structure. T-DTS and its associated algorithm automatically construct a tree-like evolutionary neural architecture where nodes are decision units and leaves correspond to neural based processing units ([5], [6], [7]).

[Figure 1: block diagram with the labels: Data (D), Targets (T); Preprocessing (Normalizing, Removing Outliers, Principal Component Analysis); (PD) Preprocessed Data, Targets (T); Learning Phase; Feature Space Splitting; Complexity Estimation Module; Structure Construction; NN based Models Generation; P Prototypes, Neural Network Trained Parameters; Operation Phase; Processing Results.]

Fig. 1. General block diagram of T-DTS.

T-DTS includes two main operation modes. The first is the learning mode, in which the T-DTS system decomposes the input database and provides processing sub-structures and tools for the decomposed sets of data. The second mode is the operation mode. Figure 1 gives the general block diagram of the T-DTS operational steps. As this figure shows, T-DTS can be characterized by four main operations: “data pre-processing”, “learning process”, “generalization process” and the complexity estimation module. The tree structure construction is guided mainly by the complexity estimation module. This module introduces a feedback in the learning process and controls the tree computing structure. The reliability of the tree model in sculpting the problem behavior depends mainly on the complexity estimation module. This paper focuses on this aspect and proposes a new approach based on a neurocomputer. In the following sub-section we describe the ZISC® neurocomputer.

2.2 IBM© ZISC® Neurocomputer

The ZISC® neurocomputer is a fully integrated circuit based on neural networks, designed for recognition and classification applications that generally require supercomputing. The IBM ZISC-036 ([2], [5], [8]) is a parallel neural processor based on the RCE and KNN algorithms. The RCE (Reduced Coulomb Energy) algorithm automatically adjusts the number of hidden units and converges in only a few epochs. Intermediate neurons are added only when necessary, and the influence field is then adjusted by a threshold to minimize conflicting zones. The k-nearest neighbor (k-NN) algorithm is a method for classifying objects based on the closest training examples in the feature space. k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification.

Each chip is able to perform up to 250 000 recognitions per second. ZISC® is an implementation of the RBF-like (Radial Basis Function) model [9]. The RBF approach can be seen as mapping an N-dimensional space by prototypes. Each prototype is associated with a category and an influence field. The ZISC® system implements two kinds of distance metrics, called L1 and LSUP respectively. The first one (L1) corresponds to a polyhedral influence field and the second (LSUP) to a hyper-cubical one.
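The two norms can be illustrated in software. The following is a minimal sketch, assuming that L1 is the sum of absolute component differences and LSUP the largest absolute component difference; the function names are illustrative and are not part of any ZISC® interface.

```python
from typing import Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 (Manhattan) distance: sum of absolute component differences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def lsup_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """LSUP (maximum) distance: largest absolute component difference."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


prototype = [10, 20]
vector = [13, 16]
print(l1_distance(prototype, vector))    # 3 + 4 = 7
print(lsup_distance(prototype, vector))  # max(3, 4) = 4
```

With a fixed threshold, the set of points at L1 distance below that threshold is a polyhedron (a diamond in 2D), while for LSUP it is a hypercube (a square in 2D), which matches the two influence field shapes described above.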

[Figure 2: block diagram with the labels: neuron 1, neuron 2, ..., neuron 36; redrive; controls; address; data; I/O bus [30]; logic; daisy chain in; daisy chain out; inter ZISC communication bus; decision output.]

Fig. 2. IBM ZISC-036 chip's block diagram.

A ZISC® neuron is an element that is able:
• to memorize a prototype composed of 64 components, the associated category, an influence field and a context,
• to compute the distance, based on the selected norm (L1 or LSUP), between its memorized prototype and the input vector,
• to compare the computed distance with the influence fields,
• to interact with other neurons (in order to find the minimum distance, category, etc.),
• to adjust its influence field (during the learning phase).
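The capabilities listed above can be mimicked by a small software model. This is only a sketch under stated assumptions: the class `Neuron`, the function `recognize`, and the firing rule "distance strictly below the influence field" are illustrative choices, not a description of the chip's internal logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Neuron:
    prototype: List[int]   # memorized prototype (up to 64 components on the chip)
    category: int          # associated category
    influence_field: int   # radius of the zone the neuron responds to
    context: int = 0       # context tag, unused in this sketch
    norm: str = "L1"       # "L1" or "LSUP"

    def distance(self, vector: List[int]) -> int:
        """Distance between the memorized prototype and the input vector."""
        diffs = [abs(p - v) for p, v in zip(self.prototype, vector)]
        return sum(diffs) if self.norm == "L1" else max(diffs)

    def fires(self, vector: List[int]) -> bool:
        """Assumed firing rule: the vector lies inside the influence field."""
        return self.distance(vector) < self.influence_field


def recognize(neurons: List[Neuron], vector: List[int]) -> Optional[int]:
    """Category of the closest firing neuron, or None if no neuron fires."""
    firing = [n for n in neurons if n.fires(vector)]
    if not firing:
        return None
    return min(firing, key=lambda n: n.distance(vector)).category


neurons = [Neuron([0, 0], category=1, influence_field=5),
           Neuron([10, 10], category=2, influence_field=5)]
print(recognize(neurons, [1, 1]))   # 1
print(recognize(neurons, [5, 5]))   # None
```

On the chip the search for the minimum distance is performed by the neurons interacting in parallel; the sequential `min` call here only reproduces the result, not the mechanism.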

Fig. 2 shows the block diagram of an IBM© ZISC® chip. The next section presents the complexity estimation approach based on this neurocomputer's capabilities.

3 Complexity Estimation Approach

The aim of complexity estimation is to check and measure the difficulty of a classification task before proper processing. Classification complexity estimation is used to understand the behavior of classifiers. The best-known classification complexity estimation methods are based on the Bayes error, the theoretical probability of classification error. However, it is well known that the Bayes error is difficult to compute directly. A significant part of complexity estimation methods is therefore related to Bayes error estimation. There are two general ways to estimate the Bayes error:
• indirectly [10], by proposing a measure that is a lower or upper bound of it but is easier to compute than a direct estimation,
• by estimating it with a non-parametric method and showing the relation to the Bayes error [11].
Other methods use space partitioning [12].

We deal with classification problems. We suppose that a database composed of a collection of m objects associated with labels or categories is available. To estimate the complexity of such a database we use the ZISC® as a classification tool. The goal is not to build a classifier for this problem, but to estimate the problem's difficulty. We first use the ZISC® neurocomputer to learn this classification problem using the associated database. Then we estimate the task complexity by analyzing the generated neural network structure. We expect that a more complex problem will involve a more complex structure. The simplest neural network structure feature is the number n of neurons created during the learning phase. The following indicator is defined, where n is a parameter that reflects complexity:

$$Q = \frac{n}{m}, \qquad m \ge 1,\; n \ge 0 \qquad (1)$$
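To make the ratio Q = n / m concrete, the sketch below counts the neurons created by a simplified, single-pass RCE-style learner written in software. It is an assumption-laden stand-in for the ZISC® learning phase, not the chip's actual procedure: the names `rce_learn` and `complexity_indicator`, the default maximum influence field of 0.5 and the use of the LSUP norm are all illustrative.

```python
def lsup(a, b):
    """LSUP distance: largest absolute component difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def rce_learn(samples, max_field=0.5, dist=lsup):
    """Single-pass, simplified RCE-style learning.

    samples: iterable of (vector, category) pairs.
    Returns a list of neurons, each stored as [prototype, category, field].
    """
    neurons = []
    for vector, category in samples:
        recognized = False
        for neuron in neurons:
            d = dist(neuron[0], vector)
            if d < neuron[2]:                 # sample falls inside this field
                if neuron[1] == category:
                    recognized = True
                else:
                    neuron[2] = d             # shrink the conflicting field
        if not recognized:
            # New neuron: its field must not reach prototypes of other categories.
            field = max_field
            for proto, cat, _ in neurons:
                if cat != category:
                    field = min(field, dist(proto, vector))
            neurons.append([vector, category, field])
    return neurons


def complexity_indicator(samples, **kwargs):
    """Q = n / m, with n the number of neurons created and m the sample count."""
    m = len(samples)
    if m < 1:
        raise ValueError("at least one sample is required (m >= 1)")
    n = len(rce_learn(samples, **kwargs))
    return n / m


samples = [((0.1, 0.5), 1), ((0.2, 0.5), 1), ((0.9, 0.5), 2), ((0.8, 0.5), 2)]
print(complexity_indicator(samples))  # 2 neurons for 4 samples: 0.5
```

In this toy run two well-separated clusters need one neuron each, so Q = 2/4. If every sample required its own neuron, Q would reach its maximum value of 1.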

We suppose that there exists some function n = g(.) that reflects problem complexity. The arguments of this function may be the signal-to-noise ratio, the dimension of the representation space, boundary non-linearity and/or database size.

In a first approach, we consider only the variations of g(.) along the m axis: g(m).

We suppose that our database is free of any incorrect or missing information.

On the basis of g(m), a complexity indicator is defined as follows:

$$Q_i(m) = \frac{g_i(m)}{m}, \qquad m \ge 1,\; g_i(m) \ge 0 \qquad (2)$$

We expect that, for the same problem, as we increase m the problem seems less complex: more information reduces problem ambiguity. On the other hand, for problems of increasing complexity, the Q_i indicator should take a higher value. In order to check the expected behavior of this indicator function, we have defined a specific academic benchmark, presented in the following sub-section.

3.1 Academic Benchmark Description

We construct 5 databases representing a mapping of a restricted 2D space to 2 categories (Fig. 3). Each pattern is divided into two or more equal striped sub-zones, each of them belonging alternately to category 1 or 2.

Fig. 3. Test patterns.

In learning mode, we create samples using randomly generated points with coordinates (x, y). The number of samples m, in our case of a uniform random distribution, naturally has an influence on the quality of the demarcation of the striped zones (categories). According to the value of the first coordinate x and the number of striped sub-zones, the appropriate category c is assigned to the sample, and the resulting structure (x_j, y_j, c_j) is sent to the neurocomputer for learning.

The second mode is classification, in other words a real test of the generalization abilities of the ZISC® neurocomputer. We again generate, randomly and uniformly, m samples and their associated categories. From the classification statistics, we compute the indicator function Q_i.
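The sample generation described in this sub-section can be sketched as follows. The sketch assumes that the restricted 2D space is the unit square, that the stripes are vertical and of equal width, and that the left-most stripe belongs to category 1; these details, like the function names, are illustrative and are not stated in the text.

```python
import random


def stripe_category(x, n_zones):
    """Category (1 or 2) of a point, decided by its x coordinate only."""
    zone = min(int(x * n_zones), n_zones - 1)  # index of the stripe containing x
    return 1 if zone % 2 == 0 else 2           # categories alternate stripe by stripe


def make_samples(m, n_zones, seed=None):
    """Draw m uniformly distributed points and label them as (x, y, c)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(m):
        x, y = rng.random(), rng.random()
        samples.append((x, y, stripe_category(x, n_zones)))
    return samples


learning_set = make_samples(50, n_zones=2, seed=1)   # used in learning mode
test_set = make_samples(50, n_zones=2, seed=2)       # fresh draw for classification
print(stripe_category(0.1, 2), stripe_category(0.6, 2))  # 1 2
```

Using two independent draws mirrors the two modes: one set is sent to the classifier for learning, and a second set of the same size m is generated for the classification test.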

3.2 Results

The tests covered:
• 2 IBM© ZISC® modes (LSUP/L1),
• 5 different databases with increasing complexity,
• 8 values of m (50, 100, 250, 500, 1000, 2500, 5000, 10000).
For each set of parameters, tests are repeated 10 times in order to check the deviations and to compute averages. In total, 800 tests have been performed.
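The total number of tests follows directly from the parameter grid. A short sketch, with illustrative variable names, enumerates the runs and confirms the count:

```python
from itertools import product

modes = ["LSUP", "L1"]
databases = [1, 2, 3, 4, 5]
sizes = [50, 100, 250, 500, 1000, 2500, 5000, 10000]
repetitions = 10

# One entry per individual test: (mode, database index, m, repetition index).
runs = [(mode, db, m, rep)
        for mode, db, m in product(modes, databases, sizes)
        for rep in range(repetitions)]
print(len(runs))  # 2 * 5 * 8 * 10 = 800
```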


Fig. 4. LSUP ZISC's mode, Q_i(m).

Fig. 5. L1 ZISC's mode, Q_i(m).

Fig. 4 and Fig. 5 show the charts of Q_i, where i is the database index or pattern index. We expect Q_5, for 10 sub-zones, to reach a higher value than Q_1. Intuitively, the problem corresponding to the classification of 10 striped zones (Q_5) is more complex than the one for 2 zones (Q_1).


The chart analysis suggests that there exist one or more points m_j such that:

$$\frac{\partial^2 Q_i(m_j)}{\partial m^2} = 0 \qquad (3)$$

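At such a point m_j we have the following properties:

$$\forall m_j \ge 1,\ \exists\, \varepsilon > 0 :\quad \forall m \in (m_j - \varepsilon;\ m_j) \Rightarrow \frac{\partial^2 Q_i(m)}{\partial m^2} < 0, \qquad \forall m > m_j \Rightarrow \frac{\partial^2 Q_i(m)}{\partial m^2} > 0 \qquad (4.1)$$

or

$$\forall m_j \ge 1,\ \exists\, \varepsilon > 0 :\quad \forall m \in (m_j - \varepsilon;\ m_j) \Rightarrow \frac{\partial^2 Q_i(m)}{\partial m^2} > 0, \qquad \forall m > m_j \Rightarrow \frac{\partial^2 Q_i(m)}{\partial m^2} < 0 \qquad (4.2)$$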

This means that there exist one or more points m_j where the second derivative of Q_i changes its sign. We are then interested in m_0, defined by:

$$m_0 = \max(m_1, \ldots, m_j, \ldots, m_k) = m_k, \qquad m_1 < \ldots < m_j < \ldots < m_k \qquad (5)$$

where k is the number of points m_j.
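The selection of m_0 can be sketched numerically. The code below assumes that Q_i(m) has already been approximated by a polynomial whose coefficients are known; it then locates the points where the second derivative changes sign and keeps the largest one. The coefficients used in the example are hypothetical and are not fitted to the data of this paper, and the grid scan with bisection is just one simple way to find the sign changes.

```python
def second_derivative(coeffs):
    """Coefficients (lowest degree first) of p''(m) for p(m) = sum(c_k * m**k)."""
    return [k * (k - 1) * c for k, c in enumerate(coeffs)][2:]


def evaluate(coeffs, m):
    """Horner evaluation of a polynomial given lowest-degree-first coefficients."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * m + c
    return result


def sign_change_points(coeffs, m_min, m_max, steps=1000, tol=1e-9):
    """Points m_j in (m_min, m_max) where p''(m) changes sign."""
    d2 = second_derivative(coeffs)
    points = []
    width = (m_max - m_min) / steps
    for s in range(steps):
        lo, hi = m_min + s * width, m_min + (s + 1) * width
        if evaluate(d2, lo) * evaluate(d2, hi) < 0:   # sign change in this cell
            while hi - lo > tol:                      # refine by bisection
                mid = (lo + hi) / 2
                if evaluate(d2, lo) * evaluate(d2, mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            points.append((lo + hi) / 2)
    return points


def m0(coeffs, m_min, m_max):
    """Largest sign-change point m_k, or None if there is none."""
    points = sign_change_points(coeffs, m_min, m_max)
    return max(points) if points else None


# Hypothetical polynomial p(m) = m^4 - 20 m^3 + 132 m^2, so p''(m) = 12 (m^2 - 10 m + 22).
coeffs = [0.0, 0.0, 132.0, -20.0, 1.0]
print(sign_change_points(coeffs, 1.0, 10.0))  # two points, near 5 - sqrt(3) and 5 + sqrt(3)
print(m0(coeffs, 1.0, 10.0))                  # the larger of the two
```

A sign change that falls exactly on a grid node would be missed by this scan; a finer grid or a shifted range avoids that in practice.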

After polynomial approximation, we compute the coefficients of complexity Q_i(m_0) for the 2 different ZISC's modes. Table 1 summarizes the results obtained from Figures 4 and 5.

Table 1. Coefficients of complexity for DNA sequences recognition.

ZISC's mode    LSUP                  L1
               m_0    Q_i(m_0)       m_0    Q_i(m_0)
Example 1      100    0.154          88     0.151
Example 2      170    0.182          168    0.177
Example 3      190    0.233          186    0.229
Example 4      235    0.240          229    0.239
Example 5      265    0.261          254    0.254

The main characteristic of the point m_0 is:

$$\forall m > m_0 :\quad m \to +\infty \Rightarrow Q_i(m) \to \text{const} \qquad (6)$$

In our case const = 0; in general, const is not necessarily 0. The sign change of the second derivative is also a characteristic of the classification success rate (Fig. 6 and Fig. 7). This supports the idea that the second derivative feature has a strong influence on the complexity estimation task. It lets us look at these problems not from the quantitative side of complexity but instead take a transitional step to the qualitative level. It is clearly seen that, in our pattern examples, the classification complexity ranges from Example 1 (2 zones, the easiest one) to Example 5.

Analysis of the plots from m_{0,Q1} (Example 1) to m_{0,Q5} (Example 5) for the related classification tasks implies the following property:

$$m_{0,Q_1} < m_{0,Q_2} < m_{0,Q_3} < m_{0,Q_4} < m_{0,Q_5} \qquad (7)$$

In our particular case:

$$Q_1(m_0) < Q_2(m_0) < Q_3(m_0) < Q_4(m_0) < Q_5(m_0) \qquad (8)$$
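Relations (7) and (8) can be checked directly against the values reported in Table 1. The short sketch below does this for both modes; the dictionary layout and the helper name `strictly_increasing` are illustrative.

```python
# (m_0, Q_i(m_0)) for Examples 1 to 5, copied from Table 1.
table_1 = {
    "LSUP": [(100, 0.154), (170, 0.182), (190, 0.233), (235, 0.240), (265, 0.261)],
    "L1":   [(88, 0.151), (168, 0.177), (186, 0.229), (229, 0.239), (254, 0.254)],
}


def strictly_increasing(values):
    """True if each value is smaller than the one that follows it."""
    return all(a < b for a, b in zip(values, values[1:]))


for mode, rows in table_1.items():
    m0_values = [m_0 for m_0, _ in rows]
    q_values = [q for _, q in rows]
    # Relation (7) concerns the m_0 column, relation (8) the Q_i(m_0) column.
    print(mode, strictly_increasing(m0_values), strictly_increasing(q_values))
# LSUP True True
# L1 True True
```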

In our experimental validations, relation (6) (giving the limit of Q_i(m) when m tends to +∞) can be interpreted as the case where m is large compared to m_0, meaning that additional new data does not change the dynamics of the classification task. In other words, the situation becomes more predictable regarding the indicators' evolution (Fig. 4 and Fig. 5) and the classification rates (Fig. 6 and Fig. 7).

Fig. 6. Success rates of patterns’ classification. Example 1 – 5. LSUP ZISC’s mode.


Fig. 7. Success rates of patterns’ classification. Example 1 – 5. L1 ZISC’s mode.

On the other hand, one can consider a particular value of m (an interesting value is m_0, for which the second derivative of Q_i(m) changes sign), making Q_i(m_0) act as a “complexity coefficient”. In our case, Q_i(m_0) acts as a “checkpoint” evaluating the “stability of the classification process”. An increase of m_0 reflects an increase in the complexity of the classification task.

4 Conclusions

In this paper we describe a new method for complexity estimation and propose a constructed indicator function Q(m). This approach is based on the ZISC® neurocomputer. The complexity indicator is extracted from some pertinent neural network structure parameters, and specifically in this paper from the number of neurons in the structure. More complex structures are related to more complex problems. The presented concept has been implemented on the IBM© ZISC-036® massively parallel neurocomputer and validated using a set of two-class academic classification benchmarks with increasing complexity. A first investigation of the sign behavior of the second derivative of the proposed complexity indicator reveals some interesting properties.

Perspectives of this work are a formal description of the defined complexity indicator, the specification of other pertinent parameters and the study of their properties. We are also working on the validation of this theoretical approach on the complexity evaluation of a real-world problem in the medical area: DNA pattern classification (recognizing, given a DNA sequence, the boundaries between exons and introns).

References

1. Bouyoucef, E., Chebira, A., Rybnik, M., Madani, K.: 2005. Multiple Neural Network Model Generator with Complexity Estimation and self-Organization Abilities. International Scientific Journal of Computing, ISSN 1727-6209, vol. 4, issue 3, pp. 20-29.
2. Laboratory IBM France: May 15, 1998. ZISC® 036 Neurons User's Manual, Version 1.2, Component Development.
3. Madani, K., Rybnik, M., Chebira, A.: 2003. Data Driven Multiple Neural Network Models Generator Based on a Tree-like Scheduler, LNCS series, Edited by: Mira, J., Prieto, A., Springer Verlag, ISBN 3-540-40210-1, pp. 382-389.
4. Jordan, M. I., Xu, L.: 1995. Convergence Results for the EM Approach to Mixture of Experts Architectures, Neural Networks, Vol. 8, N° 9, pp. 1409-1431, Pergamon, Elsevier.
5. Madani, K., Chebira, A.: 2000. A Data Analysis Approach Based on a Neural Networks Data Sets Decomposition and its Hardware Implementation, PKDD 2000, Lyon, France.
6. Tremiolles, G. De: March 1998. Contribution to the theoretical study of neuro-mimetic models and to their experimental validation: a panel of industrial applications, Ph.D. Report, University of PARIS XII.
7. Tremiolles, G. De, Tannhof, P., Plougonven, B., Demarigny, C., Madani, K.: 1997. Visual Probe Mark Inspection, using Hardware Implementation of Artificial Neural Networks, in VLSI Production, LNCS - Biological and Artificial Computation: From Neuroscience to Technology, Ed.: Mira, J., Diaz, R. M., Cabestany, J., Springer Verlag Berlin Heidelberg, pp. 1374-1383.
8. Laboratory IBM France: February 1995. ISC/ISA ACCELERATOR card for PC, User Manual, IBM France.
9. Park, J., Sandberg, J.W.: 1991. Universal approximation using radial basis functions network, Neural Computation, vol. 3, pp. 246-257.
10. Lin, J.: 1991. Divergence measures based on the Shannon entropy, IEEE Transactions on Information Theory, 37(1):145-151.
11. Parzen, E.: 1962. On estimation of a probability density function and mode, Annals of Math. Statistics, vol. 33, pp. 1065-1076.
12. Kohn, A., Nakano, L.G., Mani, V.: 1996. A class discriminability measure based on feature space partitioning, Pattern Recognition, 29(5):873-887.
