Table 3: Mushroom data set attribute information.

N.  Attribute                 Values
0   classes                   edible=e, poisonous=p
1   cap.shape                 bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s
2   cap.surface               fibrous=f, grooves=g, scaly=y, smooth=s
3   cap.color                 brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y
4   bruises?                  bruises=t, no=f
5   odor                      almond=a, anise=l, creosote=c, fishy=y, foul=f, musty=m, none=n, pungent=p, spicy=s
6   gill.attachment           attached=a, descending=d, free=f, notched=n
7   gill.spacing              close=c, crowded=w, distant=d
8   gill.size                 broad=b, narrow=n
9   gill.color                black=k, brown=n, buff=b, chocolate=h, gray=g, green=r, orange=o, pink=p, purple=u, red=e, white=w, yellow=y
10  stalk.shape               enlarging=e, tapering=t
11  stalk.root                bulbous=b, club=c, cup=u, equal=e, rhizomorphs=z, rooted=r, missing=?
12  stalk.surface.above.ring  fibrous=f, scaly=y, silky=k, smooth=s
13  stalk.surface.below.ring  fibrous=f, scaly=y, silky=k, smooth=s
14  stalk.color.above.ring    brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y
15  stalk.color.below.ring    brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y
16  veil.type                 partial=p, universal=u
17  veil.color                brown=n, orange=o, white=w, yellow=y
18  ring.number               none=n, one=o, two=t
19  ring.type                 cobwebby=c, evanescent=e, flaring=f, large=l, none=n, pendant=p, sheathing=s, zone=z
20  spore.print.color         black=k, brown=n, buff=b, chocolate=h, green=r, orange=o, purple=u, white=w, yellow=y
21  population                abundant=a, clustered=c, numerous=n, scattered=s, several=v, solitary=y
22  habitat                   grasses=g, leaves=l, meadows=m, paths=p, urban=u, waste=w, woods=d
5.1 Mushrooms
Mushroom is a data set available in the UCI Machine Learning Repository. This data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota families. Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended; this latter class was combined with the poisonous one. The Guide clearly states that there is no simple rule for determining the edibility of a mushroom. However, we will try to find one using the data set as a truth table.
The data set has 8124 instances defined using 22 nominally valued attributes, presented in Table 3. It has 2480 missing attribute values, all for attribute #11 (stalk.root). 4208 instances (51.8%) are classified as edible and 3916 (48.2%) as poisonous.
An example of a known rule for edible mushrooms is:

odor=(almond.OR.anise.OR.none).AND.spore-print-color=NOT.green

which gives 48 errors, or 99.41% accuracy, on the whole data set.
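This rule is easy to check directly. The sketch below is a minimal example, assuming the agaricus-lepiota.data file from the UCI repository is available locally; the column names are our own labels following Table 3.

    # Sketch: verifying the known rule on the UCI mushroom data.
    import pandas as pd

    cols = ["classes", "cap.shape", "cap.surface", "cap.color", "bruises?",
            "odor", "gill.attachment", "gill.spacing", "gill.size",
            "gill.color", "stalk.shape", "stalk.root",
            "stalk.surface.above.ring", "stalk.surface.below.ring",
            "stalk.color.above.ring", "stalk.color.below.ring",
            "veil.type", "veil.color", "ring.number", "ring.type",
            "spore.print.color", "population", "habitat"]
    df = pd.read_csv("agaricus-lepiota.data", names=cols)

    # Edible iff odor in {almond, anise, none} and spore print not green.
    pred_edible = df["odor"].isin(["a", "l", "n"]) & (df["spore.print.color"] != "r")
    errors = int((pred_edible != (df["classes"] == "e")).sum())
    print(errors, 1 - errors / len(df))  # expected: 48 errors, ~99.41% accuracy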
We used an unsupervised filter that converted all nominal attributes into binary numeric attributes. An attribute with k values was transformed into k binary attributes. This produced a data set containing 111 binary attributes.
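A sketch of this filter, assuming the data frame df from the previous example (the exact column count depends on how the missing stalk.root values are treated):

    # One-hot (nominal-to-binary) filter: an attribute with k values
    # becomes k 0/1 indicator attributes, named e.g. "odor=a".
    binary = pd.get_dummies(df.drop(columns=["classes"]), prefix_sep="=")
    print(binary.shape[1])  # about 111 indicator attributes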
After the binarization we used the described method to select relevant attributes for mushroom classification by fixing a weak stopping criterion. As a result, the method produced a model with 100% accuracy, depending on 23 binary attributes defined by values of odor, gill.size, stalk.surface.above.ring, ring.type, and spore.print.color. We used the values assumed by these attributes to produce a new data set. After 3 tries we selected the least complex model:
[Figure: the selected model, a single neuron ϕ receiving the eight propositions below with the indicated weights.]

A1 : bruises? = t                  (weight 1)
A2 : odor ∈ {a, l, n}              (weight 1)
A3 : stalk.surface.above.ring = k  (weight −1)
A4 : ring.type = e                 (weight −1)
A5 : spore.print.color = r         (weight −1)
A6 : population = c                (weight −1)
A7 : habitat ∈ {g, m, u, d, p, l}  (weight −1)
A8 : habitat = w                   (weight 1)
This model has an accuracy of 100%. Since the attribute values in A2 and A3, as well as those in A7 and A8, are mutually exclusive, we used propositions A1, A2, A3, A4, A5, A6 and A7 to define a new data set. This new data set was enriched with new negative cases by introducing, for each original case, a new one where the truth value of each attribute was multiplied by 0.5. For instance, the "edible" mushroom case

(A1=0, A2=1, A3=0, A4=0, A5=0, A6=1, A7=0)

was used in the definition of a new "poisonous" case

(A1=0, A2=0.5, A3=0, A4=0, A5=0, A6=0.5, A7=0).

This resulted in a convergence speedup and reduced the occurrence of un-representable configurations.
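A minimal sketch of this enrichment step, assuming X holds the (A1, ..., A7) truth-value tuples and y the class labels (1 = edible, 0 = poisonous); the helper name enrich is ours:

    def enrich(X, y):
        """For each original case add a copy with every truth value
        multiplied by 0.5, labeled as a negative (poisonous) case."""
        X2, y2 = list(X), list(y)
        for case in X:
            X2.append(tuple(0.5 * a for a in case))
            y2.append(0)
        return X2, y2

    # The "edible" case above spawns the scaled "poisonous" case:
    X2, y2 = enrich([(0, 1, 0, 0, 0, 1, 0)], [1])
    print(X2[1])  # (0.0, 0.5, 0.0, 0.0, 0.0, 0.5, 0.0)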
When we applied our "reverse engineering" algorithm to the enriched data set, having as stopping criterion a mean square error (mse) below 0.003, the method produced the model:
Hidden layer (weights over A1, ..., A7 | bias | interpretation):

i1 : [ 0  1  0  0 −1  0  1 | −1 ]   A2 ⊗ ¬A5 ⊗ A7
i2 : [ 0  1  0  1  0  0 −1 | −1 ]   A2 ⊗ A4 ⊗ ¬A7

Output layer (weights over i1, i2 | bias):

[ 1  1 | 0 ]   i1 ⊕ i2

This model codifies the proposition

(A2 ⊗ ¬A5 ⊗ A7) ⊕ (A2 ⊗ A4 ⊗ ¬A7)
and misses the classification of 48 cases. It has 99.41% accuracy and can be interpreted as the rule for edible mushrooms: "a mushroom is edible if its odor is almond, anise or none, and either its spore print color is not green and its habitat is not waste, or its ring type is evanescent and its habitat is waste."
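To trace this interpretation, here is a minimal sketch of the Łukasiewicz connectives and the extracted proposition, under the assumption (made explicit in Section 6) that each neuron applies the identity truncated to [0, 1] to a weighted sum; all function names are ours:

    def t_norm(x, y):    # strong conjunction: x ⊗ y = max(0, x + y - 1)
        return max(0.0, x + y - 1.0)

    def t_conorm(x, y):  # strong disjunction: x ⊕ y = min(1, x + y)
        return min(1.0, x + y)

    def neg(x):          # negation: ¬x = 1 - x
        return 1.0 - x

    def rule(A1, A2, A3, A4, A5, A6, A7):
        # (A2 ⊗ ¬A5 ⊗ A7) ⊕ (A2 ⊗ A4 ⊗ ¬A7)
        return t_conorm(t_norm(t_norm(A2, neg(A5)), A7),
                        t_norm(t_norm(A2, A4), neg(A7)))

Note how the first weight row (0, 1, 0, 0, −1, 0, 1) with bias −1 computes exactly A2 ⊗ ¬A5 ⊗ A7 = max(0, A2 + (1 − A5) + A7 − 2) = max(0, A2 − A5 + A7 − 1).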
A more precise model can be produced by restricting the stopping criterion. However, this in general produces more complex propositions which are more difficult to understand. For instance, with a stopping criterion of mse < 0.002 the system generated the model below. It misses 32 cases, has an accuracy of 99.2%, and is easy to convert into a proposition.
Hidden layer 1 (weights over A1, ..., A7 | bias | interpretation):

i1 : [ 0  0  0 −1  0  0  1 |  1 ]   ¬A4 ⊕ A7
i2 : [ 1  1  0 −1  0  0  0 | −1 ]   A1 ⊗ A2 ⊗ ¬A4
i3 : [ 0  0  0  0  0  0  1 |  0 ]   A7
i4 : [ 0  1  0  0 −1 −1  1 | −1 ]   A2 ⊗ ¬A5 ⊗ ¬A6 ⊗ A7

Hidden layer 2 (weights over i1, ..., i4 | bias | interpretation):

j1 : [ −1  0  1  0 |  1 ]   ¬i1 ⊕ i3
j2 : [  1 −1  0 −1 |  0 ]   i1 ⊗ ¬i2 ⊗ ¬i4

Output layer (weights over j1, j2 | bias):

[ 1 −1 | 0 ]   j1 ⊗ ¬j2
This NN can be used to interpret the formula:

j1 ⊗ ¬j2 = (¬i1 ⊕ i3) ⊗ ¬(i1 ⊗ ¬i2 ⊗ ¬i4)
         = (¬(¬A4 ⊕ A7) ⊕ A7) ⊗ ¬((¬A4 ⊕ A7) ⊗ ¬(A1 ⊗ A2 ⊗ ¬A4) ⊗ ¬(A2 ⊗ ¬A5 ⊗ ¬A6 ⊗ A7))
         = ((A4 ⊗ ¬A7) ⊕ A7) ⊗ ((A4 ⊗ ¬A7) ⊕ (A1 ⊗ A2 ⊗ ¬A4) ⊕ (A2 ⊗ ¬A5 ⊗ ¬A6 ⊗ A7))
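The sketch below evaluates this network layer by layer with the truncated-identity activation and checks the simplification against it on all Boolean inputs (on 0/1 values the Łukasiewicz connectives reduce to classical logic); the helper names are ours:

    from itertools import product

    def clip(x):  # the truncated-identity activation
        return min(1.0, max(0.0, x))

    def net(A1, A2, A3, A4, A5, A6, A7):
        i1 = clip(-A4 + A7 + 1)           # ¬A4 ⊕ A7
        i2 = clip(A1 + A2 - A4 - 1)       # A1 ⊗ A2 ⊗ ¬A4
        i3 = clip(A7)                     # A7
        i4 = clip(A2 - A5 - A6 + A7 - 1)  # A2 ⊗ ¬A5 ⊗ ¬A6 ⊗ A7
        j1 = clip(-i1 + i3 + 1)           # ¬i1 ⊕ i3
        j2 = clip(i1 - i2 - i4)           # i1 ⊗ ¬i2 ⊗ ¬i4
        return clip(j1 - j2)              # j1 ⊗ ¬j2

    def simplified(A1, A2, A3, A4, A5, A6, A7):
        a = clip(A4 - A7)                 # A4 ⊗ ¬A7
        left = clip(a + A7)               # (A4 ⊗ ¬A7) ⊕ A7
        right = clip(a + clip(A1 + A2 - A4 - 1)
                     + clip(A2 - A5 - A6 + A7 - 1))
        return clip(left + right - 1)     # left ⊗ right

    assert all(net(*A) == simplified(*A)
               for A in product((0.0, 1.0), repeat=7))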
Sometimes the algorithm converged to un-representable configurations like the one presented below, with 100% accuracy. The frequency of this type of configuration increases with the required accuracy.
Hidden layer (weights over A1, ..., A7 | bias | interpretation):

i1 : [ −1  1 −1  1  0 −1  0 | 0 ]   un-representable
i2 : [  0  0  0  1  1  0 −1 | 1 ]   A4 ⊗ A5 ⊗ ¬A7
i3 : [  1  1  0  0  0  0 −1 | 0 ]   un-representable

Output layer (weights over i1, i2, i3 | bias):

[ 1 −1  1 | 0 ]   j1 : un-representable
Using rule R and selecting, for each un-representable formula, the best approximation evaluated in the data set, we have:

1. i1 ∼ ((¬A1 ⊗ A4) ⊕ A2) ⊗ ¬A3 ⊗ ¬A6, with λ = 0.9297
2. i3 ∼ (A1 ⊕ ¬A7) ⊗ A2, with λ = 1.0
3. j1 ∼ (i1 ⊗ ¬i2) ⊕ i3, with λ = 0.9951
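As an illustrative sketch only: rule R itself is not restated in this section, so we read λ here as the fraction of data-set cases on which a candidate formula agrees with the neuron it replaces; the measure below is our own assumption.

    def similarity(neuron_vals, formula_vals):
        # Fraction of cases where the formula output matches the
        # neuron activation over the data set.
        matches = sum(1 for n, f in zip(neuron_vals, formula_vals) if n == f)
        return matches / len(neuron_vals)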
The extracted formula

α = ((((¬A1 ⊗ A4) ⊕ A2) ⊗ ¬A3 ⊗ ¬A6) ⊗ ¬(A4 ⊗ A5 ⊗ ¬A7)) ⊕ ((A1 ⊕ ¬A7) ⊗ A2)

is λ-similar, with λ = 0.9951, to the original NN. Formula α misses the classification for 40 cases. Note that the symbolic model is stable: the bad performance of the i1 representation does not affect the model.
The CNN structure can codify the data set with 100% accuracy. Below we present a perfect description of edible mushrooms.
Hidden layer (weights over A1, ..., A7 | bias):

[ 0  1  1 −1 −1  0  0 | −1 ]
[ 0  1 −1  0 −1 −1  0 |  0 ]
[ 0  1 −1  0 −1  1 −1 | −1 ]
[ 1  1 −1 −1 −1  1  1 | −3 ]

Output layer (weights over the four hidden neurons | bias):

[ 1  1  1  1 | 0 ]
This structure has as interpretation the rule for edible mushrooms:

(A2.and.A3.and.NOT(A4).and.NOT(A5)).or.
(A2.and.NOT(A3).and.NOT(A5).and.NOT(A6)).or.
(A2.and.NOT(A3).and.NOT(A5).and.A6.and.NOT(A7)).or.
(A1.and.A2.and.NOT(A3).and.NOT(A4).and.NOT(A5).and.A6.and.A7)
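On 0/1 inputs this interpretation reduces to ordinary Boolean logic, so the perfect description can be sketched directly as:

    def edible(A1, A2, A3, A4, A5, A6, A7):
        # The 100%-accurate description over the derived boolean
        # attributes A1..A7.
        return ((A2 and A3 and not A4 and not A5)
                or (A2 and not A3 and not A5 and not A6)
                or (A2 and not A3 and not A5 and A6 and not A7)
                or (A1 and A2 and not A3 and not A4 and not A5 and A6 and A7))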
6 CONCLUSIONS AND FUTURE
WORK
This methodology to codify and extract symbolic knowledge from a NN is very simple and efficient for the extraction of comprehensible rules from medium-sized data sets. It is, moreover, very sensitive to attribute relevance.
From a theoretical point of view it is particularly interesting that restricting the values assumed by neuron weights restricts the information propagation in the network, thus allowing the emergence of patterns in the neural network structure. For the case of linear neural networks, having as activation function the identity truncated to 0 and 1, these structures are characterized by the occurrence of patterns in the neuron configurations directly presentable as formulas in Łukasiewicz logic.
Generated fuzzy rules might give a good approximation of the data, but often they are not interpretable. In our point of view, the interpretability of such symbolic rules is strictly related to the type of fuzzy logic associated with the problem. When we applied our method to the extraction of rules from truth tables generated in Product logic or in Gödel logic, the rules were very difficult to interpret. For the extraction of knowledge from these types of fuzzy logic, extraction processes governed by the appropriate logic must be developed.
We are using this methodology for fuzzy regression tree generation, where we use CNNs to find splitting formulas in the algorithm's pruning phase (Algara, 2007).
Acknowledgements
I thank Helder Pita for reading and commenting on the manuscript. I acknowledge the support of the Instituto Superior de Engenharia de Lisboa and the Área Científica da Matemática.
REFERENCES

Algara, E. (2007). Soft Operators Decision Trees: Uncertainty and Stability Related Issues. Doctoral dissertation (Doktor der Naturwissenschaften), Fachbereich Mathematik, Technische Universität Kaiserslautern, 2007.

Amato, P., Nola, A., and Gerla, B. (2002). Neural networks and rational Łukasiewicz logic. IEEE Transactions on Neural Networks, vol. 5, no. 6, pp. 506-510, 2002.

Andersen, T. and Wilamowski, B. (1995). A modified regression algorithm for fast one layer neural network training. World Congress of Neural Networks, Washington DC, USA, vol. 1, no. 4, pp. 687-690, 1995.

Battiti, R. (1992). First- and second-order methods for learning: between steepest descent and Newton's method. Neural Computation, vol. 4, no. 2, pp. 141-166, 1992.

Bello, M. (1992). Enhanced training algorithms, and integrated training/architecture selection for multilayer