method uses an RBF function to describe a Neural-
Fuzzy system. Figure 1 shows the structure of the
RBF-NF model, where the input, rule-base (hidden
layer) and output layers can be identified. The
presented system can then be parametrically
optimised via a suitable function minimisation
algorithm.
2.3 Levenberg-Marquardt Optimisation
The Levenberg-Marquardt (LM) algorithm is an
iterative technique that locates the minimum of a
multivariate function that is expressed as the sum of
squares of non-linear real-valued functions
(Levenberg, 1944). In this paper, the root-mean-
square error (RMSE) between the training data and
the model-predicted data is used as the cost
function to be minimised. The
presented data-mining workflow provides an
efficient and fast method for capturing numerical
data-based information and converting it to a
linguistic knowledge-base with a predictive
capability.
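As an illustration of the damped least-squares idea behind LM, the following minimal sketch fits a toy two-parameter line rather than the RBF-NF model itself. The function names, the starting damping value and the factor-of-ten damping update are assumptions made for this example and are not taken from the paper. Minimising the sum of squared residuals is equivalent to minimising the RMSE, since one is a monotonic function of the other.

```python
import math

def rmse(residuals):
    """Root-mean-square error of a list of residuals."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

def lm_fit_line(xs, ys, a=0.0, b=0.0, lam=1e-2, iters=50):
    """Fit the toy model y = a*x + b with Levenberg-Marquardt steps.

    Hypothetical stand-in for the RBF-NF parameter optimisation.
    """
    def residuals(a_, b_):
        return [a_ * x + b_ - y for x, y in zip(xs, ys)]

    # J^T J is constant here because the toy model is linear in (a, b);
    # each Jacobian row is [x, 1].
    h00 = sum(x * x for x in xs)
    h01 = sum(xs)
    h11 = float(len(xs))

    res = residuals(a, b)
    cost = sum(r * r for r in res)
    for _ in range(iters):
        # Gradient term J^T r.
        g0 = sum(x * r for x, r in zip(xs, res))
        g1 = sum(res)
        # Solve the damped normal equations (J^T J + lam*I) d = -J^T r.
        m00, m11 = h00 + lam, h11 + lam
        det = m00 * m11 - h01 * h01
        da = (-g0 * m11 + g1 * h01) / det
        db = (g0 * h01 - g1 * m00) / det
        new_res = residuals(a + da, b + db)
        new_cost = sum(r * r for r in new_res)
        if new_cost < cost:
            # Step accepted: move, and reduce damping (assumed factor of 10).
            a, b, res, cost = a + da, b + db, new_res, new_cost
            lam /= 10.0
        else:
            # Step rejected: increase damping, closer to gradient descent.
            lam *= 10.0
    return a, b, rmse(res)

a, b, err = lm_fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a, b, err)
```

For a non-linear model such as the RBF-NF structure, the Jacobian would have to be re-evaluated at every iteration; the accept/reject logic stays the same.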
The next section describes how the RBF-NF
structure is exploited, along with Fuzzy-Entropy
measures, to identify the features relevant to the
process.
3 FUZZY ENTROPY-BASED FEATURE SELECTION
The presented method is based on two Fuzzy Logic
features: the Fuzzy Entropy and the Takagi-Sugeno-
Kang (TSK) type (Takagi and Sugeno, 1985) of
output layer for an NF system. Fuzzy-Entropy is a
measure of ‘fuzziness’: it quantifies how ‘fuzzy’ a
value is when the Fuzzy Inference System (FIS) is
used. The TSK output layer of an RBF-NF model is
a linear combination of its inputs (polynomial). The
hypothesis is that, during model training, the
absolute values of the output weights $w_i$, for
each rule, will increase for the inputs (genes) that
are more ‘influential’ in (contribute more to) the
model predictions. One could analyse how the
output weights change on every training iteration,
and hence determine the relevance of the
corresponding inputs. This
relationship in terms of entropy strength is relative
to the genes, is measurable, and may be used to rank
the genes for a particular rule in the rule-base. In the
algorithmic process proposed here, the model is
trained for ‘N’ iterations, while at ‘n’ iterations
(n<N) the training can be ‘paused’ and the model
can be reviewed in terms of the gene ranking order.
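The per-rule ranking described above can be sketched as follows. This is a minimal illustration, assuming the output weights of one rule are available as a plain list at the point where training is paused; the function name, the gene labels and the weight values are hypothetical.

```python
def rank_genes_by_weight(weights, gene_names, top_n=None):
    """Rank the genes of a single rule by |output weight|, largest first."""
    order = sorted(range(len(weights)),
                   key=lambda i: abs(weights[i]),
                   reverse=True)
    ranked = [(gene_names[i], abs(weights[i])) for i in order]
    # top_n plays the role of the process-specific threshold, if given.
    return ranked if top_n is None else ranked[:top_n]

# Hypothetical TSK output weights of one rule after n training iterations.
rule_weights = [0.12, -0.85, 0.40, -0.03]
genes = ["g1", "g2", "g3", "g4"]
print(rank_genes_by_weight(rule_weights, genes, top_n=2))
```

Calling the function at several pause points (n < N) and comparing the returned orders is one way to review how the ranking evolves during training.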
Not all the rules in the rule-base contribute
equally to the FIS; this depends on the
‘input space’ of a particular gene. Therefore, the
ranking order that may be established as a result of
examining a single TSK rule is only relevant if the
corresponding rule has a high contribution to the
overall rule-base. This contribution can be
established via the use of Fuzzy-Entropy (FE), as a
measure of ‘fuzziness’. The FE is calculated for each
individual rule, and then a numerical ‘index’ is
developed to ‘adjust’ the significance of the ranking
of each individual rule. Finally, the overall ranking
of the genes is calculated by using the FE-adjusted
gene output weights.
In terms of the algorithmic process, Figure 2
summarises the gene feature selection. The first step
is to rank the output weights by rule in descending
order. The top ‘n’ genes are then selected and this
information is passed on to the following step (this
numerical threshold is process specific). The Fuzzy-
Entropy is then calculated for each model prediction.
The fuzzy entropy is defined using the concept of a
membership function. In 1972, De Luca and Termini
defined Fuzzy Entropy based on Shannon’s
function and introduced a set of properties that
Fuzzy Entropy should satisfy (Al-Sharhan et al.,
2001).
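$$H(A) = -K \sum_{i=1}^{n} \left[ \mu_A(x_i) \log \mu_A(x_i) + \left(1 - \mu_A(x_i)\right) \log\left(1 - \mu_A(x_i)\right) \right] \qquad (2)$$
Where: $\mu_A(x_i)$ is the membership degree of
$x_i$ in the fuzzy set $A$, $n$ is the number of
elements, and $K$ is a positive constant.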
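A minimal sketch of the De Luca and Termini entropy in its standard form is given here. The function name and the default normalisation K = 1/n are assumptions for this example, and the paper's own constant may differ. Terms with a membership of exactly 0 or 1 are treated as contributing zero, which is the limiting value of the expression.

```python
import math

def fuzzy_entropy(memberships, k=None):
    """De Luca-Termini fuzzy entropy of membership degrees in [0, 1]."""
    if not memberships:
        raise ValueError("at least one membership degree is required")
    if k is None:
        k = 1.0 / len(memberships)  # assumed normalisation, not from the paper

    def term(mu):
        if mu < 0.0 or mu > 1.0:
            raise ValueError("membership degrees must lie in [0, 1]")
        if mu == 0.0 or mu == 1.0:
            return 0.0  # crisp value: no fuzziness
        return -(mu * math.log(mu) + (1.0 - mu) * math.log(1.0 - mu))

    return k * sum(term(mu) for mu in memberships)

print(fuzzy_entropy([0.5]))        # maximally fuzzy single value
print(fuzzy_entropy([0.0, 1.0]))   # crisp values
print(fuzzy_entropy([0.9, 0.8]))   # mostly crisp values
```

The entropy is largest at a membership of 0.5 and falls to zero as the membership approaches 0 or 1, which matches the reading of Fuzzy-Entropy as a measure of ‘fuzziness’.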
Once the entropy is calculated, Eq. 3 is applied.
The ‘B’ index reflects the significance of a gene
within a certain rule. The value of B is obtained for
each gene in all the rules.
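$$B_i = \frac{\mu}{H} \ast w_i \qquad (3)$$
where, for a given rule, $\mu$ is the membership
degree, $H$ is the Fuzzy Entropy (Eq. 2) and $w_i$ is
the output weight of gene $i$. The output weight is
thus adjusted by the significance of a particular
rule: proportional to the membership degree and
inversely proportional to the ‘fuzziness’.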
After the rule-adjusted significance per gene is
calculated, a new ranking order is compiled.
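The adjustment and re-ranking can be sketched as follows, reading Eq. 3 as B = (membership / entropy) * weight, in line with the description that the adjustment is proportional to the membership degree and inversely proportional to the ‘fuzziness’. Several choices here are assumptions made for illustration only: the use of the absolute weight, the small `eps` guard against a zero entropy, and the summation of B over the rules to obtain one score per gene. The paper does not state how the per-rule values are aggregated.

```python
def b_index(weight, membership, entropy, eps=1e-12):
    """Rule-adjusted significance of one gene in one rule (assumed form)."""
    # eps avoids division by zero for a perfectly crisp rule (assumption).
    return (membership / (entropy + eps)) * abs(weight)

def overall_ranking(weights, memberships, entropies, gene_names):
    """Rank genes by the sum of their B values over all rules.

    weights[r][g] is the output weight of gene g in rule r;
    memberships[r] and entropies[r] describe rule r.
    """
    totals = [0.0] * len(gene_names)
    for rule_weights, mu, h in zip(weights, memberships, entropies):
        for g, w in enumerate(rule_weights):
            totals[g] += b_index(w, mu, h)
    order = sorted(range(len(gene_names)),
                   key=lambda g: totals[g],
                   reverse=True)
    return [(gene_names[g], totals[g]) for g in order]

# Hypothetical two-rule, two-gene example.
weights = [[0.2, 0.1],   # rule 1: high membership, low fuzziness
           [0.1, 0.9]]   # rule 2: low membership, high fuzziness
memberships = [0.9, 0.2]
entropies = [0.1, 0.6]
print(overall_ranking(weights, memberships, entropies, ["geneA", "geneB"]))
```

In this made-up example geneB has the larger raw weight, but it sits in a rule with a low contribution, so geneA ranks first once the weights are adjusted.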
The work presented in this paper is the first
report, to our knowledge, of a Fuzzy-Entropy
scheme applied to an RBF-NF modelling structure.
The resulting ranking of the genes directly
relates to their performance in the modelling
structure, is an iterative procedure – that can be
BIOINFORMATICS 2013 - International Conference on Bioinformatics Models, Methods and Algorithms