equal to $FFI_{y;1,2,\ldots,q} - FFI_{y;1,2,\ldots,q-1}$ (where the two terms of the subtraction represent the proportion of the sum of squares of $Y$ explained by the model with and without $X_{(q)}$, respectively). The higher the threshold value, the more strongly the procedure inhibits the entry of new independent variables, because each new variable must then explain a larger additional fraction of the total variability.
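As a minimal sketch of the entry criterion above, the following Python fragment computes the gain in FFI due to a candidate variable and compares it with the threshold. The function names, the strict inequality and the numerical values are illustrative assumptions and do not come from the Matlab function used in this work; the FFI values themselves are taken as given.

```python
def entry_gain(ffi_with: float, ffi_without: float) -> float:
    """Gain in FFI when the candidate variable is added to the model."""
    for value in (ffi_with, ffi_without):
        if not 0.0 <= value <= 1.0:
            raise ValueError("FFI is a proportion and must lie in [0, 1]")
    return ffi_with - ffi_without


def admits_candidate(ffi_with: float, ffi_without: float, threshold: float) -> bool:
    """True if the gain exceeds the threshold (the strict '>' is an assumption)."""
    return entry_gain(ffi_with, ffi_without) > threshold


# Made-up FFI values, for illustration only.
print(admits_candidate(0.62, 0.55, threshold=0.05))  # gain of about 0.07
print(admits_candidate(0.62, 0.55, threshold=0.10))  # same gain, stricter threshold
```

With the same gain, raising the threshold from 0.05 to 0.10 turns an accepted candidate into a rejected one, which is the inhibiting effect described in the text.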
Once $X_{(q)}$ is selected, its originality is evaluated through the so-called tolerance $T_q = 1 - FFI_{q;1,2,\ldots,q-1}$, where $FFI_{q;1,2,\ldots,q-1}$ represents the share of variability of $X_{(q)}$ explained by the $q-1$ independent variables already in the model. The tolerance ranges between 0 and 1, depending on the degree of linear correlation of $X_{(q)}$ with the other variables; therefore, $X_{(q)}$ will become part of the model only if $T_q$ exceeds a threshold between 0 and 1. A high value of the threshold selects only very original variables, but it can also stop the process right from the initial steps; on the contrary, a low value lets most of the variables enter the equation, provided that they explain a significant fraction of the variability of $Y$. The described procedure stops when none of the variables not yet included in the equation can make a significant contribution to the model, or when none of the candidate variables is significantly original.
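The forward procedure, with both the contribution and the tolerance checks and the stopping rule, can be sketched in Python as follows. This is only an illustration under stated assumptions: `ffi_y` and `ffi_x` are hypothetical callables that return the FFI of $Y$, and of a candidate variable, explained by a given set of variables (their computation is not reproduced here); candidates are tried in decreasing order of gain; both inequalities are taken as strict; the lookup tables contain made-up values.

```python
from typing import Callable, FrozenSet, List, Sequence


def forward_stepwise(
    candidates: Sequence[str],
    ffi_y: Callable[[FrozenSet[str]], float],
    ffi_x: Callable[[str, FrozenSet[str]], float],
    gain_threshold: float,
    tolerance_threshold: float,
) -> List[str]:
    """Return the variables selected, in order of entry."""
    selected: List[str] = []
    remaining = list(candidates)
    while remaining:
        current = ffi_y(frozenset(selected))
        # Gain in FFI that each remaining variable would bring.
        gains = {v: ffi_y(frozenset(selected + [v])) - current for v in remaining}
        chosen = None
        # Assumption: candidates are examined from the largest gain downwards.
        for v in sorted(remaining, key=lambda name: gains[name], reverse=True):
            if gains[v] <= gain_threshold:
                break  # no remaining variable contributes enough
            tolerance = 1.0 - ffi_x(v, frozenset(selected))
            if tolerance > tolerance_threshold:
                chosen = v
                break
        if chosen is None:
            break  # stop: no candidate is both significant and original
        selected.append(chosen)
        remaining.remove(chosen)
    return selected


# Made-up lookup tables standing in for the real FFI computations.
# Assumption: the FFI of the model without regressors is 0.
Y_TABLE = {
    frozenset(): 0.0,
    frozenset("a"): 0.50, frozenset("b"): 0.45, frozenset("c"): 0.10,
    frozenset("ab"): 0.52, frozenset("ac"): 0.58, frozenset("abc"): 0.59,
}
X_TABLE = {("c", frozenset("a")): 0.30}

order = forward_stepwise(
    ["a", "b", "c"],
    ffi_y=lambda s: Y_TABLE[s],
    ffi_x=lambda v, s: X_TABLE.get((v, s), 0.0),
    gain_threshold=0.05,
    tolerance_threshold=0.20,
)
print(order)
```

Passing the two FFI computations as callables keeps the selection logic separate from the fuzzy regression itself, so the same loop could be reused with any routine that returns the index.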
For an application of this procedure see Montrone, Campobasso, Perchinunno and Fanizzi (2011), which analyses data collected by the EU-SILC survey of 2006 regarding the perception of poverty by Italian families. For this purpose, using the Matlab editor, we wrote a function which requires, as input parameters, the matrices of cores, left extremes and right extremes of both the dependent and the independent fuzzy variables.
A more accurate procedure provides the possibility of eliminating, at each iteration, variables already included in the model whose explanatory contribution has been superseded by the combination of the independent variables introduced later.
In particular, unlike the procedure just described, we can verify at each iteration that the explanatory contribution of each variable $X_{(i)}$ ($i = 1, 2, \ldots, q-1$) is still significant once the candidate variable $X_{(q)}$ is inserted. In the $q$-th step such a contribution can be measured by the reduction of FFI caused by the elimination of the variable $X_{(i)}$ from the model, equal to $FFI_{y;1,2,\ldots,q} - FFI_{y;1,2,\ldots,q(-i)}$ (where the two terms of the subtraction represent the proportion of the sum of squares of $Y$ explained by the model including all the variables and by the model without the variable $X_{(i)}$, respectively). So, the variable $X_{(i)}$ remains in the model if the proportion of the sum of squares explained by the model including all variables exceeds that explained by the model without $X_{(i)}$ by more than an arbitrarily chosen threshold value.
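The removal check can be sketched along the same lines. In the Python fragment below, `ffi_y` is again a hypothetical callable returning the FFI of $Y$ for a given set of variables, the last element of `model` is assumed to be the variable just entered, and the numbers are made up. The function lists every earlier variable that fails the check; whether such variables are then removed all at once or one at a time is left open here.

```python
from typing import Callable, FrozenSet, List, Sequence


def variables_to_remove(
    model: Sequence[str],
    ffi_y: Callable[[FrozenSet[str]], float],
    threshold: float,
) -> List[str]:
    """Earlier variables whose removal reduces FFI by no more than the threshold."""
    full = frozenset(model)
    ffi_full = ffi_y(full)
    to_remove = []
    # Assumption: model[-1] is the newly entered variable and is not re-tested.
    for v in model[:-1]:
        reduction = ffi_full - ffi_y(full - {v})
        if reduction <= threshold:
            to_remove.append(v)
    return to_remove


# Made-up FFI values: "b" adds little once "a" and "c" are in the model.
Y_TABLE_REMOVAL = {
    frozenset("abc"): 0.60,
    frozenset("bc"): 0.30,  # model without "a"
    frozenset("ac"): 0.58,  # model without "b"
}
print(variables_to_remove(["a", "b", "c"], lambda s: Y_TABLE_REMOVAL[s], threshold=0.05))
```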
7 CONCLUSIONS
In this work we first derive explicit expressions for the estimated parameters of a multivariate fuzzy regression model with a fuzzy asymmetric intercept. Such an intercept is more appropriate than a non-fuzzy one, as it is to be estimated by the average value of the dependent variable (which is also fuzzy) when the independent variables equal zero.
Moreover, we verify that the sum of squares of the dependent variable consists simply of the regression sum of squares and the residual one, as happens in the classic OLS estimation procedure, only when the intercept is fuzzy asymmetric triangular. Conversely, when the intercept is symmetric (whether fuzzy or not), the analysis of the forecasting capability of the model is more difficult. This happens because of the presence of two additional components of the sum of squares: the first is related to the difference between the theoretical and the empirical average values of the dependent variable; the second is residual in nature and is characterized by an uncertain sign.
The selection of the most significant independent variables in a fuzzy regression model presents computational difficulties due to the large number of potential hyperplanes to be tested. We propose to overcome such difficulties through a stepwise procedure, based on a fuzzy version of the $R^2$ index. In each step a single variable among the starting ones is included, according to two basic criteria: its explanatory contribution to the model and its originality with respect to the other variables already included in the model.
A more accurate procedure provides the possibility of eliminating, at each iteration, variables already included in the model whose explanatory contribution has been superseded by the combination of the independent variables introduced later.
The forecasting capability of the proposed fuzzy regression model has been successfully verified in a recent application to data collected by the EU-SILC survey of 2006, regarding the perception of poverty by Italian families. On that occasion we have used the Matlab editor and, in particular, we have
A STEPWISE PROCEDURE TO SELECT VARIABLES IN A FUZZY LEAST SQUARE REGRESSION MODEL