A STEPWISE PROCEDURE TO SELECT VARIABLES
IN A FUZZY LEAST SQUARE REGRESSION MODEL
Francesco Campobasso and Annarita Fanizzi
Department of Statistical Sciences “Carlo Cecchi”, University of Bari, Bari, Italy
Keywords: Fuzzy least square regression, Multivariate generalization, Asymmetric fuzzy intercept, Total sum of
squares, Goodness of fit, Stepwise procedure.
Abstract: Fuzzy regression techniques can be used to fit fuzzy data into a regression model. Diamond treated the case of a simple model, introducing a metric on the space of triangular fuzzy numbers. In previous works we provided some theoretical results about the estimates of a multiple regression model with a non-fuzzy intercept; in this paper we show that the sum of squares of the dependent variable can be decomposed in exactly the same way as in the classical OLS estimation procedure only when the intercept is fuzzy asymmetric. Such a decomposition allows us to introduce a stepwise procedure which simplifies, in computational terms, the identification of the most significant independent variables in the model.
1 INTRODUCTION
Observed values of quantitative variables are commonly given as exact single values, although sometimes they cannot be precise: the imprecision of measuring instruments and the continuous nature of some observations, for example, prevent researchers from obtaining the corresponding true values.
On the other hand, qualitative variables are commonly expressed by linguistic terms, which also represent verbal labels of sets with uncertain borders.
Fuzzy numbers provide an appropriate way to manage such uncertainty in the observations.
In 1988 P. M. Diamond introduced a metric on the space of triangular fuzzy numbers and derived the expression of the estimated coefficients in a simple fuzzy regression of an uncertain dependent variable on a single uncertain independent variable.
Starting from a multivariate generalization of this regression, in previous works we provided some results on the decomposition of the deviance of the dependent variable according to Diamond's metric.
2 THE FUZZY LEAST SQUARE REGRESSION
A triangular fuzzy number $\tilde X = (x, x_L, x_R)_T$ for the variable X is characterized by a membership function $\mu_{\tilde X}: X \to [0, 1]$, like the one represented in Fig. 1, that expresses the degree of membership of any possible value of X to $\tilde X$.
The accumulation value x is considered the core of the fuzzy number, while $\xi_L = x - x_L$ and $\xi_R = x_R - x$ are considered the left spread and the right spread respectively.
Figure 1: Representation of a triangular fuzzy number.
Note that x belongs to $\tilde X$ with the highest degree (equal to 1), while the other values included between the left extreme $x_L$ and the right extreme $x_R$ belong to $\tilde X$ with a gradually lower degree.

Campobasso F. and Fanizzi A.
A STEPWISE PROCEDURE TO SELECT VARIABLES IN A FUZZY LEAST SQUARE REGRESSION MODEL.
DOI: 10.5220/0003720504170426
In Proceedings of the International Conference on Evolutionary Computation Theory and Applications (FCTA-2011), pages 417-426
ISBN: 978-989-8425-83-6
Copyright © 2011 SCITEPRESS (Science and Technology Publications, Lda.)
The set of triangular fuzzy numbers is closed under addition: given two triangular fuzzy numbers $\tilde X = (x, x_L, x_R)_T$ and $\tilde Y = (y, y_L, y_R)_T$, their sum $\tilde Z$ is still a triangular fuzzy number,
$$\tilde Z = \tilde X + \tilde Y = (x + y,\; x_L + y_L,\; x_R + y_R)_T .$$
Moreover, the opposite of a triangular fuzzy number $\tilde X = (x, x_L, x_R)_T$ is $-\tilde X = (-x, -x_R, -x_L)_T$.
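It follows that, given n fuzzy numbers $\tilde X_i = (x_i, x_{Li}, x_{Ri})_T$, $i = 1, 2, \ldots, n$, their average is
$$\bar{\tilde X} = \frac{\sum_i \tilde X_i}{n} = \left(\frac{\sum_i x_i}{n},\; \frac{\sum_i x_{Li}}{n},\; \frac{\sum_i x_{Ri}}{n}\right)_T .$$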
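As a minimal illustration, the operations above work component-wise on the triple (core, left extreme, right extreme), and the extremes swap roles when taking the opposite. The Python class `TFN`, its field names and the helper `tfn_mean` are our own assumptions, not part of the paper:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TFN:
    """Triangular fuzzy number (x, x_L, x_R)_T: core, left extreme, right extreme."""
    core: float
    left: float
    right: float

    def __add__(self, other: "TFN") -> "TFN":
        # Closure under addition: cores and extremes add component-wise.
        return TFN(self.core + other.core, self.left + other.left, self.right + other.right)

    def __neg__(self) -> "TFN":
        # Opposite: -(x, x_L, x_R)_T = (-x, -x_R, -x_L)_T, so the extremes swap roles.
        return TFN(-self.core, -self.right, -self.left)


def tfn_mean(values: Sequence[TFN]) -> TFN:
    """Average of n fuzzy numbers: component-wise mean of cores and extremes."""
    n = len(values)
    return TFN(
        sum(v.core for v in values) / n,
        sum(v.left for v in values) / n,
        sum(v.right for v in values) / n,
    )
```

For example, the mean of (1, 0, 2)_T and (3, 2, 5)_T is (2, 1, 3.5)_T.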
Diamond (1988) introduced a metric on the space of triangular fuzzy numbers. According to this metric, the squared distance between $\tilde X$ and $\tilde Y$ is
$$d^2(\tilde X, \tilde Y) = d^2\big((x, x_L, x_R)_T, (y, y_L, y_R)_T\big) = (x - y)^2 + (x_L - y_L)^2 + (x_R - y_R)^2 .$$
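Diamond's squared distance translates directly into code. This sketch assumes fuzzy numbers are stored as plain (core, left, right) tuples; the function name `diamond_d2` is our own:

```python
def diamond_d2(x, y):
    """Squared Diamond distance between two (core, left, right) triples."""
    (xc, xl, xr), (yc, yl, yr) = x, y
    # Sum of the squared differences of cores, left extremes and right extremes.
    return (xc - yc) ** 2 + (xl - yl) ** 2 + (xr - yr) ** 2


print(diamond_d2((2.0, 1.0, 4.0), (3.0, 1.0, 6.0)))  # 5.0
```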
The same author treated the fuzzy regression model of a dependent variable $\tilde Y$ on a single independent variable $\tilde X$. When the intercept a is non-fuzzy, the model can be written as
$$\tilde Y = a + b\tilde X, \quad a, b \in \mathbb{R},$$
and, when the intercept $\tilde A = (a, a_L, a_R)_T$ is fuzzy, as
$$\tilde Y = \tilde A + b\tilde X, \quad a, b \in \mathbb{R},$$
where $a_L = a - \gamma$, $a_R = a + \gamma'$ and $\gamma, \gamma' > 0$.
The corresponding parameters are estimated by minimizing, with respect to a and b, the sum $\sum_i d^2(\tilde Y_i^*, \tilde Y_i)$ of the squared distances between the theoretical values $\tilde Y_i^*$ and the empirical values $\tilde Y_i$ of the fuzzy dependent variable $\tilde Y$ observed on n units.
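Such a sum takes different forms according to the sign of the coefficient b. This is because the product of a fuzzy number $\tilde X = (x, x_L, x_R)_T$ and a real number k depends on whether k is positive or negative: $k\tilde X = (kx, kx_L, kx_R)_T$ if $k > 0$, whereas $k\tilde X = (kx, kx_R, kx_L)_T$ if $k < 0$. In the latter case the left extreme $kx_R = kx - |k|\xi_R$ is obtained by subtracting the (rescaled) right spread from the core.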
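The sign rule for the product with a real number can be sketched as follows. As before, the sketch assumes (core, left, right) tuples, and the helper name `scale` is hypothetical:

```python
def scale(k, x):
    """Product k * X of a real number k and a triangular fuzzy number X = (core, left, right)."""
    core, left, right = x
    if k >= 0:
        return (k * core, k * left, k * right)
    # k < 0: the left extreme of kX comes from the right extreme of X, and vice versa.
    return (k * core, k * right, k * left)


print(scale(-2, (3, 1, 4)))  # (-6, -8, -2)
```

Here the left extreme -8 equals the core -6 minus |k| times the right spread (2 * 1).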
Diamond demonstrated that the optimization
problem has a unique solution under certain
conditions.
In previous works we provided some theoretical results about the estimates of the regression coefficients and about the decomposition of the sum of squares of the dependent variable in a multiple regression model (Campobasso, Fanizzi and Tarantini, 2009). In particular, we treated the case of a non-fuzzy intercept, as well as the case of a fuzzy intercept. The fuzzy intercept seems more appropriate (Campobasso and Fanizzi, 2011), for reasons that will become clear later.
3 A MULTIVARIATE GENERALIZATION OF THE REGRESSION MODEL
3.1 A Generalization of the Model Including a Non-fuzzy Intercept
Let us assume that we observe a fuzzy dependent variable $\tilde Y_i = (y_i, y_{Li}, y_{Ri})_T$ and two fuzzy independent variables, $\tilde X_i = (x_i, x_{Li}, x_{Ri})_T$ and $\tilde Z_i = (z_i, z_{Li}, z_{Ri})_T$, on a set of n units. The linear regression model is given by
$$\tilde Y_i^* = a + b\tilde X_i + c\tilde Z_i, \quad i = 1, 2, \ldots, n;\; a, b, c \in \mathbb{R}.$$
The corresponding parameters are determined by minimizing, with respect to a, b and c, the sum of Diamond's squared distances between the theoretical and empirical values of the dependent variable:
$$\sum_i d^2\big(a + b\tilde X_i + c\tilde Z_i,\; \tilde Y_i\big) \tag{1}$$
As stated above, this sum takes different expressions according to the signs of the regression coefficients b and c. This generates the following four cases.
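Case 1: b > 0, c > 0
$$\sum_i d^2(a + b\tilde X_i + c\tilde Z_i, \tilde Y_i) = \sum_i \big[(y_i - a - bx_i - cz_i)^2 + (y_{Li} - a - bx_{Li} - cz_{Li})^2 + (y_{Ri} - a - bx_{Ri} - cz_{Ri})^2\big]$$
Case 2: b < 0, c > 0
$$\sum_i d^2(a + b\tilde X_i + c\tilde Z_i, \tilde Y_i) = \sum_i \big[(y_i - a - bx_i - cz_i)^2 + (y_{Li} - a - bx_{Ri} - cz_{Li})^2 + (y_{Ri} - a - bx_{Li} - cz_{Ri})^2\big]$$
Case 3: b > 0, c < 0
$$\sum_i d^2(a + b\tilde X_i + c\tilde Z_i, \tilde Y_i) = \sum_i \big[(y_i - a - bx_i - cz_i)^2 + (y_{Li} - a - bx_{Li} - cz_{Ri})^2 + (y_{Ri} - a - bx_{Ri} - cz_{Li})^2\big]$$
Case 4: b < 0, c < 0
$$\sum_i d^2(a + b\tilde X_i + c\tilde Z_i, \tilde Y_i) = \sum_i \big[(y_i - a - bx_i - cz_i)^2 + (y_{Li} - a - bx_{Ri} - cz_{Ri})^2 + (y_{Ri} - a - bx_{Li} - cz_{Li})^2\big]$$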
FCTA 2011 - International Conference on Fuzzy Computation Theory and Applications
Let us consider, as an example, case 3 and express it in matrix terms. The expression to be minimized is given by
$$G(\beta) = \|y - X\beta\|^2 + \|y_L - X_L\beta\|^2 + \|y_R - X_R\beta\|^2 = (y - X\beta)'(y - X\beta) + (y_L - X_L\beta)'(y_L - X_L\beta) + (y_R - X_R\beta)'(y_R - X_R\beta) \tag{2}$$
where
$y = [y_i]$ is the n-dimensional vector of cores of the dependent variable;
$y_L = [y_{Li}]$ and $y_R = [y_{Ri}]$ are the n-dimensional vectors of lower extremes and upper extremes of the dependent variable respectively;
$X$ is the n×3 matrix of cores of the independent variables, formed by the vectors $\mathbf{1}$, $x = [x_i]$, $z = [z_i]$;
$X_L$ is the n×3 matrix of lower bounds of the independent variables, formed by the vectors $\mathbf{1}$, $x_L = [x_{Li}]$, $z_R = [z_{Ri}]$;
$X_R$ is the n×3 matrix of upper bounds of the independent variables (analogous to $X_L$), formed by the vectors $\mathbf{1}$, $x_R$, $z_L$;
$\beta$ is the vector $(a, b, c)'$.
The estimates of the regression coefficients are derived by minimizing G(β) with respect to β, i.e. by seeking the solutions of the system
$$[X'X + X_L'X_L + X_R'X_R]\,\beta - [X'y + X_L'y_L + X_R'y_R] = 0 .$$
In particular we obtain
$$\beta = [X'X + X_L'X_L + X_R'X_R]^{-1}\,[X'y + X_L'y_L + X_R'y_R].$$
As in the OLS estimation procedure, the optimization problem admits a unique finite solution if $[X'X + X_L'X_L + X_R'X_R]$ is invertible and the Hessian matrix is positive definite.
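The following is a minimal NumPy sketch of the case-3 estimator. It assumes that cores and extremes are stored in separate 1-D arrays; the function name `fit_case3` and this data layout are our own, since the authors worked in Matlab. The sketch builds X, X_L and X_R as listed above and solves the normal equations:

```python
import numpy as np


def fit_case3(y, yL, yR, x, xL, xR, z, zL, zR):
    """Estimate beta = (a, b, c)' under case 3 (b > 0, c < 0) with a non-fuzzy intercept."""
    one = np.ones(len(y))
    X = np.column_stack([one, x, z])      # cores
    XL = np.column_stack([one, xL, zR])   # lower bounds: c < 0 pairs z_R with the left side
    XR = np.column_stack([one, xR, zL])   # upper bounds: c < 0 pairs z_L with the right side
    M = X.T @ X + XL.T @ XL + XR.T @ XR
    v = X.T @ y + XL.T @ yL + XR.T @ yR
    beta = np.linalg.solve(M, v)          # assumes M is invertible
    admissible = bool(beta[1] > 0 and beta[2] < 0)
    return beta, admissible
```

With made-up data generated exactly from a = 1, b = 2, c = -0.5, the function should recover these values and flag the solution as admissible.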
The solution found, $\beta^* = (a^*, b^*, c^*)'$, is admissible if the signs of the regression coefficients are coherent with the basic assumptions ($b^* > 0$, $c^* < 0$).
In the remaining three cases, the expression (2) to be minimized is obtained as follows:
- case 1: replace $z_R$ by $z_L$ in $X_L$, and $z_L$ by $z_R$ in $X_R$;
- case 2: replace $x_L$ by $x_R$ and $z_R$ by $z_L$ in $X_L$, and also $x_R$ by $x_L$ and $z_L$ by $z_R$ in $X_R$;
- case 4: replace $x_L$ by $x_R$ in $X_L$, and $x_R$ by $x_L$ in $X_R$.
The optimal solution is the admissible one that minimizes (1) among all.
The generalization of this procedure to the case of several independent variables is immediate. However, the number of solutions to be analysed in order to identify the optimal one grows exponentially with the number of variables considered. For example, if the model includes k independent variables, $2^k$ possible cases must be taken into account, one for each combination of the signs of the regression coefficients.
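The enumeration over sign patterns can be sketched as below. This is an illustrative implementation under our own assumptions:
- the k regressors are passed as n × k arrays of cores, left extremes and right extremes;
- `numpy.linalg.lstsq` is applied to the stacked rows, which minimizes the same sum as (2), because the stacked cross-product matrix equals X'X + X_L'X_L + X_R'X_R;
- a sign pattern is discarded when the estimated signs contradict it.

The function name is hypothetical.

```python
import itertools

import numpy as np


def fit_nonfuzzy_intercept(y, yL, yR, Xc, Xlo, Xhi):
    """Estimate (a, b_1, ..., b_k) for a non-fuzzy intercept by trying all 2^k sign patterns.

    Xc, Xlo, Xhi are (n, k) arrays of cores, left and right extremes of the regressors.
    Returns (beta, sse, signs) for the admissible pattern with the smallest sum (1),
    or None if no pattern is admissible.
    """
    n, k = Xc.shape
    one = np.ones((n, 1))
    target = np.concatenate([y, yL, yR])
    best = None
    for signs in itertools.product((1, -1), repeat=k):
        pos = np.array(signs) > 0
        # b_j > 0 keeps x_Lj on the left side; b_j < 0 swaps it with x_Rj.
        XL = np.hstack([one, np.where(pos, Xlo, Xhi)])
        XR = np.hstack([one, np.where(pos, Xhi, Xlo)])
        A = np.vstack([np.hstack([one, Xc]), XL, XR])
        beta = np.linalg.lstsq(A, target, rcond=None)[0]
        if not np.all(np.sign(beta[1:]) == np.array(signs)):
            continue  # estimated signs contradict the assumed pattern
        sse = float(np.sum((target - A @ beta) ** 2))
        if best is None or sse < best[1]:
            best = (beta, sse, signs)
    return best
```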
3.2 A Generalization of the Model Including a Fuzzy Intercept
Now we analyse an extension of the model with a fuzzy intercept. This intercept seems more appropriate than the non-fuzzy one, as it expresses the average value of the dependent variable (which is also fuzzy) when the independent variables equal zero.
For this purpose we start from the results obtained by Diamond for the univariate regression model with a fuzzy intercept.
3.2.1 The Univariate Model
Let us regress, for example, the dependent variable $\tilde Y_i = (y_i, y_{Li}, y_{Ri})_T$ on a single independent variable $\tilde X_i = (x_i, x_{Li}, x_{Ri})_T$ in a set of n units. Consider a symmetric fuzzy intercept $\tilde A = (a, a_L, a_R)_T$, where $a_L = a - \gamma$, $a_R = a + \gamma$ and $\gamma > 0$ (if $\gamma = 0$, $\tilde A$ would no longer be fuzzy). The model then takes the following expression:
$$\tilde Y_i^* = \tilde A + b\tilde X_i, \quad i = 1, 2, \ldots, n;\; a, b \in \mathbb{R}.$$
The fuzzy regression parameters are determined by minimizing, with respect to a, b and γ, the sum of Diamond's squared distances between the theoretical and empirical fuzzy values of the dependent variable:
$$\sum_i d^2\big(\tilde A + b\tilde X_i,\; \tilde Y_i\big)$$
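The function to be minimized takes different expressions according to the sign of the regression coefficient b. Supposing that b > 0, the estimates of a, b and γ are obtained as the solutions $a^*$, $b^*$ and $\gamma^*$ of the system of equations
$$\begin{cases} a\sum_i (x_i + x_{Li} + x_{Ri}) + \gamma\sum_i (x_{Ri} - x_{Li}) + b\sum_i (x_i^2 + x_{Li}^2 + x_{Ri}^2) = \sum_i (x_i y_i + x_{Li} y_{Li} + x_{Ri} y_{Ri}) \\ 2n\gamma = \sum_i \big[(y_{Ri} - y_{Li}) - b(x_{Ri} - x_{Li})\big] \\ na = \frac{1}{3}\Big[\sum_i (y_i + y_{Li} + y_{Ri}) - b\sum_i (x_i + x_{Li} + x_{Ri})\Big] \end{cases}$$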
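Otherwise, supposing b < 0, the estimates of a, b and γ are obtained as the solutions $a^*$, $b^*$ and $\gamma^*$ of the system of equations
$$\begin{cases} a\sum_i (x_i + x_{Li} + x_{Ri}) - \gamma\sum_i (x_{Ri} - x_{Li}) + b\sum_i (x_i^2 + x_{Li}^2 + x_{Ri}^2) = \sum_i (x_i y_i + x_{Ri} y_{Li} + x_{Li} y_{Ri}) \\ 2n\gamma = \sum_i \big[(y_{Ri} - y_{Li}) + b(x_{Ri} - x_{Li})\big] \\ na = \frac{1}{3}\Big[\sum_i (y_i + y_{Li} + y_{Ri}) - b\sum_i (x_i + x_{Li} + x_{Ri})\Big] \end{cases}$$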
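Both systems are the normal equations of an ordinary least-squares problem on stacked rows. A sketch can therefore solve them with `numpy.linalg.lstsq`. The helper name and the `sign` argument are our own, and the sketch does not check Diamond's existence and uniqueness conditions:

```python
import numpy as np


def fit_univariate_symmetric(y, yL, yR, x, xL, xR, sign):
    """Fit Y* = A + bX with A = (a, a - g, a + g)_T, assuming sign(b) == sign.

    Returns (a, b, g). The least-squares solution of the stacked rows satisfies
    the three normal equations of the corresponding system.
    """
    n = len(y)
    one, zero = np.ones(n), np.zeros(n)
    lo, hi = (xL, xR) if sign > 0 else (xR, xL)
    A = np.vstack([
        np.column_stack([one, x, zero]),    # cores:          a + b x_i
        np.column_stack([one, lo, -one]),   # left extremes:  a - g + b * (x_Li or x_Ri)
        np.column_stack([one, hi, one]),    # right extremes: a + g + b * (x_Ri or x_Li)
    ])
    target = np.concatenate([y, yL, yR])
    a, b, g = np.linalg.lstsq(A, target, rcond=None)[0]
    return a, b, g
```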
As Diamond (1988) shows, the solution to such a minimization problem exists and is unique if the following conditions hold simultaneously:
a)
either b
*
< 0 or b
*
> 0;
b)
0)yy(
n
1
)yy()xx(
n
1
)xx(
LiRiLiRiLiRiLiRi
;
c)
b
*
> b
*.
3.2.2 The Multivariate Model
Now we generalize the regression model with a fuzzy intercept to the case of more than one independent variable.
Suppose we regress a dependent variable $\tilde Y_i = (y_i, y_{Li}, y_{Ri})_T$ on two independent variables $\tilde X_i = (x_i, x_{Li}, x_{Ri})_T$ and $\tilde Z_i = (z_i, z_{Li}, z_{Ri})_T$ in a set of n units. The linear regression model includes a fuzzy asymmetric intercept $\tilde A = (a, a_L, a_R)_T$, where $a_L = a - \gamma$, $a_R = a + \gamma'$ and $\gamma, \gamma' > 0$ (if $\gamma = \gamma' = 0$, $\tilde A$ would no longer be fuzzy). The model takes the following expression:
$$\tilde Y_i^* = \tilde A + b\tilde X_i + c\tilde Z_i, \quad i = 1, 2, \ldots, n;\; a, b, c \in \mathbb{R}.$$
Note that the asymmetric intercept is more appropriate than the symmetric one. The symmetric intercept is the special case $\gamma = \gamma'$, so it cannot fit the data better and in general fits them less efficiently.
The corresponding estimates of the parameters are again determined by minimizing, with respect to a, b, c, γ and γ', the sum of Diamond's squared distances between the empirical and theoretical values of the dependent variable:
$$\sum_i d^2\big(\tilde Y_i,\; \tilde A + b\tilde X_i + c\tilde Z_i\big) \tag{3}$$
The function to be minimized takes different expressions according to the signs of the regression coefficients b and c.
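Case 1: b > 0, c > 0
$$\sum_i d^2(\tilde Y_i, \tilde A + b\tilde X_i + c\tilde Z_i) = \sum_i \big[(y_i - a - bx_i - cz_i)^2 + (y_{Li} - a_L - bx_{Li} - cz_{Li})^2 + (y_{Ri} - a_R - bx_{Ri} - cz_{Ri})^2\big]$$
Case 2: b < 0, c > 0
$$\sum_i d^2(\tilde Y_i, \tilde A + b\tilde X_i + c\tilde Z_i) = \sum_i \big[(y_i - a - bx_i - cz_i)^2 + (y_{Li} - a_L - bx_{Ri} - cz_{Li})^2 + (y_{Ri} - a_R - bx_{Li} - cz_{Ri})^2\big]$$
Case 3: b > 0, c < 0
$$\sum_i d^2(\tilde Y_i, \tilde A + b\tilde X_i + c\tilde Z_i) = \sum_i \big[(y_i - a - bx_i - cz_i)^2 + (y_{Li} - a_L - bx_{Li} - cz_{Ri})^2 + (y_{Ri} - a_R - bx_{Ri} - cz_{Li})^2\big]$$
Case 4: b < 0, c < 0
$$\sum_i d^2(\tilde Y_i, \tilde A + b\tilde X_i + c\tilde Z_i) = \sum_i \big[(y_i - a - bx_i - cz_i)^2 + (y_{Li} - a_L - bx_{Ri} - cz_{Ri})^2 + (y_{Ri} - a_R - bx_{Li} - cz_{Li})^2\big]$$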
Let us consider, as an example, case 3 and express it in matrix terms. The expression to be minimized is given by
$$G(\beta) = \|y - X\beta\|^2 + \|y_L - X_L\beta\|^2 + \|y_R - X_R\beta\|^2 = (y - X\beta)'(y - X\beta) + (y_L - X_L\beta)'(y_L - X_L\beta) + (y_R - X_R\beta)'(y_R - X_R\beta) \tag{4}$$
where
$y = [y_i]$ is the n-dimensional vector of cores of the dependent variable;
$y_L = [y_{Li}]$ and $y_R = [y_{Ri}]$ are the n-dimensional vectors of lower extremes and upper extremes of the dependent variable respectively;
$X$ is the n×5 matrix of cores of the independent variables, formed by the vectors $\mathbf{1}$, $x = [x_i]$, $z = [z_i]$ and two null vectors $\mathbf{0}$;
$X_L$ is the n×5 matrix of lower bounds of the independent variables, formed by the vectors $\mathbf{1}$, $x_L = [x_{Li}]$, $z_R = [z_{Ri}]$, $-\mathbf{1}$ and $\mathbf{0}$;
$X_R$ is the n×5 matrix of upper bounds of the independent variables (analogous to $X_L$), formed by the vectors $\mathbf{1}$, $x_R$, $z_L$, $\mathbf{0}$ and $\mathbf{1}$;
$\beta$ is the vector $(a, b, c, \gamma, \gamma')'$.
The estimates of the regression coefficients are derived by minimizing G(β) with respect to β, i.e. by seeking the solutions of the system
$$[X'X + X_L'X_L + X_R'X_R]\,\beta - [X'y + X_L'y_L + X_R'y_R] = 0 .$$
In particular we obtain
$$\beta = [X'X + X_L'X_L + X_R'X_R]^{-1}\,[X'y + X_L'y_L + X_R'y_R].$$
As in the OLS estimation procedure, the optimization problem admits a unique finite solution if $[X'X + X_L'X_L + X_R'X_R]$ is invertible and the Hessian matrix is positive definite.
The solution found, $\beta^* = (a^*, b^*, c^*, \gamma^*, \gamma'^*)'$, is admissible if the signs of the estimates are coherent with the basic assumptions, that is $b^* > 0$, $c^* < 0$ and $\gamma^*, \gamma'^* > 0$.
In the remaining three cases, the expression (4) to be minimized is obtained by replacing the vectors of the left and right extremes in the matrices as described above, according to the case considered. The optimal solution is the admissible one that minimizes (3) among all.
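A sketch of the complete search for the asymmetric-intercept model follows, under our own assumptions. The k regressors are passed as n × k arrays of cores, left extremes and right extremes, and the function name is hypothetical. Each stacked block carries the two intercept-spread columns: (0, 0) for the cores, (-1, 0) for the left extremes and (0, 1) for the right extremes. A solution is admissible only if the slope signs match the assumed pattern and γ, γ' > 0:

```python
import itertools

import numpy as np


def fit_asymmetric_intercept(y, yL, yR, Xc, Xlo, Xhi):
    """Fit Y* = A + sum_j b_j X_j with A = (a, a - gamma, a + gamma')_T.

    beta = (a, b_1, ..., b_k, gamma, gamma'). All 2^k sign patterns are tried and
    only admissible solutions (signs as assumed, gamma > 0, gamma' > 0) are kept.
    """
    n, k = Xc.shape
    one, zero = np.ones((n, 1)), np.zeros((n, 1))
    target = np.concatenate([y, yL, yR])
    best = None
    for signs in itertools.product((1, -1), repeat=k):
        pos = np.array(signs) > 0
        X = np.hstack([one, Xc, zero, zero])
        XL = np.hstack([one, np.where(pos, Xlo, Xhi), -one, zero])
        XR = np.hstack([one, np.where(pos, Xhi, Xlo), zero, one])
        A = np.vstack([X, XL, XR])
        beta = np.linalg.lstsq(A, target, rcond=None)[0]
        signs_ok = np.all(np.sign(beta[1:k + 1]) == np.array(signs))
        if not (signs_ok and beta[k + 1] > 0 and beta[k + 2] > 0):
            continue
        sse = float(np.sum((target - A @ beta) ** 2))
        if best is None or sse < best[1]:
            best = (beta, sse, signs)
    return best
```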
When the intercept is symmetric, we estimate one parameter fewer than in the previous model, because the left and right spreads coincide (Campobasso and Fanizzi, 2011). Note that the matrices $X$, $X_L$ and $X_R$ relating to the independent variables, and the vector of parameters β, change their expression. In particular:
$X$ is the n×4 matrix of cores of the independent variables, formed by the vectors $\mathbf{1}$, $x = [x_i]$, $z = [z_i]$ and $\mathbf{0}$;
$X_L$ is the n×4 matrix of lower bounds of the independent variables, formed by the vectors $\mathbf{1}$, $x_L = [x_{Li}]$, $z_R = [z_{Ri}]$ and $-\mathbf{1}$;
$X_R$ is the n×4 matrix of upper bounds of the independent variables (analogous to $X_L$), formed by the vectors $\mathbf{1}$, $x_R$, $z_L$ and $\mathbf{1}$;
$\beta$ is the vector $(a, b, c, \gamma)'$.
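The claim that the asymmetric intercept cannot fit worse can be checked numerically. The check below rests on our own assumptions: random made-up data, one fixed sign pattern, and admissibility constraints ignored. The symmetric column (0, -1, 1) is the sum of the two asymmetric columns, so the asymmetric residual sum of squares is never larger. The helper names are hypothetical:

```python
import numpy as np


def stacked_design(Xc, Xlo, Xhi, signs, symmetric):
    """Stack the core, left and right blocks (X, X_L, X_R) for one sign pattern."""
    n, _ = Xc.shape
    one, zero = np.ones((n, 1)), np.zeros((n, 1))
    pos = np.asarray(signs) > 0
    lo, hi = np.where(pos, Xlo, Xhi), np.where(pos, Xhi, Xlo)
    if symmetric:  # beta = (a, b_1, ..., b_k, gamma)
        blocks = [np.hstack([one, Xc, zero]), np.hstack([one, lo, -one]), np.hstack([one, hi, one])]
    else:          # beta = (a, b_1, ..., b_k, gamma, gamma')
        blocks = [np.hstack([one, Xc, zero, zero]),
                  np.hstack([one, lo, -one, zero]),
                  np.hstack([one, hi, zero, one])]
    return np.vstack(blocks)


def residual_ss(A, target):
    """Residual sum of squares of the unconstrained least-squares fit."""
    beta = np.linalg.lstsq(A, target, rcond=None)[0]
    return float(np.sum((target - A @ beta) ** 2))


rng = np.random.default_rng(0)
n = 8
Xc = rng.normal(size=(n, 2))
Xlo, Xhi = Xc - rng.uniform(0.1, 1.0, (n, 2)), Xc + rng.uniform(0.1, 1.0, (n, 2))
y = rng.normal(size=n)
yL, yR = y - rng.uniform(0.1, 1.0, n), y + rng.uniform(0.1, 2.0, n)
target = np.concatenate([y, yL, yR])
sym = stacked_design(Xc, Xlo, Xhi, (1, -1), symmetric=True)
asym = stacked_design(Xc, Xlo, Xhi, (1, -1), symmetric=False)
print(sym.shape, asym.shape)  # (24, 4) (24, 5)
print(residual_ss(asym, target) <= residual_ss(sym, target) + 1e-12)  # True
```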
4 DECOMPOSITION OF THE TOTAL SUM OF SQUARES OF THE DEPENDENT VARIABLE
In this section two important theoretical results are demonstrated. The first concerns the inequality between the theoretical and empirical averages of the fuzzy dependent variable (unlike in the classical OLS estimation procedure). The second concerns the decomposition of the total sum of squares of the dependent variable, which involves two further additive components besides the regression and the residual sums of squares.
4.1 The Model Including a Non-fuzzy Intercept
Let us consider, as an example, the sum of Diamond's squared distances between the theoretical and empirical values of the dependent variable in case 3:
$$\sum_i d^2(a + b\tilde X_i + c\tilde Z_i, \tilde Y_i) = \sum_i \big[(y_i - a - bx_i - cz_i)^2 + (y_{Li} - a - bx_{Li} - cz_{Ri})^2 + (y_{Ri} - a - bx_{Ri} - cz_{Li})^2\big]$$
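Setting to 0 the derivatives of $\sum_i d^2(a + b\tilde X_i + c\tilde Z_i, \tilde Y_i)$ with respect to a, b and c, we obtain the following system of equations:
$$\begin{cases} -2\sum_i \big[(y_i - a - bx_i - cz_i) + (y_{Li} - a - bx_{Li} - cz_{Ri}) + (y_{Ri} - a - bx_{Ri} - cz_{Li})\big] = 0 \\ -2\sum_i \big[(y_i - a - bx_i - cz_i)x_i + (y_{Li} - a - bx_{Li} - cz_{Ri})x_{Li} + (y_{Ri} - a - bx_{Ri} - cz_{Li})x_{Ri}\big] = 0 \\ -2\sum_i \big[(y_i - a - bx_i - cz_i)z_i + (y_{Li} - a - bx_{Li} - cz_{Ri})z_{Ri} + (y_{Ri} - a - bx_{Ri} - cz_{Li})z_{Li}\big] = 0 \end{cases}$$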
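Such a system can be written as
$$\begin{cases} \sum_i \big[(a + bx_i + cz_i) + (a + bx_{Li} + cz_{Ri}) + (a + bx_{Ri} + cz_{Li})\big] = \sum_i (y_i + y_{Li} + y_{Ri}) \\ \sum_i \big[(a + bx_i + cz_i)x_i + (a + bx_{Li} + cz_{Ri})x_{Li} + (a + bx_{Ri} + cz_{Li})x_{Ri}\big] = \sum_i (x_i y_i + x_{Li} y_{Li} + x_{Ri} y_{Ri}) \\ \sum_i \big[(a + bx_i + cz_i)z_i + (a + bx_{Li} + cz_{Ri})z_{Ri} + (a + bx_{Ri} + cz_{Li})z_{Li}\big] = \sum_i (z_i y_i + z_{Ri} y_{Li} + z_{Li} y_{Ri}) \end{cases}$$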
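Recalling that the theoretical values of the fuzzy dependent variable are $y_i^* = a + bx_i + cz_i$, $y_{Li}^* = a + bx_{Li} + cz_{Ri}$ and $y_{Ri}^* = a + bx_{Ri} + cz_{Li}$, we obtain
$$\begin{cases} \sum_i (y_i^* + y_{Li}^* + y_{Ri}^*) = \sum_i (y_i + y_{Li} + y_{Ri}) \\ \sum_i (y_i^* x_i + y_{Li}^* x_{Li} + y_{Ri}^* x_{Ri}) = \sum_i (y_i x_i + y_{Li} x_{Li} + y_{Ri} x_{Ri}) \\ \sum_i (y_i^* z_i + y_{Li}^* z_{Ri} + y_{Ri}^* z_{Li}) = \sum_i (y_i z_i + y_{Li} z_{Ri} + y_{Ri} z_{Li}) \end{cases} \tag{5}$$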
The first equation of system (5) shows that the total sum of the lower extremes, cores and upper extremes of the theoretical values of the dependent variable coincides with the same total for the empirical values. However, this equation does not allow us to conclude that the theoretical and empirical averages of the fuzzy dependent variable coincide component by component.
Let us examine how the total sum of squares of the dependent variable,
$$SS_{Tot} = \sum_i \big[(y_i - \bar y)^2 + (y_{Li} - \bar y_L)^2 + (y_{Ri} - \bar y_R)^2\big],$$
can be decomposed according to Diamond's metric.
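Adding and subtracting the corresponding theoretical value within each square and developing all the squares, the total deviance can be expressed as:
$$\begin{aligned}
SS_{Tot} &= \sum_i \big[(y_i - y_i^* + y_i^* - \bar y)^2 + (y_{Li} - y_{Li}^* + y_{Li}^* - \bar y_L)^2 + (y_{Ri} - y_{Ri}^* + y_{Ri}^* - \bar y_R)^2\big] \\
&= \sum_i \big[(y_i - y_i^*)^2 + (y_i^* - \bar y)^2 + 2(y_i - y_i^*)(y_i^* - \bar y) \\
&\quad + (y_{Li} - y_{Li}^*)^2 + (y_{Li}^* - \bar y_L)^2 + 2(y_{Li} - y_{Li}^*)(y_{Li}^* - \bar y_L) \\
&\quad + (y_{Ri} - y_{Ri}^*)^2 + (y_{Ri}^* - \bar y_R)^2 + 2(y_{Ri} - y_{Ri}^*)(y_{Ri}^* - \bar y_R)\big]
\end{aligned}$$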
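Adding and subtracting the theoretical average values of the lower extremes, of the cores and of the upper extremes of the dependent variable within each square and developing all the squares, the previous expression becomes
$$\begin{aligned}
SS_{Tot} &= \sum_i \big[(y_i - y_i^*)^2 + (y_i^* - \bar y^* + \bar y^* - \bar y)^2 + 2(y_i - y_i^*)(y_i^* - \bar y) \\
&\quad + (y_{Li} - y_{Li}^*)^2 + (y_{Li}^* - \bar y_L^* + \bar y_L^* - \bar y_L)^2 + 2(y_{Li} - y_{Li}^*)(y_{Li}^* - \bar y_L) \\
&\quad + (y_{Ri} - y_{Ri}^*)^2 + (y_{Ri}^* - \bar y_R^* + \bar y_R^* - \bar y_R)^2 + 2(y_{Ri} - y_{Ri}^*)(y_{Ri}^* - \bar y_R)\big] \\
&= \sum_i \big[(y_i - y_i^*)^2 + (y_i^* - \bar y^*)^2 + (\bar y^* - \bar y)^2 + 2(y_i^* - \bar y^*)(\bar y^* - \bar y) + 2(y_i - y_i^*)(y_i^* - \bar y) \\
&\quad + (y_{Li} - y_{Li}^*)^2 + (y_{Li}^* - \bar y_L^*)^2 + (\bar y_L^* - \bar y_L)^2 + 2(y_{Li}^* - \bar y_L^*)(\bar y_L^* - \bar y_L) + 2(y_{Li} - y_{Li}^*)(y_{Li}^* - \bar y_L) \\
&\quad + (y_{Ri} - y_{Ri}^*)^2 + (y_{Ri}^* - \bar y_R^*)^2 + (\bar y_R^* - \bar y_R)^2 + 2(y_{Ri}^* - \bar y_R^*)(\bar y_R^* - \bar y_R) + 2(y_{Ri} - y_{Ri}^*)(y_{Ri}^* - \bar y_R)\big]
\end{aligned}$$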
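where
$$SS_{Reg} = \sum_i d^2(\tilde Y_i^*, \bar{\tilde Y}^*) = \sum_i \big[(y_i^* - \bar y^*)^2 + (y_{Li}^* - \bar y_L^*)^2 + (y_{Ri}^* - \bar y_R^*)^2\big]$$
represents the regression sum of squares,
$$SS_{Res} = \sum_i d^2(\tilde Y_i, \tilde Y_i^*) = \sum_i \big[(y_i - y_i^*)^2 + (y_{Li} - y_{Li}^*)^2 + (y_{Ri} - y_{Ri}^*)^2\big]$$
represents the residual sum of squares, and
$$n\,d^2(\bar{\tilde Y}^*, \bar{\tilde Y}) = n\big[(\bar y^* - \bar y)^2 + (\bar y_L^* - \bar y_L)^2 + (\bar y_R^* - \bar y_R)^2\big]$$
represents n times the squared distance between the theoretical and empirical average values of the dependent variable.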
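In short, $SS_{Tot}$ can be written as
$$SS_{Tot} = SS_{Reg} + SS_{Res} + n\,d^2(\bar{\tilde Y}^*, \bar{\tilde Y}) + \eta$$
where
$$\begin{aligned}
\eta &= 2\sum_i \big[(y_i - y_i^*)(y_i^* - \bar y) + (y_{Li} - y_{Li}^*)(y_{Li}^* - \bar y_L) + (y_{Ri} - y_{Ri}^*)(y_{Ri}^* - \bar y_R)\big] \\
&\quad + 2\sum_i \big[(y_i^* - \bar y^*)(\bar y^* - \bar y) + (y_{Li}^* - \bar y_L^*)(\bar y_L^* - \bar y_L) + (y_{Ri}^* - \bar y_R^*)(\bar y_R^* - \bar y_R)\big].
\end{aligned}$$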
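Since the deviations of each component from its average sum to zero,
$$\sum_i \big[(y_i^* - \bar y^*)(\bar y^* - \bar y) + (y_{Li}^* - \bar y_L^*)(\bar y_L^* - \bar y_L) + (y_{Ri}^* - \bar y_R^*)(\bar y_R^* - \bar y_R)\big] = 0$$
and the quantity η reduces to
$$\begin{aligned}
\eta &= 2\sum_i \big[(y_i - y_i^*)(y_i^* - \bar y) + (y_{Li} - y_{Li}^*)(y_{Li}^* - \bar y_L) + (y_{Ri} - y_{Ri}^*)(y_{Ri}^* - \bar y_R)\big] \\
&= 2\sum_i \big[(y_i - y_i^*)y_i^* - (y_i - y_i^*)\bar y + (y_{Li} - y_{Li}^*)y_{Li}^* - (y_{Li} - y_{Li}^*)\bar y_L + (y_{Ri} - y_{Ri}^*)y_{Ri}^* - (y_{Ri} - y_{Ri}^*)\bar y_R\big].
\end{aligned}$$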
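Moreover, since $y_i^* = a + bx_i + cz_i$, $y_{Li}^* = a + bx_{Li} + cz_{Ri}$ and $y_{Ri}^* = a + bx_{Ri} + cz_{Li}$, we also have
$$2\sum_i (y_i - y_i^*)y_i^* + 2\sum_i (y_{Li} - y_{Li}^*)y_{Li}^* + 2\sum_i (y_{Ri} - y_{Ri}^*)y_{Ri}^* = 0 .$$
Indeed, replacing the expressions of the theoretical values in the last expression of η, we obtain
$$\begin{aligned}
\eta &= 2\Big[a\sum_i (y_i + y_{Li} + y_{Ri}) - a\sum_i (y_i^* + y_{Li}^* + y_{Ri}^*) + b\sum_i (x_i y_i + x_{Li} y_{Li} + x_{Ri} y_{Ri}) - b\sum_i (x_i y_i^* + x_{Li} y_{Li}^* + x_{Ri} y_{Ri}^*) \\
&\quad + c\sum_i (z_i y_i + z_{Ri} y_{Li} + z_{Li} y_{Ri}) - c\sum_i (z_i y_i^* + z_{Ri} y_{Li}^* + z_{Li} y_{Ri}^*)\Big] - 2\sum_i \big[(y_i - y_i^*)\bar y + (y_{Li} - y_{Li}^*)\bar y_L + (y_{Ri} - y_{Ri}^*)\bar y_R\big].
\end{aligned}$$
By conditions (5), the terms inside the first square brackets cancel in pairs, so the last expression reduces to
$$\eta = -2\sum_i \big[(y_i - y_i^*)\bar y + (y_{Li} - y_{Li}^*)\bar y_L + (y_{Ri} - y_{Ri}^*)\bar y_R\big].$$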
Note that, if the residual sum of squares equals zero, η and $d^2(\bar{\tilde Y}^*, \bar{\tilde Y})$ also equal zero. In that case the theoretical and empirical values of the dependent variable coincide for each observation, and hence so do their averages.
Therefore:
- if the regression sum of squares equals zero, the model has no forecasting capability, because the sum of the components of the i-th theoretical value equals the sum of the components of the empirical average value (i = 1, ..., n). In fact, every theoretical value then coincides with the theoretical average ($y_i^* = \bar y^*$, $y_{Li}^* = \bar y_L^*$, $y_{Ri}^* = \bar y_R^*$). By the first equation of (5), for each i we then have
$$\sum_j (y_j^* + y_{Lj}^* + y_{Rj}^*) = \sum_j (y_j + y_{Lj} + y_{Rj}) \;\Rightarrow\; n y_i^* + n y_{Li}^* + n y_{Ri}^* = n\bar y + n\bar y_L + n\bar y_R \;\Rightarrow\; y_i^* + y_{Li}^* + y_{Ri}^* = \bar y + \bar y_L + \bar y_R ;$$
- if the residual sum of squares equals zero, the relationship between the dependent variable and the independent ones is well represented by the estimated model. In this case, the total sum of squares is entirely explained by the regression sum of squares.
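The decomposition can be checked numerically. The sketch below uses made-up random data and a hypothetical helper `decompose`. It fits the case-3 model by least squares without checking admissibility, since the identity relies only on the normal equations. It then verifies that SSTot = SSReg + SSRes + n d²(Ȳ*, Ȳ) + η, with η in its reduced form:

```python
import numpy as np


def decompose(Y, Yhat):
    """Return (SSTot, SSReg, SSRes, n*d^2(mean*, mean), eta) for (3, n) arrays.

    Rows of Y (empirical) and Yhat (theoretical) are cores, left and right extremes.
    """
    n = Y.shape[1]
    ybar = Y.mean(axis=1, keepdims=True)        # empirical fuzzy average
    yhbar = Yhat.mean(axis=1, keepdims=True)    # theoretical fuzzy average
    ss_tot = float(np.sum((Y - ybar) ** 2))
    ss_reg = float(np.sum((Yhat - yhbar) ** 2))
    ss_res = float(np.sum((Y - Yhat) ** 2))
    n_d2 = float(n * np.sum((yhbar - ybar) ** 2))
    eta = float(-2 * np.sum(ybar * (Y - Yhat)))  # reduced form of eta
    return ss_tot, ss_reg, ss_res, n_d2, eta


rng = np.random.default_rng(1)
n = 10
x, z = rng.normal(size=n), rng.normal(size=n)
xL, xR = x - rng.uniform(0.1, 1.0, n), x + rng.uniform(0.1, 1.0, n)
zL, zR = z - rng.uniform(0.1, 1.0, n), z + rng.uniform(0.1, 1.0, n)
y = rng.normal(size=n)
Y = np.vstack([y, y - rng.uniform(0.1, 1.0, n), y + rng.uniform(0.1, 2.0, n)])

one = np.ones(n)
# Case-3 blocks for a non-fuzzy intercept: beta = (a, b, c).
blocks = [np.column_stack([one, x, z]),
          np.column_stack([one, xL, zR]),
          np.column_stack([one, xR, zL])]
beta = np.linalg.lstsq(np.vstack(blocks), Y.ravel(), rcond=None)[0]
Yhat = np.vstack([B @ beta for B in blocks])

tot, reg, res, nd2, eta = decompose(Y, Yhat)
print(np.isclose(tot, reg + res + nd2 + eta))  # True
```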
4.2 The Model Including a Fuzzy Intercept
Let us consider, as an example, the sum of Diamond's squared distances between the theoretical and empirical values of the dependent variable in case 3 for a model with a fuzzy intercept:
$$\sum_i d^2(\tilde A + b\tilde X_i + c\tilde Z_i, \tilde Y_i) = \sum_i \big[(y_i - a - bx_i - cz_i)^2 + (y_{Li} - a_L - bx_{Li} - cz_{Ri})^2 + (y_{Ri} - a_R - bx_{Ri} - cz_{Li})^2\big]$$
By minimizing this quantity with respect to a, b, c, γ and γ' (recall that $a_L = a - \gamma$ and $a_R = a + \gamma'$), we obtain the following system of equations:
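$$\begin{cases} -2\sum_i \big[(y_i - a - bx_i - cz_i) + (y_{Li} - a + \gamma - bx_{Li} - cz_{Ri}) + (y_{Ri} - a - \gamma' - bx_{Ri} - cz_{Li})\big] = 0 \\ -2\sum_i \big[(y_i - a - bx_i - cz_i)x_i + (y_{Li} - a + \gamma - bx_{Li} - cz_{Ri})x_{Li} + (y_{Ri} - a - \gamma' - bx_{Ri} - cz_{Li})x_{Ri}\big] = 0 \\ -2\sum_i \big[(y_i - a - bx_i - cz_i)z_i + (y_{Li} - a + \gamma - bx_{Li} - cz_{Ri})z_{Ri} + (y_{Ri} - a - \gamma' - bx_{Ri} - cz_{Li})z_{Li}\big] = 0 \\ 2\sum_i (y_{Li} - a + \gamma - bx_{Li} - cz_{Ri}) = 0 \\ -2\sum_i (y_{Ri} - a - \gamma' - bx_{Ri} - cz_{Li}) = 0 \end{cases}$$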
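Such a system can be written as
$$\begin{cases} \sum_i \big[(a + bx_i + cz_i) + (a - \gamma + bx_{Li} + cz_{Ri}) + (a + \gamma' + bx_{Ri} + cz_{Li})\big] = \sum_i (y_i + y_{Li} + y_{Ri}) \\ \sum_i \big[(a + bx_i + cz_i)x_i + (a - \gamma + bx_{Li} + cz_{Ri})x_{Li} + (a + \gamma' + bx_{Ri} + cz_{Li})x_{Ri}\big] = \sum_i (x_i y_i + x_{Li} y_{Li} + x_{Ri} y_{Ri}) \\ \sum_i \big[(a + bx_i + cz_i)z_i + (a - \gamma + bx_{Li} + cz_{Ri})z_{Ri} + (a + \gamma' + bx_{Ri} + cz_{Li})z_{Li}\big] = \sum_i (z_i y_i + z_{Ri} y_{Li} + z_{Li} y_{Ri}) \\ \sum_i (a - \gamma + bx_{Li} + cz_{Ri}) = \sum_i y_{Li} \\ \sum_i (a + \gamma' + bx_{Ri} + cz_{Li}) = \sum_i y_{Ri} \end{cases}$$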
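Recalling that the theoretical values of the fuzzy dependent variable are $y_i^* = a + bx_i + cz_i$, $y_{Li}^* = a - \gamma + bx_{Li} + cz_{Ri}$ and $y_{Ri}^* = a + \gamma' + bx_{Ri} + cz_{Li}$ respectively, we obtain
$$\begin{cases} \sum_i (y_i^* + y_{Li}^* + y_{Ri}^*) = \sum_i (y_i + y_{Li} + y_{Ri}) \\ \sum_i (x_i y_i^* + x_{Li} y_{Li}^* + x_{Ri} y_{Ri}^*) = \sum_i (x_i y_i + x_{Li} y_{Li} + x_{Ri} y_{Ri}) \\ \sum_i (z_i y_i^* + z_{Ri} y_{Li}^* + z_{Li} y_{Ri}^*) = \sum_i (z_i y_i + z_{Ri} y_{Li} + z_{Li} y_{Ri}) \\ \sum_i y_{Li}^* = \sum_i y_{Li} \\ \sum_i y_{Ri}^* = \sum_i y_{Ri} \end{cases} \tag{6}$$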
The first equation shows that the total sum of cores and extremes of the theoretical values of the dependent variable coincides with the same total for the empirical values. Combining the first equation with the last two, we can state that the theoretical and empirical averages of the fuzzy dependent variable coincide component by component ($\bar y^* = \bar y$, $\bar y_L^* = \bar y_L$, $\bar y_R^* = \bar y_R$), as happens in the classic OLS estimation procedure.
Let us examine how the total sum of squares of the dependent variable can be decomposed according to Diamond's metric:
$$SS_{Tot} = \sum_i \big[(y_i - \bar y)^2 + (y_{Li} - \bar y_L)^2 + (y_{Ri} - \bar y_R)^2\big].$$
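Adding and subtracting the corresponding theoretical value within each square and developing all the squares, the total deviance can be expressed as:
$$\begin{aligned}
SS_{Tot} &= \sum_i \big[(y_i - y_i^* + y_i^* - \bar y)^2 + (y_{Li} - y_{Li}^* + y_{Li}^* - \bar y_L)^2 + (y_{Ri} - y_{Ri}^* + y_{Ri}^* - \bar y_R)^2\big] \\
&= \sum_i \big[(y_i - y_i^*)^2 + (y_i^* - \bar y)^2 + 2(y_i - y_i^*)(y_i^* - \bar y) \\
&\quad + (y_{Li} - y_{Li}^*)^2 + (y_{Li}^* - \bar y_L)^2 + 2(y_{Li} - y_{Li}^*)(y_{Li}^* - \bar y_L) \\
&\quad + (y_{Ri} - y_{Ri}^*)^2 + (y_{Ri}^* - \bar y_R)^2 + 2(y_{Ri} - y_{Ri}^*)(y_{Ri}^* - \bar y_R)\big]
\end{aligned}$$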
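Adding and subtracting the theoretical average values of the lower extremes, of the cores and of the upper extremes of the dependent variable within each square and developing all the squares, the previous expression becomes
$$\begin{aligned}
SS_{Tot} &= \sum_i \big[(y_i - y_i^*)^2 + (y_i^* - \bar y^* + \bar y^* - \bar y)^2 + 2(y_i - y_i^*)(y_i^* - \bar y) \\
&\quad + (y_{Li} - y_{Li}^*)^2 + (y_{Li}^* - \bar y_L^* + \bar y_L^* - \bar y_L)^2 + 2(y_{Li} - y_{Li}^*)(y_{Li}^* - \bar y_L) \\
&\quad + (y_{Ri} - y_{Ri}^*)^2 + (y_{Ri}^* - \bar y_R^* + \bar y_R^* - \bar y_R)^2 + 2(y_{Ri} - y_{Ri}^*)(y_{Ri}^* - \bar y_R)\big] \\
&= \sum_i \big[(y_i - y_i^*)^2 + (y_i^* - \bar y^*)^2 + (\bar y^* - \bar y)^2 + 2(y_i^* - \bar y^*)(\bar y^* - \bar y) + 2(y_i - y_i^*)(y_i^* - \bar y) \\
&\quad + (y_{Li} - y_{Li}^*)^2 + (y_{Li}^* - \bar y_L^*)^2 + (\bar y_L^* - \bar y_L)^2 + 2(y_{Li}^* - \bar y_L^*)(\bar y_L^* - \bar y_L) + 2(y_{Li} - y_{Li}^*)(y_{Li}^* - \bar y_L) \\
&\quad + (y_{Ri} - y_{Ri}^*)^2 + (y_{Ri}^* - \bar y_R^*)^2 + (\bar y_R^* - \bar y_R)^2 + 2(y_{Ri}^* - \bar y_R^*)(\bar y_R^* - \bar y_R) + 2(y_{Ri} - y_{Ri}^*)(y_{Ri}^* - \bar y_R)\big]
\end{aligned}$$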
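where
$$SS_{Reg} = \sum_i d^2(\tilde Y_i^*, \bar{\tilde Y}^*) = \sum_i \big[(y_i^* - \bar y^*)^2 + (y_{Li}^* - \bar y_L^*)^2 + (y_{Ri}^* - \bar y_R^*)^2\big]$$
represents the regression sum of squares, while
$$SS_{Res} = \sum_i d^2(\tilde Y_i, \tilde Y_i^*) = \sum_i \big[(y_i - y_i^*)^2 + (y_{Li} - y_{Li}^*)^2 + (y_{Ri} - y_{Ri}^*)^2\big]$$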
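represents the residual sum of squares. Moreover, by conditions (6),
$$\begin{aligned}
\sum_i \big[&(\bar y^* - \bar y)^2 + 2(y_i^* - \bar y^*)(\bar y^* - \bar y) + 2(y_i - y_i^*)(y_i^* - \bar y) \\
&+ (\bar y_L^* - \bar y_L)^2 + 2(y_{Li}^* - \bar y_L^*)(\bar y_L^* - \bar y_L) + 2(y_{Li} - y_{Li}^*)(y_{Li}^* - \bar y_L) \\
&+ (\bar y_R^* - \bar y_R)^2 + 2(y_{Ri}^* - \bar y_R^*)(\bar y_R^* - \bar y_R) + 2(y_{Ri} - y_{Ri}^*)(y_{Ri}^* - \bar y_R)\big] = 0 .
\end{aligned}$$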
Therefore the expression of the total sum of squares of the dependent variable reduces to
$$SS_{Tot} = SS_{Reg} + SS_{Res}.$$
Ultimately, when the intercept has the same (asymmetric triangular) form as the dependent variable, the total sum of squares consists of only two addends: the regression sum of squares and the residual one, as in the classic OLS estimation procedure.
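A similar numerical sketch shows the component-wise equality of the averages and the two-term decomposition. It uses made-up data and the case-3 blocks extended with the two intercept-spread columns, and it does not check admissibility:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10
x, z = rng.normal(size=n), rng.normal(size=n)
xL, xR = x - rng.uniform(0.1, 1.0, n), x + rng.uniform(0.1, 1.0, n)
zL, zR = z - rng.uniform(0.1, 1.0, n), z + rng.uniform(0.1, 1.0, n)
y = rng.normal(size=n)
Y = np.vstack([y, y - rng.uniform(0.1, 1.0, n), y + rng.uniform(0.1, 2.0, n)])

one, zero = np.ones(n), np.zeros(n)
# Case-3 blocks with beta = (a, b, c, gamma, gamma'): a_L = a - gamma, a_R = a + gamma'.
blocks = [np.column_stack([one, x, z, zero, zero]),
          np.column_stack([one, xL, zR, -one, zero]),
          np.column_stack([one, xR, zL, zero, one])]
beta = np.linalg.lstsq(np.vstack(blocks), Y.ravel(), rcond=None)[0]
Yhat = np.vstack([B @ beta for B in blocks])

ybar, yhbar = Y.mean(axis=1), Yhat.mean(axis=1)
ss_tot = np.sum((Y - ybar[:, None]) ** 2)
ss_reg = np.sum((Yhat - yhbar[:, None]) ** 2)
ss_res = np.sum((Y - Yhat) ** 2)
print(np.allclose(ybar, yhbar))             # True: averages coincide component-wise
print(np.isclose(ss_tot, ss_reg + ss_res))  # True: SSTot = SSReg + SSRes
```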
Note that, when the intercept does not have the same form as the dependent variable, the theoretical and empirical average values of the latter do not coincide in general. Only the total sum of the lower extremes, cores and upper extremes of the theoretical values coincides with the same total for the empirical values. With a symmetric fuzzy intercept, for instance, the conditions are:
$$\begin{cases} \sum_i (y_i^* + y_{Li}^* + y_{Ri}^*) = \sum_i (y_i + y_{Li} + y_{Ri}) \\ \sum_i (x_i y_i^* + x_{Li} y_{Li}^* + x_{Ri} y_{Ri}^*) = \sum_i (x_i y_i + x_{Li} y_{Li} + x_{Ri} y_{Ri}) \\ \sum_i (z_i y_i^* + z_{Ri} y_{Li}^* + z_{Li} y_{Ri}^*) = \sum_i (z_i y_i + z_{Ri} y_{Li} + z_{Li} y_{Ri}) \\ \sum_i (y_{Ri}^* - y_{Li}^*) = \sum_i (y_{Ri} - y_{Li}) \end{cases}$$
In this case the total sum of squares of the dependent variable includes two further components besides the regression and residual sums of squares. The first (η) is residual in nature and has an uncertain sign. The second equals n times the squared distance between the theoretical and empirical average values of the dependent variable.
5 A FUZZY MODEL FIT INDEX
We have just shown that, when the intercept is fuzzy asymmetric, the total sum of squares of the dependent variable consists of only two addends: the regression sum of squares and the residual one. This is because the theoretical and empirical average values of the dependent variable coincide. Both the total sum of squares and the regression one are therefore measured as distances from the same fuzzy average.
Under these circumstances, the greater the
regression sum of squares the better the model fits
the data.
When the total sum of squares has more addends than those just mentioned, an increase in the regression sum of squares does not necessarily imply a better fit to the observed data. The reason is that the theoretical average value, from which the regression sum of squares is calculated, may be very different from the empirical one. On the contrary, a decrease in the residual sum of squares necessarily implies a better fit to the observed data.
In order to assess the goodness of fit of the regression model, we propose the following index, called for simplicity the Fuzzy Fit Index (FFI), which is common to all three models:
$$FFI = 1 - \frac{SS_{Res}}{SS_{Tot}} = 1 - \frac{\sum_i d^2(\tilde Y_i, \tilde Y_i^*)}{\sum_i d^2(\tilde Y_i, \bar{\tilde Y})}$$
where $\bar{\tilde Y}^* = (\bar y^*, \bar y_L^*, \bar y_R^*)_T$ and $\bar{\tilde Y} = (\bar y, \bar y_L, \bar y_R)_T$ denote the fuzzy theoretical average and the fuzzy empirical average of the dependent variable respectively.
The closer this index is to 1, the smaller the residual sum of squares and the better the model fits the observed data.
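A minimal sketch of the index follows. It assumes the empirical and theoretical values are stored as 3 × n arrays whose rows are cores, left extremes and right extremes; the function name `ffi` is ours:

```python
import numpy as np


def ffi(Y, Yhat):
    """Fuzzy Fit Index 1 - SSRes/SSTot for (3, n) arrays of (cores, left, right)."""
    ybar = Y.mean(axis=1, keepdims=True)   # empirical fuzzy average
    ss_res = np.sum((Y - Yhat) ** 2)       # sum of d^2(Y_i, Y*_i)
    ss_tot = np.sum((Y - ybar) ** 2)       # sum of d^2(Y_i, Ybar)
    return float(1.0 - ss_res / ss_tot)


Y = np.array([[1.0, 2.0, 3.0], [0.5, 1.0, 2.5], [2.0, 3.0, 3.5]])
print(ffi(Y, Y))  # 1.0 (perfect fit)
```

Predicting every observation by the empirical average gives an index of 0.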
With specific reference to the models with a symmetric (fuzzy or non-fuzzy) intercept, if the residual sum of squares decreases, two other quantities also decrease: the distance between the theoretical and empirical fuzzy averages of the dependent variable, and the component η of the total sum of squares. It follows that the forecasting capability of the model increases.
6 A STEPWISE FORWARD PROCEDURE TO SELECT INDEPENDENT VARIABLES
The selection of the most significant independent
variables presents greater difficulties from a
computational point of view in the case of a fuzzy
regression model than in the classic one.
In classical regression analysis, if the number p of independent variables is limited, the optimal subset of them can be selected by examining in succession at most $\binom{p}{k} = \frac{p!}{k!(p-k)!}$ models of each size k, from the simple ones (k = 1) to the saturated one (k = p).
The fuzzy approach makes the search for optimal
combinations of explanatory variables more
complex from a computational point of view.
The total number of potential hyperplanes to be tested increases exponentially with the number p of starting variables considered. In fact, for each subset of $q \le p$ variables, $2^q$ different hyperplanes result from all the combinations of the signs taken by the corresponding regression coefficients.
In order to avoid the complications related to the above checks, we introduce a stepwise procedure which finds the optimal combination of the starting variables by including only one of them at a time. At each iteration the procedure selects the variable that satisfies two conditions: it contributes more than the other variables not yet included to explaining the total sum of squares of the dependent variable, and it is less correlated with the variables already included. This requires estimating at most
$$\sum_{k=0}^{p-1} 2(p-k)\,2^k$$
models.
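The gap between the two search strategies can be made concrete by counting hyperplanes. The counts follow the formulas above: Σ_q C(p, q) 2^q for an exhaustive search over all subsets and sign patterns, against Σ_k 2(p − k) 2^k for the forward procedure. The helper names are hypothetical:

```python
from math import comb


def stepwise_models(p: int) -> int:
    """Upper bound on the hyperplanes estimated by the forward procedure."""
    # Step k + 1 tries p - k candidates, each with 2^(k+1) sign patterns.
    return sum(2 * (p - k) * 2 ** k for k in range(p))


def exhaustive_models(p: int) -> int:
    """Hyperplanes of an exhaustive search: C(p, q) subsets with 2^q sign patterns each."""
    return sum(comb(p, q) * 2 ** q for q in range(1, p + 1))


for p in (3, 10):
    print(p, stepwise_models(p), exhaustive_models(p))
# 3 22 26
# 10 4072 59048
```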
More specifically, in the first step $\tilde X_{(1)}$ is included in the equation if it presents the highest correlation with the dependent variable $\tilde Y$. In the q-th step, $\tilde X_{(q)}$ is selected to enter the model if its explanatory contribution to the sum of squares of $\tilde Y$ is higher than that of the other variables not yet included, and also higher than an arbitrary threshold value. Such a contribution can be measured as the increase in the FFI due to the introduction of $\tilde X_{(q)}$ into the equation, equal to $FFI_{y;1,2,\ldots,q} - FFI_{y;1,2,\ldots,q-1}$. The two terms of this difference represent the proportion of the sum of squares of $\tilde Y$ explained by the model with and without $\tilde X_{(q)}$, respectively. The higher the threshold value, the more easily the procedure blocks the entry of new independent variables, because each new variable must explain a larger fraction of the total variability.
Once $\tilde X_{(q)}$ is selected, its originality is evaluated through the so-called tolerance $T_q = 1 - FFI_{q;1,2,\ldots,q-1}$, where $FFI_{q;1,2,\ldots,q-1}$ represents the share of the variability of $\tilde X_{(q)}$ explained by the q − 1 independent variables already in the model. The tolerance ranges between 0 and 1, depending on the degree of linear correlation of $\tilde X_{(q)}$ with the other variables. Therefore $\tilde X_{(q)}$ becomes part of the model only if $T_q$ exceeds a threshold between 0 and 1. A high threshold allows the selection of very original variables, but it can also stop the process right from the initial steps. On the contrary, a low threshold lets most variables enter the equation, provided that they explain a significant fraction of the variability of $\tilde Y$. The procedure stops when none of the variables not yet included in the equation can make a significant contribution to the model, or when none of the candidate variables is sufficiently original.
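The forward procedure can be sketched as a driver that delegates model fitting to two caller-supplied functions:
- `ffi_y(S)` returns the FFI of the fuzzy regression of Ỹ on a set S of variables;
- `ffi_x(v, S)` returns the FFI of the regression of candidate v on S.

Both are hypothetical hooks that would wrap a fuzzy least-squares fit. Two further details are our own assumptions. The first step uses the largest FFI gain as its "highest correlation" criterion. A candidate that fails the tolerance test is skipped in favour of the next-best eligible one.

```python
from collections.abc import Hashable
from typing import Callable, List, Sequence


def forward_stepwise(
    candidates: Sequence[Hashable],
    ffi_y: Callable[[List[Hashable]], float],
    ffi_x: Callable[[Hashable, List[Hashable]], float],
    entry_threshold: float,
    tolerance_threshold: float,
) -> List[Hashable]:
    """Forward selection driven by FFI increments (entry) and tolerances (originality)."""
    selected: List[Hashable] = []
    current = ffi_y(selected)
    while True:
        remaining = [v for v in candidates if v not in selected]
        # Explanatory contribution of each candidate: FFI_{y;1..q} - FFI_{y;1..q-1}.
        gains = {v: ffi_y(selected + [v]) - current for v in remaining}
        eligible = sorted((v for v in remaining if gains[v] > entry_threshold),
                          key=lambda v: gains[v], reverse=True)
        chosen = None
        for v in eligible:
            # Tolerance T_q = 1 - FFI_{q;1..q-1}; taken as 1 when nothing is selected yet.
            tolerance = 1.0 - ffi_x(v, selected) if selected else 1.0
            if tolerance > tolerance_threshold:
                chosen = v
                break
        if chosen is None:  # no significant or no sufficiently original candidate
            return selected
        selected.append(chosen)
        current = ffi_y(selected)
```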
For an application of this procedure see Montrone, Campobasso, Perchinunno and Fanizzi (2011), which analyses data from the 2006 EU-SILC survey on the perception of poverty by Italian families. For this purpose we wrote a function in the Matlab editor. It takes as input parameters the matrices of cores, left extremes and right extremes of both the dependent and the independent fuzzy variables.
A more accurate procedure provides the
possibility of eliminating at each iteration variables
already included in the model, whose explanatory
contribution is subrogated by the combination of the
independent variables introduced later.
In particular, unlike the procedure just described, at each iteration we verify that the explanatory contribution of each variable $\tilde{X}_{(i)}$ ($i = 1, 2, \ldots, q-1$) is still significant once the candidate variable $\tilde{X}_{(q)}$ has been inserted. At the $q$-th step this contribution can be measured by the reduction in FFI caused by removing $\tilde{X}_{(i)}$ from the model, namely
$$FFI_{y;1,2,\ldots,q} - FFI_{y;1,2,\ldots,q(-i)},$$
where the two terms are the proportions of the sum of squares of $\tilde{Y}$ explained by the model including all the variables and by the model without $\tilde{X}_{(i)}$, respectively. Hence $\tilde{X}_{(i)}$ remains in the model only if the proportion explained by the model including all the variables exceeds that of the model without $\tilde{X}_{(i)}$ by more than an arbitrarily chosen threshold.
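The removal check at the $q$-th step can be sketched independently of how FFI is computed. In this hypothetical Python sketch the caller supplies `fit`, a function returning $FFI_{y;\ldots}$ for a list of variable names, and `threshold` is the arbitrary value mentioned above. Removing one variable at a time, starting from the one whose removal lowers FFI the least, is our own assumption rather than a rule stated in the text; the toy fit index is invented purely to illustrate a variable (`x1`) whose contribution is subrogated by a later one (`x3`).

```python
from typing import Callable, List, Sequence

# Hypothetical: fit(names) returns FFI_{y;names}, the share of the sum of squares
# of Y explained by the model containing exactly the named variables.
FitIndex = Callable[[Sequence[str]], float]


def prune_after_entry(selected: List[str], fit: FitIndex, threshold: float) -> List[str]:
    """Re-check X_(1), ..., X_(q-1) after X_(q), the last name in `selected`, has entered.

    Removes, one at a time, the earlier variable whose removal lowers FFI the
    least, as long as that reduction does not exceed `threshold`.
    """
    kept = list(selected)
    while len(kept) > 1:
        full = fit(kept)  # FFI_{y;1,...,q}
        # FFI_{y;1,...,q} - FFI_{y;1,...,q(-i)} for every earlier variable i
        drops = {v: full - fit([w for w in kept if w != v]) for v in kept[:-1]}
        weakest = min(drops, key=drops.get)
        if drops[weakest] > threshold:
            break  # every earlier variable still contributes significantly
        kept.remove(weakest)
    return kept


def toy_fit(names: Sequence[str]) -> float:
    """Invented fit index in which x3 carries all the information of x1."""
    s = set(names)
    return 0.5 * ("x1" in s or "x3" in s) + 0.3 * ("x2" in s) + 0.1 * ("x3" in s)


print(prune_after_entry(["x1", "x2", "x3"], toy_fit, threshold=0.05))  # ['x2', 'x3']
```

Passing the fit index as a function keeps the removal rule separate from the estimation step, so the same check can be reused with whichever FFI computation is adopted.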
7 CONCLUSIONS
In this work we first make explicit the expressions of the estimated parameters of a multivariate fuzzy regression model with a fuzzy asymmetric intercept. Such an intercept is more appropriate than a non-fuzzy one, since it estimates the average value of the dependent variable (which is itself fuzzy) when the independent variables equal zero.
Moreover, we verify that the total sum of squares of the dependent variable decomposes simply into the regression sum of squares and the residual sum of squares, as happens in the classical OLS estimation procedure, only when the intercept is fuzzy asymmetric triangular. Conversely, when the intercept is symmetric (whether fuzzy or not), analysing the forecasting capability of the model is more difficult, because of two additional components of the sum of squares: the first is related to the difference between the theoretical and the empirical average values of the dependent variable; the second is residual in nature and has an uncertain sign.
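The role of the asymmetric intercept in the decomposition can be checked numerically. The following Python sketch is illustrative only and relies on assumptions not taken from the paper: a single regressor, fuzzy values stored as (left, core, right) triples, a squared distance equal to the sum of squared vertex differences (a Diamond-type distance), and a real non-negative slope. With one free intercept per vertex the residuals of each vertex sum to zero and SST equals SSR + SSE; with a single non-fuzzy intercept, the toy data below leave a non-zero cross term, so the two no longer match.

```python
# Hypothetical check of the sum-of-squares decomposition with one fuzzy regressor.
# ASSUMPTIONS: fuzzy values are (left, core, right) triples; the squared distance
# between two of them is the sum of squared vertex differences; the slope b is real
# and non-negative, so vertex k of b*X is b times vertex k of X.

def decompose(x, y, asymmetric=True):
    """Fit Y* = a + b X by least squares and return (SST, SSR, SSE)."""
    n = len(y)
    xs = [[o[k] for o in x] for k in range(3)]
    ys = [[o[k] for o in y] for k in range(3)]
    ybar = [sum(c) / n for c in ys]  # empirical fuzzy mean of Y, vertex by vertex
    if asymmetric:
        # fuzzy asymmetric intercept: one free intercept per vertex
        xbar = [sum(c) / n for c in xs]
        sxy = sum((xs[k][i] - xbar[k]) * (ys[k][i] - ybar[k]) for k in range(3) for i in range(n))
        sxx = sum((xs[k][i] - xbar[k]) ** 2 for k in range(3) for i in range(n))
        b = sxy / sxx
        a = [ybar[k] - b * xbar[k] for k in range(3)]
    else:
        # non-fuzzy intercept: the same shift for all three vertices
        px = [v for c in xs for v in c]
        py = [v for c in ys for v in c]
        mx, my = sum(px) / len(px), sum(py) / len(py)
        b = sum((u - mx) * (v - my) for u, v in zip(px, py)) / sum((u - mx) ** 2 for u in px)
        a = [my - b * mx] * 3
    fit = [[a[k] + b * xs[k][i] for i in range(n)] for k in range(3)]
    sst = sum((ys[k][i] - ybar[k]) ** 2 for k in range(3) for i in range(n))
    ssr = sum((fit[k][i] - ybar[k]) ** 2 for k in range(3) for i in range(n))
    sse = sum((ys[k][i] - fit[k][i]) ** 2 for k in range(3) for i in range(n))
    return sst, ssr, sse


x = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
y = [(0, 2, 5), (1, 3, 6), (3, 5, 8)]
for asym in (True, False):
    sst, ssr, sse = decompose(x, y, asym)
    print(asym, sst, ssr + sse)  # for these data, equal only when asym is True
```

For these data the non-fuzzy fit even yields SSR + SSE greater than SST, which shows why the usual goodness-of-fit reading breaks down without the asymmetric intercept.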
The selection of the most significant independent variables in a fuzzy regression model presents computational difficulties due to the large number of potential hyperplanes to be tested. We propose to overcome such difficulties through a stepwise procedure based on a fuzzy version of the $R^2$ index.
At each step a single variable, chosen among the starting ones, is added to the model according to two basic criteria: its explanatory contribution to the model and its originality with respect to the variables already included in the model.
A more refined version of the procedure can also remove, at each iteration, variables already included in the model whose explanatory contribution has been taken over by the combination of the independent variables introduced later.
The forecasting capability of the proposed fuzzy regression model has been successfully verified in a recent application to data collected by the 2006 EU-SILC survey on the perception of poverty among Italian families. On that occasion we wrote a MATLAB function that takes as input the matrices of cores, left extremes and right extremes of both the dependent and the independent fuzzy variables.
Possible improvements to the model mainly concern the use of membership functions other than the triangular one.
REFERENCES
Bilancia, M., Campobasso, F., Fanizzi, A., 2010. The pricing of risky securities in a Fuzzy Least Square Regression model. In Advances in Data Analysis and Classification 2010. Springer, Berlin-Heidelberg-New York.
Campobasso, F., Fanizzi, A., Tarantini, M., 2009. Some results on a multivariate generalization of the Fuzzy Least Square Regression. In Proceedings of the International Conference on Fuzzy Computation, Madeira.
Campobasso, F., Fanizzi, A., 2011. A Fuzzy Approach To The Least Squares Regression Model With A Symmetric Fuzzy Intercept. In Proceedings of the 14th Applied Stochastic Models and Data Analysis Conference, Roma.
Campobasso, F., Perchinunno, P., Fanizzi, A., 2008. Homogenous Urban Poverty Clusters within the city of Bari. In Lecture Notes in Computer Science ICCSA 2008. Springer.
Diamond, P. M., 1988. Fuzzy Least Squares. In Information Sciences.
Kao, C., Chyu, C. L., 2003. Least-squares estimates in fuzzy regression analysis. In European Journal of Operational Research.
Montrone, S., Campobasso, F., Perchinunno, P., Fanizzi, A., 2011. A Fuzzy Approach to the Small Area Estimation of Poverty in Italy. In Advances in Intelligent Decision Technologies – Proceedings of the Second KES International Symposium IDT 2010. Springer.
Montrone, S., Campobasso, F., Perchinunno, P., Fanizzi, A., 2011. An Analysis of Poverty in Italy through a fuzzy regression model. In Lecture Notes in Computer Science ICCSA 2011. Springer.
Montrone, S., Perchinunno, P., Di Giuro, A., Torre, C. M., Rotondo, F., 2011. Identification of hot spot of social and housing difficulty in urban areas. In Lecture Notes in Computer Science ICCSA 2011. Springer.
Takemura, K., 2005. Fuzzy least squares regression analysis for social judgment study. In Journal of Advanced Computational Intelligence and Intelligent Informatics.