the most popular and easiest to obtain variables. In
some other cases the decision is based upon the
availability of data, amount of missing observations,
etc. A less legitimate approach is to try several
different variables and then select the ones
generating results that best support the conclusions
the modelers want to reach. Of course, any such
choice is open to criticism: why was a given
selection among the data series made, and not
another? The method introduced in this study
precludes such criticism, since all the data series are
utilized.
1.2 Advantages of Utilizing Intervals
In this study we introduce a method of converting
numerical vectors into ranges (intervals) of values
that are derived from all the available data series.
There are several important advantages of
transforming available data into intervals of values:
a. A basic principle in the field of
Information Systems is that all available data are
valuable (unless suspected of being severely
distorted) and should be utilized in the modeling
process.
b. Confidence in the modeling results: when the
approach is inclusive and involves all the
available data series, confidence in the
results is greater than for a modeling
process that uses selected data series while
ignoring others.
c. Efficient handling of missing observations: this
issue arises when many data series have a
large number of missing measurements. For
example, in our case study, we utilized
economic data from over 125 countries (for the
variables aggregate economic activity per
capita and exports per capita), but in many data
series (numerical vectors) we encountered
missing data for dozens of countries.
In addition, the set of missing countries was not
the same in different data series. However, the
problem of missing data was resolved by
constructing an interval for every country for
which there was at least one measurement. Of
course, some intervals were built from more data
points and others from fewer, but we included all
these countries in the modeling process, and
thus increased our confidence in the results.
d. It is much easier to reach meaningful and
unambiguous conclusions due to the drastic
reduction in the number of regression runs. For
example, if our dependent variable is “aggregate
economic activity per capita” (17 data series),
and our explanatory variable is “exports per
capita” (12 data series), then trying all
possible combinations of these data series
requires over 200 regression runs (17 × 12 =
204). The problem here is not only the amount of
work, but also the question of how to
summarize so many results and reach a
meaningful conclusion. However, when using
the method presented here, the number of
regression runs drops to 4:
1. Regression using only Minimum values
2. Regression using only Maximum values
3. Regression of Minimum for dependent variable
vs. Maximum of explanatory variables
4. Regression of Maximum for dependent variable
vs. Minimum of explanatory variables
Note: No matter how many explanatory
variables are expressed in terms of intervals, the
method still requires only four regression runs.
The four regression runs generate four results,
which again can be reduced to an interval
between the minimum and the maximum value
of the results, and this interval can be used to
draw conclusions as well as for further
computations.
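As a rough illustration of items c and d, the following Python sketch collapses several data series per variable into one [min, max] interval per country and then performs the four regression runs. It is a minimal sketch under stated assumptions, not the implementation used in this study: the function names `interval_bounds` and `four_run_slopes`, the use of NaN to mark missing observations, the restriction to a single explanatory variable, the choice of the OLS slope as the reported "result", and the toy numbers are all assumptions made for illustration.

```python
import numpy as np


def interval_bounds(series):
    """Return per-unit (min, max) over the rows of `series`.

    `series` has shape (n_series, n_units); NaN marks a missing value.
    Units with no measurement in any series get NaN bounds.
    """
    series = np.asarray(series, dtype=float)
    has_data = ~np.isnan(series).all(axis=0)   # at least one measurement
    lo = np.full(series.shape[1], np.nan)
    hi = np.full(series.shape[1], np.nan)
    lo[has_data] = np.nanmin(series[:, has_data], axis=0)
    hi[has_data] = np.nanmax(series[:, has_data], axis=0)
    return lo, hi


def four_run_slopes(y_series, x_series):
    """Run the four min/max regressions and return the slopes and their range."""
    y_lo, y_hi = interval_bounds(y_series)
    x_lo, x_hi = interval_bounds(x_series)
    # Keep units that have an interval for both variables (assumption).
    ok = ~np.isnan(y_lo) & ~np.isnan(x_lo)
    runs = {
        "min_vs_min": (x_lo, y_lo),
        "max_vs_max": (x_hi, y_hi),
        "ymin_vs_xmax": (x_hi, y_lo),
        "ymax_vs_xmin": (x_lo, y_hi),
    }
    # np.polyfit(x, y, 1) fits y = slope * x + intercept by least squares.
    slopes = {
        name: float(np.polyfit(x[ok], y[ok], 1)[0])
        for name, (x, y) in runs.items()
    }
    return slopes, (min(slopes.values()), max(slopes.values()))


# Toy data (hypothetical): rows are data series, columns are countries.
nan = np.nan
x_series = [[1.0, 2.0, nan, 4.0, nan],
            [1.5, nan, 3.0, 4.5, nan]]
y_series = [[2.0, 4.0, 6.0, nan, 10.0],
            [3.0, nan, 7.0, 8.0, 11.0],
            [nan, 5.0, 6.5, 9.0, nan]]

slopes, result_interval = four_run_slopes(y_series, x_series)
print(slopes)
print("interval of results:", result_interval)
```

In this toy input the fifth country has no measurement of the explanatory variable in any series, so it is left out of the regressions; every other country contributes an interval even though no single series covers all of them.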
1.3 Literature Survey
The idea of utilizing intervals in fuzzy information
processing is not new. Schneider and Kandel (1993)
introduced the idea of utilizing Fuzzy Expected
Intervals (FEI) in order to handle higher degrees of
uncertainty in Fuzzy Expert Systems. Wagman et al.
(1994) proposed to generate intervals of real
numbers to be processed by the fuzzy matching
algorithm.
Nguyen and Kreinovich (1996) address the issue
of estimating intervals within the domain of physical
measurements during the manufacturing process. If
there is a variable that cannot be measured
directly (or is very difficult to measure directly), then it
is estimated indirectly (the procedure is called
“indirect measurement”), using a related variable.
Due to imprecision of measurements, the numerical
values of the related variable are measured in terms
of intervals, and the authors address the issue of
estimating the corresponding intervals of the
computed variable.
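The idea of indirect measurement with interval inputs can be sketched as follows. This is only a hedged illustration and not the algorithm of Nguyen and Kreinovich: the names `x`, `y`, and `propagate_interval`, the example relation y = x², and the dense-sampling approach are all assumptions chosen for simplicity. Sampling gives an inner approximation of the output interval, so the true range may be slightly wider.

```python
import numpy as np


def propagate_interval(f, x_lo, x_hi, samples=301):
    """Approximate the range of f over [x_lo, x_hi] by dense sampling.

    `f` must accept a NumPy array. The result is an inner approximation:
    extrema that fall between sample points are slightly underestimated.
    """
    xs = np.linspace(x_lo, x_hi, samples)
    ys = f(xs)
    return float(np.min(ys)), float(np.max(ys))


# Hypothetical relation y = x**2, with x measured as the interval [-1, 2].
y_lo, y_hi = propagate_interval(lambda x: x ** 2, -1.0, 2.0)
print("estimated interval for y:", (y_lo, y_hi))
```

Evaluating the relation only at the two endpoints of the measured interval would miss the minimum of y near x = 0, which is why the sketch scans the whole interval rather than just its edges.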
Hans and Gottwald (1995) define, in theoretical
terms, various implementations of fuzzy intervals. The
authors present ways to define a fuzzy interval, such
as (a) defining a crisp interval to form the kernel,
from which the membership function decreases to
zero, or (b) using two fuzzy numbers to represent the
edges of the interval. The authors also describe
KDIR 2019 - 11th International Conference on Knowledge Discovery and Information Retrieval