observable objects is determined by a logical
condition over cause-attributes.
Similarly, blocks 11..13 are a special case of blocks 14 and 17, where the user examines which reasons lead to a specified effect. Here a logical condition over effect-attributes determines the set of observable objects.
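To make the selection step concrete, the following minimal sketch (in Python, with invented attribute names, and not taken from any of the cited systems) shows how a logical condition over chosen attributes picks out the set of observable objects; whether the condition is read over cause-attributes (block 8) or over effect-attributes (block 11) is only a matter of interpretation.

from typing import Callable, Dict, List

Record = Dict[str, str]

def select_objects(objects: List[Record],
                   condition: Callable[[Record], bool]) -> List[Record]:
    """Return the subset of objects satisfying the logical condition."""
    return [obj for obj in objects if condition(obj)]

# Illustrative data; the condition below is read over cause-attributes (block 8).
data = [
    {"smoking": "yes", "age": "old",   "cough": "yes"},
    {"smoking": "no",  "age": "old",   "cough": "no"},
    {"smoking": "yes", "age": "young", "cough": "no"},
]
selected = select_objects(data, lambda o: o["smoking"] == "yes" and o["age"] == "old")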
Again, the variants in blocks 8..10 and in blocks 11..13 differ solely in their interpretation.
In principle, the results obtainable by blocks 14..17 can be produced by properly repeated application of the simpler variants in blocks 8..13, but it is more practical to leave that work to the computer. For a human user, entering the different value combinations (as logical expressions) one by one is arduous enough.
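The following hedged sketch illustrates what leaving that work to the computer could look like under the same record-based assumptions as above: the program enumerates the value combinations of the chosen attributes that actually occur in the data and applies the simpler selection (blocks 8..13) once per combination, instead of asking the user for each expression.

def existing_combinations(objects, attributes):
    """Distinct value combinations of the given attributes present in the data."""
    return sorted({tuple(obj[a] for a in attributes) for obj in objects})

def repeat_simple_variant(objects, attributes):
    """Run the simpler selection once per existing value combination."""
    results = {}
    for combo in existing_combinations(objects, attributes):
        condition = dict(zip(attributes, combo))
        results[combo] = [o for o in objects
                          if all(o[a] == v for a, v in condition.items())]
    return results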
Usually it is reasonable to require that the sets of causes and effects chosen by the user do not intersect. In variants 15 and 17 the overlapping attributes are always present in the fixed-length part (C in block 15, E in block 17) and can also appear in the other part of the relations. In variant 16 such attributes can fall on both sides. But something that causes itself, or results from itself, is not very informative.
The overlap might make sense if more than one value is allowed for the overlapping attribute(s) and objects with different values of such attribute(s) belong to the same cause or effect. This is possible when causes or effects are given by a logical expression (blocks 8 and 11, respectively). Appearing in the other part of the relations, the overlapping attributes may then provide interesting information.
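As a small illustration of this requirement (a sketch only, with hypothetical function and parameter names), the check below rejects an overlap between the cause- and effect-attribute sets unless the overlapping attribute is declared as one whose condition admits several values.

def check_attribute_sets(cause_attrs, effect_attrs, multi_valued=frozenset()):
    """Reject overlap except for attributes whose condition admits several values."""
    overlap = set(cause_attrs) & set(effect_attrs)
    suspicious = overlap - set(multi_valued)
    if suspicious:
        raise ValueError("cause and effect attributes overlap: %s" % sorted(suspicious))
    return overlap  # attributes allowed to appear on both sides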
The same applies to restricting the context: if several values are allowed for the attribute(s) determining a context, then it makes sense to observe these attributes in the relations.
The Generator of Hypotheses does not presuppose that the observable objects are classified; however, it may come in handy when solving that task. (Automatic) classification occurs here as follows: the user submits a list of attributes (either causes or effects); the system finds the existing value combinations of the given attributes, and each such combination describes a class of objects. Such classification takes place in block 15 by cause-attributes and in block 17 by effect-attributes. As mentioned, in these cases the difference (which is so important for the user) lies only in the interpretation.
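A minimal sketch of this automatic classification, again assuming objects stored as attribute-value records, is given below: each existing value combination of the submitted attributes defines one class (block 15 for cause-attributes, block 17 for effect-attributes; only the interpretation differs).

from collections import defaultdict

def classify(objects, attributes):
    """Partition objects into classes keyed by their value combination on `attributes`."""
    classes = defaultdict(list)
    for obj in objects:
        classes[tuple(obj[a] for a in attributes)].append(obj)
    return dict(classes)

# e.g. classify(data, ["smoking", "age"]) yields one class per observed
# (smoking, age) combination.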
In blocks 8..13 the researcher determines the class of interest on the basis of a logical condition, either over causes (block 8) or over effects (block 11).
The variants on the left side of the scheme (blocks 3..6), where the attributes are not divided into causes and effects by the user, are realized by the Generator of Hypotheses (Kuusik and Lind, 2004). The variants on the right side are covered by machine learning methods: generally the classes are given and the rules for determining them have to be found (Roosmann et al., 2008; Kuusik et al., 2009). Usually ML methods assume that the class is indicated by one particular attribute, but in essence it can be a combination of several attributes given by a logical expression. Again, whether the given classes are causes (blocks 8..10, 14..15) or effects (blocks 11..13, 14 and 17) depends on the interpretation. Determinacy Analysis (DA) can be regarded as a subtask of machine learning, since it finds rules for one class at a time; thus it covers the variants in blocks 8..10 and 11..13. The given class can be a cause (block 8) or an effect (block 11). Output containing combinations of M attributes (as in blocks 9 and 13) can be found using the DA-system (DA-system, 1998); output according to blocks 10 and 12 can be obtained using step-wise DA methods, which allow rules of different length (Lind and Kuusik, 2008; Kuusik and Lind, 2010). By repeated use of DA the variants given in blocks 14..17 can also be performed.
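For illustration only (this is neither the DA-system nor the cited step-wise DA algorithms, just a naive sketch of the idea), the following code searches, for a single class, value combinations of up to a given number of cause-attributes and keeps those whose determinacy, i.e. the share of matching objects falling into the class, reaches a threshold; rules of different length are obtained by varying the combination length.

from itertools import combinations

def da_rules(objects, cause_attrs, in_class, max_len=2, threshold=1.0):
    """Find attribute-value combinations that determine membership in one class."""
    rules = []
    for length in range(1, max_len + 1):
        for attrs in combinations(cause_attrs, length):
            for values in {tuple(o[a] for a in attrs) for o in objects}:
                matching = [o for o in objects
                            if all(o[a] == v for a, v in zip(attrs, values))]
                accuracy = sum(in_class(o) for o in matching) / len(matching)
                if accuracy >= threshold:
                    rules.append((dict(zip(attrs, values)), accuracy))
    return rules

Here in_class may itself be given as a logical expression over several attributes, in line with the remark above that a class need not be indicated by a single attribute.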
4 CONCLUSIONS
In this paper we have presented an idea for a Universal Generator of Hypotheses (UGH). We have discussed the matter with data analysis specialists, who pointed out that using DA and GH alone is not enough: there are several other tasks to solve, and some additional new possibilities need to be developed. All these possibilities are described in the paper. The possibilities of DA and GH are also described, and they form part of the functionality of UGH. As we have mentioned, UGH can be realized: there exist a base algorithm and special pruning techniques on the basis of which the functionality of UGH is readily implementable.
REFERENCES
Breiman, L., Friedman, J. H., Olshen, R. A., Stone, C. J.,
1984. Classification and Regression Trees, Belmont,
California: Wadsworth.
Clark, P., Niblett, T., 1987. Induction in Noisy Domains. In Progress in Machine Learning: Proceedings of EWSL 87 (pp. 11-30), Bled, Yugoslavia. Wilmslow: Sigma Press.