have a base with x-coordinates 875-1125.
Clearly, this method of fuzzification should
only be used when no UoD is known, as the
fuzzification would be meaningless if the
domain is extremely wide (for example
0-1,000,000) and exaggerated if the domain is
very narrow (for example 900-1100).
• Value-based fuzzification: This fuzzification
results in a function whose base width depends
on the actual value: the greater the value, the
wider the base. For example, with a value of 100
and a fuzzification of 0.1, the resulting function
has a base with x-coordinates 90-110 (width=20).
The same fuzzification on a value of 1000 results
in a base of 900-1100 (width=200). This
fuzzification method is therefore only applicable
in contexts where larger values require less
precise matches.
Besides the fuzzification method, a fuzzifying
function needs to be defined. Depending on the
application context, this might for instance be a
triangular, trapezoidal or Gaussian function.
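As an illustration, value-based fuzzification with a triangular fuzzifying function can be sketched as follows. This is a minimal sketch, not the engine's code: the names `Triangle` and `value_based_fuzzify` are hypothetical, and it assumes the fuzzification factor is applied symmetrically on both sides of the value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triangle:
    """Triangular membership function (hypothetical helper type).

    Membership is 1.0 at `peak` and falls linearly to 0.0 at `left` and `right`.
    """
    left: float
    peak: float
    right: float

    def membership(self, x: float) -> float:
        if x == self.peak:
            return 1.0
        if x <= self.left or x >= self.right:
            return 0.0
        if x < self.peak:
            return (x - self.left) / (self.peak - self.left)
        return (self.right - x) / (self.right - self.peak)


def value_based_fuzzify(value: float, factor: float) -> Triangle:
    """Build a triangle whose base width grows with the magnitude of `value`.

    Assumption: the base extends `factor * |value|` to each side of the value.
    """
    half_width = abs(value) * factor
    return Triangle(value - half_width, value, value + half_width)


small = value_based_fuzzify(100, 0.1)    # base of roughly 90-110
large = value_based_fuzzify(1000, 0.1)   # base of roughly 900-1100
```

Under these assumptions, the same factor yields a base ten times wider for the value 1000 than for the value 100, which matches the example in the text.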
3.4 Aggregation
The second step is to aggregate the fuzzified values
in order to determine the degree to which these
values correspond. Three standard methods have
been implemented in the engine and, as was the case
for the fuzzification operations, additional
aggregation methods such as product or union can
be easily implemented by higher level applications.
The implemented methods are the following.
• Intersection: The intersection operator models
the fuzzy ‘AND’, and aggregates two fuzzy sets
using function intersection. Intersection is a
very strict yet commonly used form of
aggregation. Using fuzzy intersection, two
properties will only match well if they both
contain high membership values.
• Absolute difference: The absolute difference
aggregates two fuzzy sets into a function
representing the absolute difference of both.
The absolute difference between piecewise
linear functions is a new piecewise linear
function, and the absolute difference between a
point and a piecewise linear function is a new
point function. The difference aggregation does
not take into account the actual values of the
points, but only compares the amount by which
the two values differ. As a result, two very low
values might match much better than a low and
a high value. In certain contexts this might not
be the expected behavior, and in these cases a
different aggregator should be used.
• Bounded difference: The bounded difference
determines the fuzzy difference between two
functions $f_1$ and $f_2$, with a lower bound of 0.
In other words, the difference is
$\max(f_1 - f_2, 0)$. In contrast to the other
aggregators, the order of the functions is
important here. Indeed, the bounded difference of
$f_1$ and $f_2$ is not necessarily equal to the
bounded difference of $f_2$ and $f_1$. As with the
absolute difference, this aggregator is not suited
for every form of matching, as a set of low
preferences might result in a perfect or
near-perfect matching score.
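The three aggregators can be sketched pointwise as shown below. This is only an illustration under stated assumptions: fuzzy sets are modelled as plain Python callables rather than the piecewise linear functions the engine manipulates, and the names `triangle`, `intersection`, `absolute_difference` and `bounded_difference` are hypothetical.

```python
from typing import Callable

Membership = Callable[[float], float]


def triangle(left: float, peak: float, right: float) -> Membership:
    """Triangular membership function; assumes left < peak < right."""
    def mu(x: float) -> float:
        if x <= left or x >= right:
            return 0.0
        if x <= peak:
            return (x - left) / (peak - left)
        return (right - x) / (right - peak)
    return mu


def intersection(f1: Membership, f2: Membership) -> Membership:
    """Fuzzy 'AND': pointwise minimum of both membership values."""
    return lambda x: min(f1(x), f2(x))


def absolute_difference(f1: Membership, f2: Membership) -> Membership:
    """Pointwise |f1 - f2|; symmetric in its arguments."""
    return lambda x: abs(f1(x) - f2(x))


def bounded_difference(f1: Membership, f2: Membership) -> Membership:
    """Pointwise max(f1 - f2, 0); the argument order matters."""
    return lambda x: max(f1(x) - f2(x), 0.0)


a = triangle(90, 100, 110)
b = triangle(95, 105, 115)

both = intersection(a, b)
a_minus_b = bounded_difference(a, b)
b_minus_a = bounded_difference(b, a)
```

At x = 100, where `a` peaks and `b` is only half satisfied, `a_minus_b` is positive while `b_minus_a` is zero, which illustrates why the order of the functions matters for the bounded difference.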
3.5 Defuzzification
The aggregation step is followed by a final step:
defuzzification into a distance or matching value. This
resulting value is a measure for the similarity of two
property values. Depending on the data type of one
or both of the properties, either a numerical value or
range (partial matching) will be returned. As before,
additional operators can be easily added at the
Application layer, but the following operators are
available by default.
• Max: Simply returns the maximum membership
value of a fuzzy value. This can be used to
determine the maximum intersection value of
two properties and will be used most often in
fuzzy matching. However, if at least one of the
properties is a discrete set and the property
should only receive a high score if all of the
options in the set match well, average
intersection or a matching based on difference
aggregation should be used. The max
defuzzification used in combination with a
bounded or absolute difference aggregation only
compares the similarity of property values,
without taking into account the actual values
themselves. This means that two properties with
low, nearly equal values will match very
closely. In some cases, this is not the expected
behavior; in those cases a distance function based
on intersection can be used.
• Average: Returns the average function value.
This property distance can be used when at least
one of the properties is a discrete set and the
property should only receive a high score if all
of the options in the set match well. If the
property score should reflect the score of the
best matching option, Max-Intersection should
be used instead.
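The difference between the two default operators can be sketched for a discrete set of options. This is a minimal illustration, assuming the aggregation step has already produced one membership degree per option; the function names and the sample degrees are hypothetical and not taken from the engine.

```python
from typing import Sequence


def defuzzify_max(degrees: Sequence[float]) -> float:
    """Score of the best matching option."""
    if not degrees:
        raise ValueError("at least one membership degree is required")
    return max(degrees)


def defuzzify_average(degrees: Sequence[float]) -> float:
    """Mean score over all options; high only if every option matches well."""
    if not degrees:
        raise ValueError("at least one membership degree is required")
    return sum(degrees) / len(degrees)


# Hypothetical intersection degrees for three options of a discrete set.
option_degrees = [0.9, 0.2, 0.7]

best_option_score = defuzzify_max(option_degrees)
all_options_score = defuzzify_average(option_degrees)
```

With these sample degrees, the max operator rewards the single well-matching option, whereas the average is pulled down by the poorly matching one.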
DESIGN AND IMPLEMENTATION OF A SCALABLE FUZZY CASE-BASED MATCHING ENGINE