DESIGN AND IMPLEMENTATION OF A SCALABLE
FUZZY CASE-BASED MATCHING ENGINE
RAMI HANSENNE, JONAS VAN POUCKE, VEERLE VAN DER SLUYS
Actonomy NV, Stapelplein 70, B – 9000 GENT (Belgium)
BARTEL VAN DE WALLE
Department of Information Systems and Management, Tilburg University
Keywords: Case-based reasoning; fuzzy set theory; matching engine; matching techniques.
Abstract: We discuss the design and the implementation of a flexible and scalable fuzzy case-based matching engine.
The engine’s flexible design is illustrated for two of its core components: the internal representation of cases
by means of a variety of crisp and fuzzy data types, and the fuzzy operations to execute the ensuing case
matching process. We investigate the scalability of the matching engine by a series of benchmark tests of
increasing complexity, and find that the matching engine can manage an increasingly heavy load. This
indicates that the engine can be used for demanding matching processes. We conclude by pointing to several
applications in experimental electronic markets in which the matching engine is currently being used,
and indicate avenues for future research.
1 INTRODUCTION
Case-based reasoning (CBR) is a problem-solving
approach resembling an example-based search
process. Problems that have been encountered earlier
are stored as examples, and when the system is
confronted with a new problem, similar problems from
this set are identified by means of a search process.
The query (or target) problem is then classified
according to its similarity to the earlier examples that
have been identified (Kolodner, 1993). More formally, CBR
can be defined as a four-step process (Aamodt and
Plaza, 1994):
Retrieve: Given a target problem, relevant
stored cases are retrieved. A case consists of a
problem, its solution, and, typically, annotations
about how the solution was derived.
Re-use: Map the solution from the previous case
to the target problem. This may involve
adapting the solution as needed to fit the new
situation.
Revise: Having mapped the previous solution to
the target situation, test the new solution in the
real world (or a simulation) and, if necessary,
revise.
Retain: After the solution has been successfully
adapted to the target problem, store the resulting
experience as a new case in memory.
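The four steps can be made concrete with a deliberately small Python sketch. It is not part of the engine described in this paper: it assumes cases are (problem, solution) pairs with numeric problem features, uses a hypothetical similarity function (1 / (1 + Euclidean distance)) for retrieval, re-uses the nearest solution unchanged, and models revision as an optional caller-supplied function.

```python
import math
from typing import Callable, List, Optional, Tuple

Problem = Tuple[float, ...]
Case = Tuple[Problem, str]  # (problem description, solution)


def similarity(a: Problem, b: Problem) -> float:
    """Hypothetical similarity in (0, 1]: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))


class CaseBase:
    """Toy case memory illustrating the retrieve / re-use / revise / retain cycle."""

    def __init__(self, cases: List[Case]) -> None:
        self.cases = list(cases)

    def retrieve(self, target: Problem) -> Case:
        # Retrieve: the stored case whose problem is most similar to the target.
        return max(self.cases, key=lambda case: similarity(case[0], target))

    def solve(self, target: Problem,
              revise: Optional[Callable[[Problem, str], str]] = None) -> str:
        _, solution = self.retrieve(target)
        proposed = solution                    # Re-use: map the old solution unchanged
        if revise is not None:                 # Revise: test and repair if a tester is given
            proposed = revise(target, proposed)
        self.cases.append((target, proposed))  # Retain: store the new experience
        return proposed
```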
The CBR process in general and the retrieval phase
in particular are the focus of the matching engine we
describe in this paper. The general design principles
that have guided the development of the matching
engine are presented in the following section, as well
as the case matching pipeline we have constructed.
Section 3 focuses on the internal workings of the
matching process, for which various operators
derived from fuzzy set theory are used. Section 4
investigates the scalability of the fuzzy matching
engine, by varying the case complexity and the
number of cases. We conclude in Section 5 by
presenting relevant enterprise matching applications
and indicating future research opportunities and
challenges.
HANSENNE R., VAN POUCKE J., VAN DER SLUYS V. and VAN DE WALLE B. (2004).
DESIGN AND IMPLEMENTATION OF A SCALABLE FUZZY CASE-BASED MATCHING ENGINE.
In Proceedings of the Sixth International Conference on Enterprise Information Systems, pages 375-382
DOI: 10.5220/0002645903750382
Copyright © SciTePress
2 DESIGN OF THE MATCHING ENGINE
While our focus in this paper is on case-based
reasoning, our overall design goal was to develop a
generic framework for reasoning about data that can
be extended towards other AI technologies, such as
clustering, reinforcement learning, neural networks
and expert systems (Pal et al., 2001).
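2.1 Design layers overview
The matching engine is organized in three layers:
the Algorithm Layer (Layer 1), the Management
Layer (Layer 2) and the Application Layer
(Layer 3).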
2.1.1 Algorithm Layer
This layer is concerned with the actual
implementation of the matching algorithms.
Moreover, Layer 1 controls the scalability of the
engine, e.g. by running certain matching algorithms
in parallel. To the end user, the Algorithm Layer is
the most abstract layer: there is no graphical user
interface (GUI) to directly interact with this layer.
The functionality of this layer is, in short, restricted
to matching one data set to another. This layer does
not ‘know’ where the data sets come from, or where
they will be further processed. This is taken care of
by the second layer, the Management Layer.
2.1.2 Management Layer
The Management Layer is an intermediate layer
between the Algorithm Layer and the Application
Layer and abstracts the communication between
these two layers. This layer provides
management functionality to the matching engine,
for example security, data import from files,
databases or data stores, user management, etc. This
layer must enable those users who have no specific
technical knowledge about the matching algorithms
or the structure of the datasets to work with the
matching engine.
2.1.3 Application Layer
The Application Layer is the front end of the
matching engine software. Layer 3 contains the
software which actually makes use of the matching
engine, and can add its own functionality to the
application. Dedicated GUIs can be developed and
other applications can be integrated within this layer.
Other applications such as search engines, web
services or autonomous agents can make use of the
data that have been matched.
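The division of responsibilities between the three layers could look roughly like the following Python sketch. All class and method names are hypothetical, and the toy score (1 minus the mean absolute difference of the shared values, assumed to lie in [0, 1]) merely stands in for the real matching algorithms; an in-memory dictionary stands in for files, databases or data stores.

```python
from typing import Dict, List, Tuple

Record = Dict[str, float]


class AlgorithmLayer:
    """Layer 1: matches one data set against another; knows nothing about data origin."""

    def match(self, query: Record, candidates: Dict[str, Record]) -> List[Tuple[str, float]]:
        def score(rec: Record) -> float:
            shared = [k for k in query if k in rec]
            if not shared:
                return 0.0
            # Toy similarity: 1 - mean absolute difference (values assumed in [0, 1]).
            return 1.0 - sum(abs(query[k] - rec[k]) for k in shared) / len(shared)

        ranked = [(cid, score(rec)) for cid, rec in candidates.items()]
        return sorted(ranked, key=lambda item: item[1], reverse=True)


class ManagementLayer:
    """Layer 2: data import, users, security; hides Layer 1 from non-experts."""

    def __init__(self, store: Dict[str, Record]) -> None:
        self.store = store              # stand-in for files, databases or data stores
        self.engine = AlgorithmLayer()

    def find_matches(self, query: Record) -> List[Tuple[str, float]]:
        return self.engine.match(query, self.store)


class ApplicationLayer:
    """Layer 3: front end (GUI, web service, agent) that uses the matched data."""

    def __init__(self, management: ManagementLayer) -> None:
        self.management = management

    def best_match(self, query: Record) -> str:
        return self.management.find_matches(query)[0][0]
```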
2.2 The matching engine pipeline
The flow of the fuzzy matching engine as shown in
Figure 1 below can in essence be viewed as a
traditional input-output process, where the input
consists of one or more cases (the actual query), the
process is the fuzzy matching process, and the
output consists of the cases that have been matched
(the actions). This process is iterative: the output of
the process can be fed back into the engine and
reused for a subsequent or new matching query.
The different steps in the ‘pipeline’ of Figure 1 are
elaborated below. In principle, every step should be
individually configurable (at startup time and run
time), and controlled by a “step manager”. We are
currently exploring the development of a separate
workflow engine which controls each step of the
process and also indicates which loop-backs should
be performed.
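One possible shape for a configurable pipeline with a step manager is sketched below in Python. StepManager, the step names and the list-of-dicts case representation are assumptions made for illustration; the point is that every step is a replaceable function, so steps can be configured at startup and swapped at run time, with the identity function as the default.

```python
from typing import Callable, Dict, List

Cases = List[dict]
Step = Callable[[Cases], Cases]


class StepManager:
    """Runs named pipeline steps in a fixed order; any step can be swapped at run time."""

    ORDER = ("pre_processing", "matching", "post_processing")

    def __init__(self) -> None:
        identity: Step = lambda cases: cases
        self.steps: Dict[str, Step] = {name: identity for name in self.ORDER}

    def configure(self, name: str, step: Step) -> None:
        if name not in self.steps:
            raise KeyError(f"unknown step: {name}")
        self.steps[name] = step

    def run(self, cases: Cases) -> Cases:
        for name in self.ORDER:
            cases = self.steps[name](cases)
        return cases
```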
2.2.1 Cases
Cases are the start of the process and are typically
created in Layer 2, the Management Layer. A case is
defined as a set of different properties, with every
property describing a single attribute on which
matching is requested. Any property can have one of
the following formats: Boolean, Ordinal
(qualitative), Numeric (quantitative), Alphanumeric,
Fuzzy (vague) or Unknown. These formats are
extensible, as we do not know beforehand which
other data types may be needed. We only know that
property values are atomic and that they form the
basis on which the matching is performed. Properties
also have meta-information. They can have a weight to describe the
importance of an individual property, a ranking
relative to other properties and a veto power on other
properties that are not compatible. Some properties
can be converted into another format. For example, a
Numeric property is a special case of a Fuzzy
property, and hence could be converted into a Fuzzy
one without loss of precision or information. Cases
are stored in memory, which can be persistent or
transient.
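A case with its properties and meta-information might be modelled as in the following Python sketch. The field names (weight, rank, veto), the use of None for Unknown, and the representation of a Fuzzy value as a list of (x, membership) breakpoints are our assumptions; to_fuzzy shows how a Numeric value becomes a point-shaped fuzzy value with membership 1 at that number, so no precision is lost.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union

Fuzzy = List[Tuple[float, float]]             # breakpoints (x, membership)
Value = Union[bool, str, float, Fuzzy, None]  # None stands for "Unknown"


@dataclass
class Property:
    name: str
    value: Value
    weight: float = 1.0          # importance of this property
    rank: Optional[int] = None   # ranking relative to other properties
    veto: bool = False           # an incompatible value rules the case out

    def to_fuzzy(self) -> Fuzzy:
        """Convert a Numeric value into an equivalent point-shaped fuzzy value."""
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(self.value, bool) or not isinstance(self.value, (int, float)):
            raise TypeError(f"{self.name} is not numeric")
        return [(float(self.value), 1.0)]


@dataclass
class Case:
    case_id: str
    properties: Dict[str, Property] = field(default_factory=dict)

    def add(self, prop: Property) -> None:
        self.properties[prop.name] = prop
```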
Figure 1: The matching engine pipeline
ICEIS 2004 - ARTIFICIAL INTELLIGENCE AND DECISION SUPPORT SYSTEMS
2.2.2 Pre-processing
The pre-processing step occurs before the actual
matching. While this step is not part of the actual
matching algorithm, it is, globally speaking, part of
the matching process. This step is not intended to
create cases or do property extraction from some
source, but rather to do some manipulation on the
properties of a case. Pre-processing is a queue of
zero or more pre-processing steps; each step is
independent of the others, and the steps run
sequentially in a well-defined order. Pre-processing
is not required; a valid implementation is one that
simply passes all cases through without modifying
them.
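A minimal Python sketch of the pre-processing queue, assuming cases are dictionaries of property values and each step is an independent function from case to case. The two steps shown (normalize_keys and drop_invalid) are hypothetical examples; calling preprocess with an empty queue is the valid pass-through implementation mentioned above.

```python
from typing import Callable, Iterable, List

Case = dict
PreStep = Callable[[Case], Case]


def preprocess(cases: Iterable[Case], steps: List[PreStep]) -> List[Case]:
    """Apply independent steps in a fixed order; with no steps, cases pass unchanged."""
    result = []
    for case in cases:
        for step in steps:   # sequential, well-defined order
            case = step(case)
        result.append(case)
    return result


def normalize_keys(case: Case) -> Case:
    # Hypothetical step: lower-case all property names.
    return {k.lower(): v for k, v in case.items()}


def drop_invalid(case: Case) -> Case:
    # Hypothetical step: remove properties whose value is None (unknown or invalid).
    return {k: v for k, v in case.items() if v is not None}
```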
2.2.3 Matching
The matching step is the heart of the workflow. Here
we select the algorithm to match cases against
others. The actual implementation can differ: one
could opt for a simple sorting of cases or instead
perform complex clustering algorithms. For our
purposes, the matching step must meet some
requirements. We must be able to handle unknown
and incomplete data or, more specifically, cases in
which one or more properties are absent, unknown or
invalid. The matching step produces a result or
conclusion. This result can be partially complete, but
it always receives a score which indicates how
well the case matches other cases.
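One simple way to satisfy the requirement that absent, unknown or invalid properties must not block matching is to score only the properties present in both cases and to report how much of the total weight was actually compared. The Python sketch below assumes numeric values normalized to [0, 1] and a hypothetical weighted rule; it is an illustration, not the engine's algorithm.

```python
from typing import Dict, Optional, Tuple

Case = Dict[str, Optional[float]]   # None marks an unknown or invalid value


def match(query: Case, candidate: Case,
          weights: Dict[str, float]) -> Tuple[float, float]:
    """Return (score, completeness), both in [0, 1].

    Properties missing from either case are skipped, so a (possibly
    partial) score is always produced.
    """
    total = sum(weights.values())
    used = 0.0
    acc = 0.0
    for name, w in weights.items():
        q, c = query.get(name), candidate.get(name)
        if q is None or c is None:
            continue
        acc += w * (1.0 - abs(q - c))
        used += w
    score = acc / used if used else 0.0
    completeness = used / total if total else 0.0
    return score, completeness
```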
2.2.4 Post-processing
The post-processing step acts on the result of the
matching step and is optional, as was the pre-
processing step. In this step, for instance, we could
decide to store intermediate results.
2.2.5 Selection of actions and matching loops
In principle, the workflow process could run in an
endless loop: the result of the matching step can be
fed again into the initial ‘cases’ step. However, an
action is associated with the conclusion of every
single loop. This allows us to act upon partial results
that have been obtained during the matching or
reasoning. The main reason for having a loop is to
make the matching process more powerful. The
matching algorithm can be selected before each loop
is performed. In a first loop, we could match the
cases with one particular algorithm and perform a
second loop with another algorithm. This can be
repeated until the result is satisfactory or a
predefined number of loops has been reached. The
result is a matching flow which consists of several
smaller matching loops, running in parallel or
sequentially.
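The loop structure can be sketched sequentially in Python under a few assumptions: each algorithm maps a scored candidate list to a new scored list, the algorithms are applied in rotation, an on_result callback stands in for the action attached to each loop, and a hypothetical satisfied predicate or a maximum loop count ends the flow. Running loops in parallel is not shown.

```python
from typing import Callable, List, Sequence, Tuple

Scored = List[Tuple[str, float]]
Algorithm = Callable[[Scored], Scored]


def matching_flow(candidates: Scored,
                  algorithms: Sequence[Algorithm],
                  satisfied: Callable[[Scored], bool],
                  on_result: Callable[[int, Scored], None],
                  max_loops: int = 10) -> Scored:
    """Run successive matching loops, feeding each result back in as input."""
    result = candidates
    for loop in range(max_loops):
        algorithm = algorithms[loop % len(algorithms)]  # algorithm selected per loop
        result = algorithm(result)
        on_result(loop, result)      # act on the partial result of this loop
        if satisfied(result):
            break
    return result
```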
3 FUZZY MATCHING PROCESS
3.1 Introduction
The fuzzy case-based matching engine is capable of
comparing the properties of a set of cases and
producing a matching result that indicates the degree
to which every two cases match. As mentioned earlier,
a case is any uniquely identifiable entity (a product
description, a buyer preference, a CV,…) containing
property values for certain criteria. For example,
criteria may be “color” and “price” and their
corresponding property values might be “red” and
“$100”, respectively.
In the remainder of this section, we describe the
internal workings of the engine in more detail.
Figure 2: The fuzzy matching engine’s internal functionality
DESIGN AND IMPLEMENTATION OF A SCALABLE FUZZY CASE-BASED MATCHING ENGINE
3.2 Input and internal data type representation
Internally, the fuzzy engine performs its operations
using fuzzy data types. The property values of the
input cases may be defined in either a crisp (e.g. a
number or a range) or a fuzzy (based on a membership
function) data type. Table 1 summarizes the
different data types that can be used in the fuzzy
engine.
Table 1: Matching engine data types
Numerical: A (crisp) number which can be represented with double precision.
Discrete Set: A finite list of options. An option can be any uniquely identifiable object. An example of a discrete set would be {“red”, “blue”, “orange”}.
Weighted Discrete Set: Identical to a standard discrete set, except that every set member has an associated weight or membership value (in the interval [0,1]). This allows an application to use fuzzy modifiers for each set member, each mapping to a certain weight. For instance: 25% = “a little”; 50% = “somewhat”; 75% = “quite”,…
Range: A single continuous numerical range.
Range Set: A unique set of ranges.
Weighted Range Set: A range set, with weights associated to every range. This is a generalization of a weighted discrete set.
Fuzzy Set: A fuzzy value, represented by a function.
Case: Properties may be compound. For example, a property “price” may be a composition (linear function) of “base price”, “VAT” and “s&h”. These compound properties are modeled as sub-cases containing properties for the sub-criteria.
Before comparing property values, the engine
converts any non-fuzzy property value into a fuzzy
value at the pre-processing phase. The matching
engine works with both point-functions and piece-
wise linear functions as membership functions for
fuzzy values. These can represent all the most
frequently used fuzzy set membership functions
(triangular, trapezoidal,…) and allow an approximation
of others, such as Gaussian curves (De Baets et al.,
1989; Klir and Yuan, 1995).
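A piecewise-linear membership function can be stored as a sorted list of breakpoints and evaluated by linear interpolation, with a point function as the degenerate single-breakpoint case. The Python sketch below assumes that encoding (the paper does not describe the engine's internal representation) and approximates a Gaussian curve by sampling it at a small, hypothetical number of breakpoints.

```python
import math
from bisect import bisect_right
from typing import List, Tuple

Breakpoints = List[Tuple[float, float]]   # non-empty, sorted (x, membership) pairs


def membership(bp: Breakpoints, x: float) -> float:
    """Evaluate a point function (one breakpoint) or a piecewise-linear function.

    Values outside the first and last breakpoint are 0.
    """
    if len(bp) == 1:                        # point function
        return bp[0][1] if x == bp[0][0] else 0.0
    xs = [p[0] for p in bp]
    if x < xs[0] or x > xs[-1]:
        return 0.0
    k = bisect_right(xs, x) - 1             # segment with xs[k] <= x < xs[k + 1]
    if k == len(bp) - 1:                    # x is exactly the last breakpoint
        return bp[-1][1]
    (x0, y0), (x1, y1) = bp[k], bp[k + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def trapezoid(a: float, b: float, c: float, d: float) -> Breakpoints:
    return [(a, 0.0), (b, 1.0), (c, 1.0), (d, 0.0)]


def gaussian_approx(mean: float, sigma: float, n: int = 9) -> Breakpoints:
    """Sample exp(-(x - mean)^2 / (2 sigma^2)) at n points on [mean - 3 sigma, mean + 3 sigma]."""
    xs = [mean - 3 * sigma + 6 * sigma * i / (n - 1) for i in range(n)]
    return [(x, math.exp(-((x - mean) ** 2) / (2 * sigma ** 2))) for x in xs]
```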
In order to match the properties of two distinct
cases, three standard operations need to be
performed: the fuzzification of the property values,
the aggregation of these fuzzy values, and the
defuzzification of the aggregated result into a crisp
matching value (Xu et al., 2001). We discuss each of
these steps in more detail in the following sections.
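The three operations compose into a single property-matching function. In this Python sketch, an assumption made for brevity, fuzzified values are membership functions evaluated on a shared grid of points rather than exact piecewise-linear functions; the aggregator and defuzzifier are passed in, defaulting to intersection (min) and Max.

```python
from typing import Callable, List, Sequence

Membership = Callable[[float], float]
Aggregator = Callable[[float, float], float]
Defuzzifier = Callable[[List[float]], float]


def property_match(u: Membership, s: Membership, grid: Sequence[float],
                   aggregate: Aggregator = min,
                   defuzzify: Defuzzifier = max) -> float:
    """Fuzzified values u, s -> pointwise aggregation -> crisp matching value."""
    aggregated = [aggregate(u(x), s(x)) for x in grid]
    return defuzzify(aggregated)


def triangle(center: float, half_width: float) -> Membership:
    # Fuzzification of a crisp value into a symmetric triangular function.
    return lambda x: max(0.0, 1.0 - abs(x - center) / half_width)
```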
3.3 Fuzzification
In order to convert crisp input values into fuzzy
values, a method of fuzzification needs to be
selected. A wide variety of fuzzification methods
exists, and the most suitable method for the case at
hand will depend on the usage context. Three
standard methods have been implemented in the
engine and are described below. In addition,
specific fuzzification operations can be performed
in higher-level applications, simply by creating the
appropriate function and passing the result as a
fuzzy property value to the engine.
Range-based fuzzification: This method
fuzzifies the value over a domain (UoD or
Universe of Discourse). The fuzzification factor
(a number in the interval [0,1]) indicates over
which percentage of the domain the value will
be fuzzified. The UoD width indicates how
wide the function domain is. The greater the
range, the wider the resulting fuzzy function
will be. In other words, this method assumes
that properties with a broader value domain
(e.g. “price” in the interval [0, 10,000]) need not be
matched as precisely as properties with a narrow
domain (e.g. “age” in the interval [25, 65]). An
example: a numerical value is 1000 and the
domain is 0-2000. A triangular fuzzification
with fuzzification factor 0.1 will result in a
function with base x-coordinates 900-1100. The
same fuzzification over a domain 900-1100 will
result in a function with base 990-1010. This
ensures that the fuzzification is meaningful in
the specific UoD context.
Fixed fuzzification: If no domain is available
however, the method for fixed fuzzification or
value-oriented fuzzification (presented next) can
be applied. The fixed fuzzification results in a
function with a specified base width, which
does not depend on the actual value. An
example: a numerical value is 1000 and the
fuzzification width is 250. The resulting function will
have a base with x-coordinates 875-1125.
Clearly, this method of fuzzification should
only be used when no UoD is known, as the
fuzzification would be meaningless if the
domain is extremely wide (for example
0-1,000,000) and exaggerated if the domain is
very narrow (for example 900-1100).
Value-based fuzzification: This fuzzification
results in a function with a base width that
depends on the actual value. The greater the
value, the wider the base. An example: the value
is 100 and the fuzzification is 0.1. The resulting
function will have a base with x-coordinates 90-
110 (width=20). The same fuzzification on a
value of 1000 will result in a base 900-1100
(width=200). This fuzzification method is
therefore only applicable in contexts where
larger values require less precise matches.
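The three methods differ only in how they compute the base (left, right) of the fuzzy function around a crisp value v. The following Python sketch reproduces the worked examples above; the function and parameter names are ours, and the half-width conventions (factor × UoD width / 2, width / 2, and factor × |v|) are inferred from those examples rather than taken from the engine.

```python
from typing import Tuple

Base = Tuple[float, float]   # (left, right) x-coordinates of the function's base


def range_based(v: float, uod: Tuple[float, float], factor: float) -> Base:
    """Base spans `factor` of the Universe of Discourse width, centred on v."""
    half = factor * (uod[1] - uod[0]) / 2
    return (v - half, v + half)


def fixed(v: float, width: float) -> Base:
    """Base of constant width, independent of v and of any domain."""
    return (v - width / 2, v + width / 2)


def value_based(v: float, factor: float) -> Base:
    """Base grows with the magnitude of v: v +/- factor * |v|."""
    half = factor * abs(v)
    return (v - half, v + half)
```

For instance, range_based(1000, (0, 2000), 0.1) gives a base of 900-1100, fixed(1000, 250) gives 875-1125, and value_based(100, 0.1) gives 90-110, matching the examples in the text.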
Besides the fuzzification method, a fuzzifying
function needs to be defined. Depending on the
application context, this might for instance be a
triangular, trapezoidal or Gaussian function.
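Given a base from any of the methods above, the fuzzifying function determines the shape of the fuzzy value. The Python sketch below shows three common shapes under our own assumptions: the triangle peaks at the centre of the base, the trapezoid is flat on a hypothetical core fraction of the base, and the Gaussian uses sigma = base width / 6 so that nearly all of the curve lies on the base.

```python
import math
from typing import Callable

Membership = Callable[[float], float]


def triangular(left: float, right: float) -> Membership:
    mid, half = (left + right) / 2, (right - left) / 2
    return lambda x: max(0.0, 1.0 - abs(x - mid) / half)


def trapezoidal(left: float, right: float, core: float = 0.5) -> Membership:
    """Membership 1 on the central `core` fraction (0 <= core < 1), linear slopes outside."""
    mid, half = (left + right) / 2, (right - left) / 2
    flat = core * half

    def mu(x: float) -> float:
        d = abs(x - mid)
        if d <= flat:
            return 1.0
        return max(0.0, (half - d) / (half - flat))

    return mu


def gaussian(left: float, right: float) -> Membership:
    mid, sigma = (left + right) / 2, (right - left) / 6
    return lambda x: math.exp(-((x - mid) ** 2) / (2 * sigma ** 2))
```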
3.4 Aggregation
The second step is to aggregate the fuzzified values
in order to determine the degree to which these
values correspond. Three standard methods have
been implemented in the engine and, as was the case
for the fuzzification operations, additional
aggregation methods such as product or union can
be easily implemented by higher level applications.
The implemented methods are the following.
Intersection: The intersection operator models
the fuzzy ‘AND’, and aggregates two fuzzy sets
using function intersection. Intersection is a
very strict yet commonly used form of
aggregation. Using fuzzy intersection, two
properties will only match well if both have high
membership values for the same domain values.
Absolute difference: The absolute difference
aggregates two fuzzy sets into a function
representing the absolute difference of both.
The absolute difference between piecewise
linear functions is a new piecewise linear
function, and the absolute difference between a
point and a piecewise linear function is a new
point function. The difference aggregation does
not take into account the actual values of the
points, but only the amount by which the two
values differ. As a result, two very low
values might match much better than a low and
a high value. In certain contexts this might not
be the expected behavior and in these cases a
different aggregator should be used.
Bounded difference: The bounded difference
determines the fuzzy difference between two
functions f1 and f2, with a lower bound of 0. In
other words, the difference is max(f1 - f2, 0). In
contrast to the other aggregators, the order of
the functions is important here. Indeed, the
bounded difference of f1 and f2 is not
necessarily equal to the bounded difference of f2
and f1. As with the absolute difference, this
aggregator is not suited for every form of
matching, as a set of low preferences might
result in a perfect or near-perfect matching
score.
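On membership functions sampled at common points, the three aggregators reduce to pointwise operations. The Python sketch below works on such samples (an assumption; the engine itself combines piecewise-linear and point functions exactly).

```python
from typing import List, Sequence


def intersection(f1: Sequence[float], f2: Sequence[float]) -> List[float]:
    """Fuzzy AND: pointwise minimum."""
    return [min(a, b) for a, b in zip(f1, f2)]


def absolute_difference(f1: Sequence[float], f2: Sequence[float]) -> List[float]:
    """|f1 - f2| at every sample point (symmetric)."""
    return [abs(a - b) for a, b in zip(f1, f2)]


def bounded_difference(f1: Sequence[float], f2: Sequence[float]) -> List[float]:
    """max(f1 - f2, 0) at every sample point (order matters)."""
    return [max(a - b, 0.0) for a, b in zip(f1, f2)]
```

For example, with f1 = [0.0, 0.5, 1.0, 0.5] and f2 = [0.5, 1.0, 0.5, 0.0], bounded_difference(f1, f2) gives [0.0, 0.0, 0.5, 0.5] while bounded_difference(f2, f1) gives [0.5, 0.5, 0.0, 0.0]; two identical low preference profiles give zero absolute difference everywhere, which is the caveat noted above.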
3.5 Defuzzification
The aggregation step is followed by a final step of
defuzzification into a distance or matching value. This
resulting value is a measure for the similarity of two
property values. Depending on the data type of one
or both of the properties, either a numerical value or
range (partial matching) will be returned. As before,
additional operators can be easily added at the
Application layer, but the following operators are
available by default.
Max: Simply returns the maximum membership
value of a fuzzy value. This can be used to
determine the maximum intersection value of
two properties and will be used most often in
fuzzy matching. However, if at least one of the
properties is a discrete set and the property
should only receive a high score if all of the
options in the set match well, average
intersection or matching based on difference
aggregation should be used. The max
defuzzification used in combination with a
bounded or absolute difference aggregation only
compares the similarity of property values,
without taking into account the actual values
themselves. This means that two properties with
low but nearly equal values will be scored as a
very close match. In some cases, this is not the
expected behavior; in those cases a distance
function based on intersection can be used.
Average: Returns the average function value.
This property distance can be used when at least
one of the properties is a discrete set and the
property should only receive a high score if all
of the options in the set match well. If the
property score should reflect the score of the
best matching option, Max-Intersection should
be used instead.
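Both defuzzifiers, together with the normalized weighted sum that Section 3.6 uses to combine property scores into a case score, fit in a few lines of Python. Aggregated functions are represented by sampled values (an assumption made for brevity), and the names are hypothetical.

```python
from typing import Sequence


def defuzzify_max(values: Sequence[float]) -> float:
    """Max: highest membership value of the aggregated function."""
    return max(values)


def defuzzify_avg(values: Sequence[float]) -> float:
    """Average: mean membership value over all sample points or set options."""
    return sum(values) / len(values)


def case_score(property_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of property scores, normalized by the total weight."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, property_scores)) / total
```

For a discrete-set property whose options intersect the other property with values [1.0, 0.2, 0.0], Max returns 1.0 (the best option decides) while Average returns about 0.4 (all options must match well).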
3.6 Matching
Once all property distances are computed for each of
the matching criteria, these distances can be turned
into a case distance. Several case distance functions
are available; however, the weighted sum will be
used most frequently in fuzzy matching (Zadeh,
1971).
Combining an intersection aggregator and a Max
defuzzification results in a matching value
defined as
$$D_{1,2} = \frac{\sum_{i=1}^{N} W_i \, \max_{j} \min\left(U_i(x_j), S_i(x_j)\right)}{\sum_{i=1}^{N} W_i}$$
with Ui and Si property values for a criterion Ci,
Wi a weight for the criterion Ci, N the number of
criteria, and x1, …, xm the m points at which the
aggregated functions are evaluated. Combining an
absolute difference aggregator and an Avg
defuzzification results in
$$D_{1,2} = \frac{\sum_{i=1}^{N} W_i \, \frac{1}{m}\sum_{j=1}^{m}\left(1 - \left|U_i(x_j) - S_i(x_j)\right|\right)}{\sum_{i=1}^{N} W_i}$$
The bounded difference and Avg defuzzification
amount to the following (unweighted) score for a
criterion Ci:
$$\frac{1}{m}\sum_{j=1}^{m}\left(1 - \max\left(U_i(x_j) - S_i(x_j), 0\right)\right)$$
4 FUZZY MATCHING ENGINE SCALABILITY TESTING
This section provides a brief overview of the
performance of the fuzzy case-based matching
engine.
The benchmarks were performed under the
following test conditions: Hardware: Pentium 4,
2.66 GHz, 512 MB RAM; Software: Windows XP,
JDK 1.4.1; Configuration: single-threaded; Fuzzy
configuration: fixed value fuzzification (other
fuzzification types are marginally slower). All
times are given in milliseconds (ms). Two runs are
performed per evaluation, so that the initialization
and configuration of the matching engine are not
taken into account in the second run.
4.1 Case scaling
This test benchmarks matching speed for cases with
a single property, in order to evaluate how matching
time scales with the number of cases. 1000 cases are
evaluated in approximately 140 ms when fuzzy logic
is used.
Using plain (crisp) CBR matching, 1000 cases are
evaluated in approximately 110 ms. The chart also
illustrates that the engine scales in a logarithmic
rather than a linear fashion. This means the engine
is most efficient when processing a large number of
cases.
[Chart: time (ms, 0 to 200) versus number of cases (1 to 1000, logarithmic axis); series: crisp matching 1st run, crisp matching 2nd run, fuzzy matching 1st run, fuzzy matching 2nd run]
Figure 3: Results of the case scaling tests for the fuzzy
matching engine
4.2 Property scaling
This test benchmarks matching speed for a single
case, with an increasing number of properties.
Here we note that the fuzzy algorithm
implementation is faster than standard CBR when
processing a limited number of cases with a large
number of properties.
[Chart: time (ms, 0 to 400) versus number of properties (1 to 100, logarithmic axis); series as in Figure 3]
Figure 4: Results of the property scaling test for the
fuzzy matching engine
[Charts for Figures 5 and 6: time (ms) versus number of cases (10 to 1000, logarithmic axis); Figure 5 series: crisp and fuzzy matching, 1st and 2nd run; Figure 6 series: fuzzy matching, 1st and 2nd run]
4.3 Real-world scaling
This benchmark tests matching speed for an
increasing number of cases, with 10 properties each.
Most cases in real-world applications can be
represented with no more than 10 properties, so that
this test gives a good indication of real-world
performance. 1000 cases can be ‘fuzzy’ matched
in approx. 400 ms (180 ms for standard CBR). With
10 properties per case, this corresponds to about
25,000 fuzzy property matches per second, which is
faster than most databases can produce the data
required for the matching.
4.4 Fuzzy scaling
All previous benchmarks were performed on purely
numerical properties. This benchmark tests the fuzzy
matching speed for an increasing number of cases.
Each case contains 7 properties, of which 2 are
compound and nested within each other. Properties
are created randomly and are of types numeric,
interval, discrete weighted set, range set and fuzzy.
Matching 1000 cases takes approximately 400 ms.
5 CONCLUSIONS AND FUTURE WORK
We have presented design and implementation
issues that have influenced and defined the
development of a fuzzy case-based matching engine.
We stressed the flexibility of the engine, which is
reflected in the variety of case data types on the one
hand and fuzzy set theoretical matching operations
on the other hand. We have analyzed the scalability
of the engine, and found that the engine is capable of
dealing with complex cases under increasing load
conditions. The applicability of the matching engine
is currently being investigated for e-marketplaces for
student jobs (Kurbel et al., 2001; Hansenne et al.,
2003; Van de Walle 2003(b); Hansenne et al., 2004)
and negotiation processes in electronic markets
involving complex multi-issue cases (Van de Walle
et al., 2001). We have recently developed a
theoretical model to deal with incomplete case
information and asymmetric matching processes
(Van de Walle and Van der Sluys, 2002; Van de
Walle 2003(a)), and our near term research objective
is to implement that model in the engine’s
application layer and investigate its applicability for
real world electronic markets.
REFERENCES
Aamodt, A. and E. Plaza, 1994. Case-Based Reasoning:
Foundational Issues, Methodological Variations, and
System Approaches. In AICom - Artificial Intelligence
Communications, IOS Press 7 (1), 39 – 59.
De Baets, B., M.M. Gupta and E.E. Kerre, 1989. Expert
knowledge representation by means of piecewise linear
fuzzy quantities. In Proceedings of the Third
International Fuzzy Systems Association Congress 89
(Seattle, WA, USA), 618-621.
Hansenne, R., V. Van der Sluys and B. Van de Walle,
2003. Smart Web Services in Action: Student Odd
Jobs on University Websites. In Proceedings of the
International Conference on Information Technology:
Research and Education ITRE2003 (Newark, New
Jersey USA), 255 – 256.
Hansenne, R., V. Van der Sluys and B. Van de Walle,
2004. Implementation of a web services based
recruitment platform. Submitted to WSMAI-2004, The
2nd International Workshop on Web Services –
Modeling, Architecture and Infrastructure (Porto,
Portugal).
Figure 5: Results of the real-world scaling tests for
the fuzzy matching engine
Figure 6: Results of the fuzzy scaling tests for the
fuzzy matching engine
Klir, G. and B. Yuan, 1995. Fuzzy Sets and Fuzzy Logic:
Theory and Applications. Prentice Hall, Englewood
Cliffs, NJ.
Kolodner, J., 1993. Case Based Reasoning. Morgan
Kaufmann Press.
Kurbel, K., I. Loutchko and S. Klaue, 2001. Automated
Negotiation on Agent-Based E-Marketplaces: An
Overview. In O'Keefe, B. et al. (Eds.), Proceedings of
the 14th Bled Electronic Commerce Conference (Bled,
Slovenia), 508 – 519.
Pal, S., T. Dillon, and D. Yeung, (Eds.), 2001. Soft
Computing in Case Based Reasoning. London, U.K.:
Springer-Verlag.
Van de Walle, B., S. Heitsch and P. Faratin, 2001. Coping
with One-to-many Multi-criteria Negotiations in an
Electronic Marketplace. In Proceedings of the
e-negotiations Workshop at the 17th International
Database and Expert Systems Applications Conference
DEXA’01 (München, Germany), 747 – 751.
Van de Walle, B. and V. Van der Sluys, 2002. Non-
symmetric Matching Information for Negotiation
Support in Electronic Markets. In Proceedings of the
International Workshop on Information Systems
EuroFuse2002 (Trento, Italy), 271 – 276.
Van de Walle, B., 2003(a). A relational analysis of
decision makers’ preferences. In International Journal
of Intelligent Systems 18, 775 – 791.
Van de Walle, B., 2003(b). Relational structures for the
analysis of decision information in electronic markets.
In Applied Decision Support with Soft Computing
(Eds. X. Yu and J. Kacprzyk), Studies in Fuzziness
and Soft Computing Series Vol. 124, Springer-Verlag,
pp. 196 – 217.
Watson, I.D., 1997. Applying Case-Based Reasoning:
Techniques for Enterprise Systems. Morgan Kaufmann
Publishers.
Xu, Y., E.E. Kerre, D. Ruan and Z. Song, 2001. Fuzzy
reasoning based on the extension principle. In
International Journal of Intelligent Systems 16, 469 –
495.
Zadeh, L.A., 1971. Similarity relation and fuzzy orderings.
In Information Sciences 3, 177 – 200.