The paper is organized as follows. Section 2 reviews association rule mining and presents the proposed RecBot agent. Section 3 describes the materials and methods used to evaluate the proposal, and Section 4 analyzes its execution profile. Section 5 concludes the work with a general discussion of the proposal and perspectives for future work.
2 ASSOCIATION RULE MINING AND THE RECBOT AGENT
There are several practical applications in which the objective is to find relationships among attributes (or variables) rather than among objects. Association analysis, also known as association rule mining, is the discovery of association rules that describe attribute values occurring together in a database (Agrawal, Imielinski and Swami, 1993; de Castro and Ferrari, 2016; Han, Kamber, and Pei, 2012). This type of analysis is typically used in marketing and in the study of transactional databases.
Association rule mining has two central aspects: the efficient construction of association rules and the quantification of their significance. That is, a good association rule mining algorithm must be able to propose item associations that are statistically relevant to the universe represented by the database.
More formally, association rules have the form X → Y:

$$A_1 \wedge A_2 \wedge \dots \wedge A_m \rightarrow B_1 \wedge B_2 \wedge \dots \wedge B_n,$$

where $A_i$, $i = 1, \dots, m$, and $B_j$, $j = 1, \dots, n$, are pairs of attribute values.
A rule X → Y is interpreted as follows: database records that satisfy the condition in X also satisfy the condition in Y.
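The rule form and its interpretation can be illustrated with a minimal Python sketch. It assumes that records are dictionaries mapping attribute names to values and that each $A_i$ and $B_j$ is an (attribute, value) pair; the `Rule` class, the helpers `satisfies` and `covered_records`, and the sample records are hypothetical and are not taken from RecBot.

```python
from dataclasses import dataclass
from typing import Mapping

# Hypothetical condition type: a set of (attribute, value) pairs such as
# ("bread", "yes"), mirroring the pairs A_i and B_j in the rule form.
Condition = frozenset[tuple[str, str]]


@dataclass(frozen=True)
class Rule:
    """Illustrative association rule X -> Y (not RecBot's actual data model)."""
    antecedent: Condition  # X: A_1 and A_2 and ... and A_m
    consequent: Condition  # Y: B_1 and B_2 and ... and B_n


def satisfies(record: Mapping[str, str], condition: Condition) -> bool:
    """True if the record holds every (attribute, value) pair of the condition."""
    return all(record.get(attr) == value for attr, value in condition)


def covered_records(records, rule: Rule):
    """Return the records satisfying X, and those satisfying both X and Y."""
    match_x = [r for r in records if satisfies(r, rule.antecedent)]
    match_xy = [r for r in match_x if satisfies(r, rule.consequent)]
    return match_x, match_xy


# Hypothetical records; the attribute names are illustrative only.
records = [
    {"bread": "yes", "butter": "yes", "milk": "no"},
    {"bread": "yes", "butter": "yes", "milk": "yes"},
    {"bread": "yes", "butter": "no", "milk": "yes"},
    {"bread": "no", "butter": "no", "milk": "yes"},
]
rule = Rule(
    antecedent=frozenset({("bread", "yes")}),
    consequent=frozenset({("butter", "yes")}),
)
match_x, match_xy = covered_records(records, rule)
print(len(match_x), len(match_xy))  # 3 2
```

In this toy data, three records satisfy X and only two of them also satisfy Y, so in practice the interpretation holds for a fraction of the records; the measures of interest discussed in this section quantify that fraction.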
The significance of the proposed rules is established on statistical grounds. Rules that involve mutually exclusive items or that cover very few transactions are of little relevance. It is therefore possible to define objective measures of interest that evaluate these features of the rules, such as support and confidence (Agrawal, Imielinski and Swami, 1993).
The support, or coverage, of a rule is an important measure, since rules with very low support describe patterns that occur only occasionally. Rules with low support are also of little interest from a business perspective, since it makes little sense to promote items that customers rarely buy together. For this reason, support is typically used to eliminate uninteresting rules.
The support of an association rule A → C indicates how frequently the rule occurs, that is, the proportion of all transactions in the base that contain both A and C:

$$s(A \rightarrow C) = \frac{\sigma(A \cup C)}{n}$$

where $\sigma(A \cup C)$ is the support count of the rule, i.e., the number of transactions that contain the itemset $A \cup C$, and $n$ is the total number of transactions in the base.
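As a worked example with hypothetical numbers (not drawn from the paper's evaluation), consider a base with $n = 5$ transactions in which 2 transactions contain both A and C, with $\sigma(\cdot)$ denoting the support count:

$$s(A \rightarrow C) = \frac{\sigma(A \cup C)}{n} = \frac{2}{5} = 0.4$$

That is, the rule holds in 40% of the transactions of this illustrative base.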
Mathematically, the support count of an itemset A is given by:

$$\sigma(A) = \left|\{\, t_i \mid A \subseteq t_i,\ t_i \in T \,\}\right|$$

where $T$ is the set of transactions in the base.
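A minimal Python sketch of the support count and of rule support follows, assuming each transaction is represented as a `frozenset` of item names. The function names `support_count` and `support` and the five sample transactions are hypothetical, chosen only for illustration; they are not RecBot code or the paper's evaluation data.

```python
from typing import Iterable

Itemset = frozenset[str]


def support_count(itemset: Itemset, transactions: Iterable[Itemset]) -> int:
    """sigma(A) = |{t_i | A is a subset of t_i, t_i in T}|."""
    return sum(1 for t in transactions if itemset <= t)


def support(antecedent: Itemset, consequent: Itemset, transactions: list[Itemset]) -> float:
    """s(A -> C) = sigma(A ∪ C) / n, where n is the number of transactions."""
    return support_count(antecedent | consequent, transactions) / len(transactions)


# Hypothetical transaction base T (n = 5); item names are illustrative only.
T = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer", "eggs"}),
    frozenset({"milk", "diapers", "beer", "cola"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "cola"}),
]

A = frozenset({"milk", "diapers"})
C = frozenset({"beer"})
print(support_count(A, T))  # 3
print(support(A, C, T))     # 0.4
```

The subset test `itemset <= t` is a direct translation of the condition $A \subseteq t_i$ in the definition above.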
The confidence, or accuracy, measures how often the consequent of the rule occurs in the transactions that contain the antecedent:

$$c(A \rightarrow C) = \frac{\sigma(A \cup C)}{\sigma(A)}$$

where $\sigma(A)$ is the support count of the antecedent.
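The confidence formula can be sketched in Python in the same spirit, again assuming transactions are `frozenset`s of item names. The function `confidence`, the choice to raise `ValueError` when the antecedent never occurs, and the sample transactions are hypothetical illustrations.

```python
Itemset = frozenset[str]


def support_count(itemset: Itemset, transactions: list[Itemset]) -> int:
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent: Itemset, consequent: Itemset, transactions: list[Itemset]) -> float:
    """c(A -> C) = sigma(A ∪ C) / sigma(A)."""
    antecedent_count = support_count(antecedent, transactions)
    if antecedent_count == 0:
        # Assumption: treat a rule whose antecedent never occurs as undefined.
        raise ValueError("confidence is undefined: the antecedent never occurs")
    return support_count(antecedent | consequent, transactions) / antecedent_count


# Hypothetical transaction base (n = 5); item names are illustrative only.
T = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer", "eggs"}),
    frozenset({"milk", "diapers", "beer", "cola"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "cola"}),
]

diapers, beer = frozenset({"diapers"}), frozenset({"beer"})
print(confidence(diapers, beer, T))  # 0.75
print(confidence(beer, diapers, T))  # 1.0
```

The two calls show that confidence is not symmetric: diapers → beer and beer → diapers share the same support count (3), but their antecedents occur in 4 and 3 transactions, respectively.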
While confidence measures the accuracy of a rule, support corresponds to its statistical significance. Together, they are the most commonly used measures of interest in the association rule mining literature (Al-Mudimigh and Saleem, 2008; Al-Mudimigh, Saleem and Ullah, 2009; Han, Kamber, and Pei, 2012; de Castro and Ferrari, 2016). During the mining process, minimum support and confidence thresholds are set, and only rules that meet both are included in the final rule set. However, a minimum support threshold may eliminate many potentially interesting rules, and confidence, in turn, ignores the support of the itemset.
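The threshold-based selection described above can be sketched as follows. The function `filter_rules`, the single-item candidate rules, the thresholds `min_sup = 0.4` and `min_conf = 0.7`, and the sample transactions are all hypothetical choices for illustration, not values used by RecBot.

```python
Itemset = frozenset[str]


def support_count(itemset: Itemset, transactions: list[Itemset]) -> int:
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def filter_rules(candidates, transactions, min_sup, min_conf):
    """Keep only candidate rules (A, C) that meet BOTH minimum thresholds."""
    n = len(transactions)
    kept = []
    for a, c in candidates:
        both = support_count(a | c, transactions)
        ant = support_count(a, transactions)
        sup = both / n
        conf = both / ant if ant else 0.0
        if sup >= min_sup and conf >= min_conf:
            kept.append((a, c, sup, conf))
    return kept


# Hypothetical transaction base (n = 5); item names are illustrative only.
T = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer", "eggs"}),
    frozenset({"milk", "diapers", "beer", "cola"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "cola"}),
]
items = sorted(set().union(*T))
# Candidate rules with a single item on each side.
candidates = [(frozenset({x}), frozenset({y})) for x in items for y in items if x != y]

kept = filter_rules(candidates, T, min_sup=0.4, min_conf=0.7)
kept_pairs = [(a, c) for a, c, _, _ in kept]
eggs_beer = (frozenset({"eggs"}), frozenset({"beer"}))
print(len(kept))               # 10
print(eggs_beer in kept_pairs)  # False
```

With these hypothetical thresholds, eggs → beer is discarded even though its confidence is 1.0, because its support is only 0.2; this illustrates how a high-confidence rule can be eliminated by the minimum support criterion.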
One way to reduce the computational cost of association rule mining algorithms is to decouple the support and confidence requirements of the rules. Because the support of a rule depends only on its itemset, infrequent itemsets can be discarded early in the process without computing their confidence. Thus, a common strategy adopted by association rule mining algorithms is to decompose the problem into two subtasks: frequent itemset generation and rule generation.
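The two-subtask decomposition can be sketched in Python, here using the classic Apriori join-and-prune strategy for the first phase. This is an assumption for illustration: the text does not state which algorithm RecBot uses, and the function names `frequent_itemsets` and `generate_rules`, the absolute threshold `min_count`, and the sample transactions are hypothetical.

```python
from itertools import combinations

Itemset = frozenset[str]


def frequent_itemsets(transactions: list[Itemset], min_count: int) -> dict[Itemset, int]:
    """Phase 1: level-wise frequent itemset generation (Apriori-style)."""
    def count(s: Itemset) -> int:
        return sum(1 for t in transactions if s <= t)

    items = sorted(set().union(*transactions))
    level = {frozenset({i}) for i in items if count(frozenset({i})) >= min_count}
    frequent = {s: count(s) for s in level}
    k = 2
    while level:
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: a candidate with an infrequent (k-1)-subset cannot be frequent,
        # so it is dropped without scanning the transactions.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        counts = {c: count(c) for c in candidates}
        level = {c for c, n in counts.items() if n >= min_count}
        frequent.update({c: counts[c] for c in level})
        k += 1
    return frequent


def generate_rules(frequent: dict[Itemset, int], min_conf: float):
    """Phase 2: split each frequent itemset into A -> C; no new scan is needed."""
    rules = []
    for itemset, both in frequent.items():
        for r in range(1, len(itemset)):
            for ant in combinations(sorted(itemset), r):
                a = frozenset(ant)
                conf = both / frequent[a]  # subsets of frequent itemsets are frequent
                if conf >= min_conf:
                    rules.append((a, itemset - a, conf))
    return rules


# Hypothetical transaction base (n = 5); item names are illustrative only.
T = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer", "eggs"}),
    frozenset({"milk", "diapers", "beer", "cola"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "cola"}),
]
freq = frequent_itemsets(T, min_count=3)
rules = generate_rules(freq, min_conf=0.7)
print(len(freq), len(rules))  # 8 8
```

Because support depends only on the itemset, phase 2 reuses the counts from phase 1 and computes confidence only for frequent itemsets; an infrequent candidate such as {beer, bread, diapers} is pruned in phase 1 and never reaches rule generation.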
Integrating an Association Rule Mining Agent in an ERP System: A Proposal and a Computational Scalability Analysis