models regularities or trends for objects whose
behavior changes over time. Although this may
include characterization, discrimination, association
and correlation analysis, classification, prediction, or
clustering of time-related data, distinct features of
such an analysis include time-series data analysis,
sequence or periodicity pattern matching, and
similarity-based data analysis.
5. Outlier Analysis
Outlier analysis can be seen as the opposite of frequent
pattern mining. A database may contain data objects that do
not comply with the general behavior or model of the
data. These data objects are outliers. Most data
mining methods discard outliers as noise or
exceptions. However, in some applications such as
fraud detection, the rare events can be more
interesting than the more regularly occurring ones.
The analysis of outlier data is referred to as outlier
mining.
6. Mining Frequent Patterns
Frequent patterns are the patterns that occur most
frequently in a database. There are many kinds of
frequent patterns, including item-sets, subsequences,
and substructures. A frequent item-set typically
refers to a set of items that frequently appear together
in a transactional data set, such as milk and bread. A
frequently occurring subsequence, such as the pattern
that customers tend to purchase first a PC, followed
by a digital camera, and then a memory card, is a
(frequent) sequential pattern. A substructure can refer
to different structural forms, such as graphs, trees, or
lattices, which may be combined with item-sets or
subsequences. If a substructure occurs frequently, it
is called a (frequent) structured pattern. Mining
frequent patterns leads to the discovery of interesting
associations and correlations within data. Studies of
frequent patterns can therefore drive many
improvements in the field of interest.
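
To make the idea of a frequent item-set concrete, the following is a minimal sketch that counts how often each pair of items appears together in a handful of transactions. It is a brute-force enumeration for illustration only, not the Apriori algorithm. The baskets, the function name `frequent_itemsets`, and the `min_support` threshold are assumptions introduced here, not data from this paper.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, size, min_support):
    """Return item-sets of the given size whose relative frequency
    is at least min_support (brute-force counting, not Apriori)."""
    counts = Counter()
    for transaction in transactions:
        # Sort so that each item-set has one canonical tuple form.
        for combo in combinations(sorted(transaction), size):
            counts[combo] += 1
    total = len(transactions)
    return {
        combo: count / total
        for combo, count in counts.items()
        if count / total >= min_support
    }


# Hypothetical market baskets, made up for this illustration.
baskets = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"eggs"},
]

frequent_pairs = frequent_itemsets(baskets, size=2, min_support=0.6)
print(frequent_pairs)
```

In this toy data only the pair of bread and milk appears in at least 60% of the baskets, which mirrors the milk and bread example in the text.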
Our concern in this paper is to explore the most
famous data mining algorithm (the Apriori
algorithm), which is designed to find the most frequent
item-sets in a database and the relations
between them, which will lead us to better
decision making.
3 MINING THE FREQUENT
PATTERNS
When studying the relationships or
associations between frequent patterns or
frequent items, there are two important concepts we
need to understand: support and confidence.
An objective measure for association rules of the
form X ⇒ Y is rule support, representing the
percentage of transactions from a transaction
database that the given rule satisfies. This is taken to
be the probability P(X ∪ Y), where X ∪ Y indicates
that a transaction contains both X and Y, that is, the
union of item-sets X and Y.
Another objective measure for association rules is
confidence, which assesses the degree of certainty of
the detected association. This is taken to be the
conditional probability P(Y | X), that is, the
probability that a transaction containing X also
contains Y. More formally, support and confidence
are defined as:
support(X ⇒ Y) = P(X ∪ Y).
confidence(X ⇒ Y) = P(Y | X).
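
The two measures above can be sketched in a few lines of Python. This is a minimal illustration, assuming that each transaction is represented as a set of item names. The function names `support` and `confidence` and the sample transactions are hypothetical and chosen only for this example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for transaction in transactions if itemset <= transaction)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as
    support(antecedent and consequent together) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical transactions, made up for this illustration.
transactions = [
    frozenset({"milk", "bread"}),
    frozenset({"milk", "bread", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread"}),
]

rule_support = support(transactions, {"milk", "bread"})
rule_confidence = confidence(transactions, {"milk"}, {"bread"})
print(rule_support, rule_confidence)
```

Here milk and bread appear together in two of the four transactions, and milk appears in three, so the rule from milk to bread has support 2/4 and confidence 2/3.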
Example: Suppose we are performing an association
analysis for an electronics store, and the manager
would like to determine which items are frequently
purchased together within the same transactions. An
example of such a rule, mined from the store's
transactional database, is
buys(X, “computer”) ⇒ buys(X, “software”)
[support = 1%, confidence = 50%]
where X is a variable representing a customer. A
confidence, or certainty, of 50% means that if a
customer buys a computer, there is a 50% chance that
she/he will buy software as well. A 1% support
means that 1% of all of the transactions under
analysis showed that computer and software were
purchased together.
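
As a quick check of how these two percentages relate to raw counts, the sketch below uses hypothetical transaction counts that happen to reproduce the 1% support and 50% confidence of the rule. The counts are assumptions made for illustration and are not taken from any real store database.

```python
from fractions import Fraction

# Hypothetical counts, chosen only to match the rule's percentages.
total_transactions = 10_000
with_computer = 200
with_computer_and_software = 100

# support = transactions containing both items / all transactions
support = Fraction(with_computer_and_software, total_transactions)

# confidence = transactions containing both items / transactions with a computer
confidence = Fraction(with_computer_and_software, with_computer)

print(f"support = {float(support):.0%}")
print(f"confidence = {float(confidence):.0%}")
```

Note that the two measures share the same numerator and differ only in the denominator: support divides by all transactions, while confidence divides by the transactions that contain the antecedent.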
Dropping the predicate notation, the above rule can
be written simply as
“computer ⇒ software [1%, 50%]”
This association rule involves a single attribute or
predicate (i.e., buys) that repeats. Association rules
that contain a single predicate are referred to as
Single-Dimensional Association Rules.
On the other hand, there is another kind of
association rule, called the Multidimensional
Association Rule, which involves more than one
attribute or predicate. Example: For
the same electronics store, a data mining
system may find association rules like
age(X, “20...29”) ∧ income(X, “20K...29K”)
⇒ buys(X, “CD player”)
[support = 2%, confidence = 60%]
The rule indicates that of the store customers under
study, 2% are 20 to 29 years of age with an income
of 20,000 to 29,000 and have purchased a CD player.
Finding the Frequent Pattern in a Database - A Study on the Apriori Algorithm