Authors:
David Adrian Sanders and Alexander Gegov
Affiliation:
University of Portsmouth, United Kingdom
Keyword(s):
User Information, Post Processing, 2-D Clusters, Data Mining, Dead Bands, Set.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Biomedical Engineering; Data Engineering; Data Mining; Databases and Information Systems Integration; e-Business and e-Commerce; Enterprise Information Systems; Health Information Systems; Information Systems Analysis and Specification; Knowledge Management; Metadata and Metamodeling; Multimedia and User Interfaces; Ontologies and the Semantic Web; Personalized Web Sites and Services; Searching and Browsing; Sensor Networks; Signal Processing; Social Media Analytics; Society, e-Business and e-Government; Soft Computing; User Modeling; Web Information Systems and Technologies; Web Interfaces and Applications
Abstract:
A post-processing method is described that acts on two-dimensional clusters of data produced by a data
mining system. Dead bands are automatically created that further define the clusters. This was achieved by
defining data within the dead bands as NOT belonging to either cluster. The three clusters produced were
definitely YES, definitely NO and a new set of DON’T KNOW. The creation of the new set improved the
accuracy of decisions made about the data remaining in the YES and NO clusters. The dead bands were
introduced either by setting a radius during the learning process or by setting a straight-line
boundary. Each radius (or line) was calculated during the learning process by considering the two-dimensional
position of each of the users within each cluster of dimensions. A radius line (or straight line)
was then introduced so that the 80% of users within a particular dimension who were nearest to the origin
(or edge) were placed into a set. The other 20% were outside the radius line (or straight line) and were not
recorded as being part of the set. If the two lines did not overlap, then this sometimes created a dead band
that contained users with less certain results, and that in turn increased the accuracy of the other sets. Two
case studies are presented as examples of that improvement.
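
The radius-based variant described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (inner_radius, outer_radius, classify), the choice of a single shared origin, the convention that the YES cluster lies near the origin and the NO cluster lies near the outer edge, and the midpoint fallback used when the two boundaries overlap are all assumptions made for illustration. Only the 80% figure and the three outcome labels come from the abstract.

```python
import math


def _sorted_distances(points, origin):
    """Distances of 2-D points from the origin, in ascending order."""
    ox, oy = origin
    return sorted(math.hypot(x - ox, y - oy) for x, y in points)


def _count_for(percent, n):
    """Number of points making up `percent` of n, rounded up (integer maths)."""
    return max(1, (percent * n + 99) // 100)


def inner_radius(points, percent=80, origin=(0.0, 0.0)):
    """Smallest radius that contains `percent` of the points nearest the origin."""
    d = _sorted_distances(points, origin)
    return d[_count_for(percent, len(d)) - 1]


def outer_radius(points, percent=80, origin=(0.0, 0.0)):
    """Largest radius with `percent` of the points at or beyond it (nearest the edge)."""
    d = _sorted_distances(points, origin)
    return d[len(d) - _count_for(percent, len(d))]


def classify(point, r_yes, r_no, origin=(0.0, 0.0)):
    """Label a point YES, NO or DON'T KNOW using the two learned radii."""
    d = math.hypot(point[0] - origin[0], point[1] - origin[1])
    if r_yes < r_no:
        # The boundaries do not overlap, so a dead band lies between them.
        if d <= r_yes:
            return "YES"
        if d >= r_no:
            return "NO"
        return "DON'T KNOW"
    # Assumption: when the boundaries overlap there is no dead band,
    # so fall back to a simple midpoint split.
    return "YES" if d <= (r_yes + r_no) / 2 else "NO"


# Hypothetical training data: YES users near the origin, NO users further out.
yes_users = [(1, 0), (0, 2), (3, 0), (0, 4), (5, 0)]
no_users = [(6, 0), (0, 7), (8, 0), (0, 9), (10, 0)]

r_yes = inner_radius(yes_users)  # radius enclosing 80% of the YES users
r_no = outer_radius(no_users)    # radius beyond which 80% of the NO users lie

for user in [(2, 0), (5.5, 0), (0, 8)]:
    print(user, classify(user, r_yes, r_no))
```

In this sketch, a user falling between the two radii is assigned to neither cluster, which mirrors the DON'T KNOW set in the abstract. A straight-line boundary could be handled in the same way by replacing the distance from the origin with the distance from an edge.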