Machine Learning Repository (Asuncion and Newman, 2007)². These features describe characteristics of the cell nuclei present in the image. The features were computed from breast masses, which are classified as benign or malignant. For each of the three cell nuclei, the following ten features were computed: mean of distances from center to points on the perimeter, standard deviation of gray-scale values, perimeter, area, smoothness, compactness, concavity, concave points, symmetry and fractal dimension. Thus, each feature vector has 30 features, and the dataset contains 357 benign and 212 malignant samples.
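The arithmetic behind these figures can be checked with a short sketch. The short feature names (`radius`, `texture`, and so on) are illustrative labels for the ten measurements listed above, and the majority-class baseline is an added reference point for reading the accuracy values, not a result reported in the paper.

```python
# Ten measurements listed in the text; the short names are illustrative labels.
BASE_FEATURES = [
    "radius",             # mean of distances from center to perimeter points
    "texture",            # standard deviation of gray-scale values
    "perimeter", "area", "smoothness", "compactness",
    "concavity", "concave_points", "symmetry", "fractal_dimension",
]
GROUPS = 3  # the text describes three sets of ten measurements

# Class counts as stated in the text.
CLASS_COUNTS = {"benign": 357, "malignant": 212}


def feature_vector_length(base=BASE_FEATURES, groups=GROUPS):
    """Number of entries in one feature vector."""
    return len(base) * groups


def majority_baseline(counts=CLASS_COUNTS):
    """Accuracy of a trivial classifier that always predicts the largest class."""
    return max(counts.values()) / sum(counts.values())


print(feature_vector_length())          # 30
print(sum(CLASS_COUNTS.values()))       # 569
print(round(majority_baseline(), 4))    # 0.6274
```

Any classifier evaluated on this dataset should be read against that baseline of roughly 0.63.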
In step 2, StARMiner* mined 19 rules for each class. The results using the holdout and the leave-one-out approaches are shown in Tables 3 and 4, respectively.
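The two evaluation protocols differ only in how the data is split. The following is a minimal sketch, assuming a plain 1-NN classifier and a small made-up two-feature dataset. The 25% test fraction, the seed, and the function names are assumptions chosen for illustration; the paper does not state them here.

```python
import random


def nearest_neighbor_predict(train, query):
    """Return the label of the training point closest to `query` (1-NN)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda item: sq_dist(item[0], query))
    return label


def holdout_accuracy(data, test_fraction=0.25, seed=0):
    """Train on one part of the data and test on the held-out remainder."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    hits = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return hits / len(test)


def leave_one_out_accuracy(data):
    """Test each sample once, training on all the other samples."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += nearest_neighbor_predict(train, x) == y
    return hits / len(data)


# Hypothetical toy data: two well-separated clusters, ten points each.
toy = [((0.1 * i, 0.2 * i), "benign") for i in range(10)]
toy += [((5 + 0.1 * i, 5 + 0.2 * i), "malignant") for i in range(10)]

print(holdout_accuracy(toy))
print(leave_one_out_accuracy(toy))
```

Holdout gives a single estimate that depends on the chosen split, while leave-one-out uses every sample for testing exactly once at the cost of training once per sample.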
Table 3: Comparison between SACMiner and other well-known classifiers using the holdout approach.
Classifiers   Accuracy  Sensitivity  Specificity
SACMiner      0.9859    0.9888       0.9811
1R            0.8943    0.9186       0.8571
Naive Bayes   0.9155    0.9186       0.9107
C4.5          0.9295    0.9419       0.9107
1-NN          0.9577    0.9767       0.9286
Table 4: Comparison between SACMiner and other well-known classifiers using the leave-one-out approach.
Classifiers   Accuracy  Sensitivity  Specificity
SACMiner      0.9525    0.9860       0.8962
1R            0.9015    0.9356       0.8443
Naive Bayes   0.9349    0.9580       0.8962
C4.5          0.9384    0.9524       0.9151
1-NN          0.9525    0.9580       0.9434
The holdout results in Table 3 show that SACMiner achieves the highest accuracy, sensitivity and specificity. In the leave-one-out results in Table 4, SACMiner's accuracy is the highest, tied with 1-NN (0.9525), and its sensitivity is the highest of all classifiers (0.9860).
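For reference, the three measures in the tables follow from the entries of a binary confusion matrix. The sketch below uses illustrative counts that are not reported in the paper: they were chosen only because, if sensitivity is computed over the 357-sample class and specificity over the 212-sample class, they round to the SACMiner row of Table 4. Which class the paper treats as positive is an assumption here.

```python
def accuracy(tp, fn, tn, fp):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def sensitivity(tp, fn):
    """Fraction of positive samples that were classified as positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of negative samples that were classified as negative."""
    return tn / (tn + fp)


# Illustrative counts (NOT taken from the paper): 357 positives, 212 negatives.
tp, fn = 352, 5
tn, fp = 190, 22

print(round(accuracy(tp, fn, tn, fp), 4))  # 0.9525
print(round(sensitivity(tp, fn), 4))       # 0.986
print(round(specificity(tn, fp), 4))       # 0.8962
```

Because accuracy mixes both classes, two classifiers can share the same accuracy while trading sensitivity against specificity, as SACMiner and 1-NN do in Table 4.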
5 CONCLUSIONS
In this paper we proposed SACMiner, a new method that employs statistical association rules to support computer-aided diagnosis of breast cancer. The results on real datasets show that the proposed method achieves the highest accuracy when compared with other well-known classifiers (1R, Naive Bayes, C4.5 and 1-NN). Moreover, the method shows a proper balance between sensitivity and specificity, being slightly more sensitive than specific, which is desirable in the medical domain, since the method is then more accurate at spotting the true positives. Two new algorithms were developed to support the method, StARMiner* and V-Classifier. StARMiner* does not require a discretization step and generates a compact set of rules to compose the learning model of SACMiner. Moreover, its computational cost is low (linear in the number of dataset items). V-Classifier is an associative classifier based on the idea of class votes. The experiments showed that SACMiner produces high values of accuracy, sensitivity and specificity when compared to other traditional classifiers. In addition, SACMiner produces rules that make the learning process comprehensible, which makes the system more reliable for use by radiologists, since they can understand the whole classification process.

² http://archive.ics.uci.edu/ml/datasets.html
ACKNOWLEDGEMENTS
We are thankful to CNPq, CAPES, FAPESP, University of São Paulo and Federal University of Rondônia for the financial support.
REFERENCES
Agrawal, R., Imielinski, T., and Swami, A. N. (1993). Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD ICMD, pages 207–216, Washington, D.C.
Antonie, M.-L., Zaïane, O. R., and Coman, A. (2003). Associative classifiers for medical images. In LNAI 2797, MMCD, pages 68–83. Springer-Verlag.
Asuncion, A. and Newman, D. (2007). UCI Machine Learning Repository.
Aumann, Y. and Lindell, Y. (1999). A statistical theory for quantitative association rules. In The Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 261–270, San Diego, California, United States. ACM Press.
Balan, A. G. R., Traina, A. J. M., Traina Jr., C., and Marques, P. M. d. A. (2005). Fractal analysis of image textures for indexing and retrieval by content. In 18th IEEE Intl. Symposium on Computer-Based Medical Systems - CBMS, pages 581–586, Dublin, Ireland. IEEE Computer Society.
Domingos, P. and Pazzani, M. (1997). On the optimality
of the simple Bayesian classifier under zero-one loss.
Machine Learning, 29(2-3):103–130.
STATISTICAL ASSOCIATIVE CLASSIFICATION OF MAMMOGRAMS - The SACMiner Method