discretized, the ID3 algorithm appears to be the ideal
candidate for solving this predictive problem.
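As a reminder of how ID3 chooses the variable to split on, the
following minimal sketch computes the information gain of each
discretized attribute and picks the one with the largest gain. The
function names and the four toy records are assumptions made only
for illustration; they are not the study data or the authors'
implementation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_attribute(rows, attributes, target):
    """Attribute with maximal information gain (the ID3 split criterion)."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Toy, made-up records (NOT the study data), used only to exercise the code.
toy_rows = [
    {"BI-RADS": 5, "SHAPE": 4, "SEVERITY": 1},
    {"BI-RADS": 5, "SHAPE": 1, "SEVERITY": 1},
    {"BI-RADS": 4, "SHAPE": 1, "SEVERITY": 0},
    {"BI-RADS": 4, "SHAPE": 4, "SEVERITY": 0},
]

chosen = best_attribute(toy_rows, ["BI-RADS", "SHAPE"], "SEVERITY")
```

On these toy records BI-RADS separates the two classes perfectly,
so it is the attribute selected for the root node.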
After building the corresponding decision
tree, some of the more interesting classification
rules obtained by following its root-to-leaf
paths are:
- The variable which by itself best serves for the
diagnosis is BI-RADS.
Example: If BI-RADS=5 (a priori maximum risk),
then SEVERITY=1 (malignant). Confidence=88.4%
- If the DENSITY variable is considered along with
this variable, the average correlation with the
diagnosis improves notably.
Example: If BI-RADS=5 and DENSITY=3 (low),
then SEVERITY=1 (malignant). Confidence=89.9%
- The second variable that, considered along with
BI-RADS, obtains a high degree of average
correlation with the diagnosis is SHAPE.
Example: If BI-RADS=5 and SHAPE=4
(irregular), then SEVERITY=1 (malignant).
Confidence=90.8%
In addition, elimination rules also become apparent,
which are especially useful in clinical diagnosis.
Example: If BI-RADS=4 (a priori high risk) and
SHAPE=1 or 2 (round or lobular), then
SEVERITY=0 (benign). Confidence=90.7% and
90.2%, respectively.
In this case study, the joint consideration of
more variables does not lead to more accurate
diagnoses.
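The confidence attached to each rule above can be read as the
fraction of records matching the antecedent that also match the
consequent. The sketch below illustrates that calculation; the
helper name `rule_confidence` and the toy records are hypothetical
and do not reproduce the percentages reported in this case study.

```python
def rule_confidence(rows, antecedent, consequent):
    """Return (confidence, n_covered) for the rule antecedent -> consequent.

    `antecedent` and `consequent` map variable names to required values.
    Confidence is None when no record satisfies the antecedent.
    """
    covered = [r for r in rows if all(r[k] == v for k, v in antecedent.items())]
    if not covered:
        return None, 0
    hits = sum(all(r[k] == v for k, v in consequent.items()) for r in covered)
    return hits / len(covered), len(covered)


# Toy, made-up records (NOT the study data).
toy_rows = [
    {"BI-RADS": 5, "SHAPE": 4, "SEVERITY": 1},
    {"BI-RADS": 5, "SHAPE": 4, "SEVERITY": 1},
    {"BI-RADS": 5, "SHAPE": 1, "SEVERITY": 1},
    {"BI-RADS": 5, "SHAPE": 1, "SEVERITY": 0},
    {"BI-RADS": 4, "SHAPE": 1, "SEVERITY": 0},
]

# One-variable rule: If BI-RADS=5 then SEVERITY=1
single = rule_confidence(toy_rows, {"BI-RADS": 5}, {"SEVERITY": 1})

# Two-variable rule: If BI-RADS=5 and SHAPE=4 then SEVERITY=1
joint = rule_confidence(toy_rows, {"BI-RADS": 5, "SHAPE": 4}, {"SEVERITY": 1})
```

Returning the number of covered records next to the confidence is
a deliberate choice: a rule with more conditions covers fewer
cases, so its confidence rests on less evidence.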
6 CONCLUSIONS
From the points presented in this paper, it can be
concluded that the most appropriate method does
not depend on the target medical speciality of the
study but on the actual target of the prediction, the
nature of the data involved and the need
(or not) to obtain a predictive model at the end of
the process.
Although in practice the combination of two or
more methods is very frequent, the step-by-step
execution of the proposed procedure for
selecting the most suitable method leads to a
single optimum predictive method.
For nominal, single-variable clinical
diagnostic problems (for example SEVERITY =
benign or malignant), the classification rules
derived from the root-to-leaf paths of ID3-type
decision trees appear to be a very reliable predictive
method that is easy for experts to interpret.
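To illustrate how such rules can be read off a tree, the sketch
below walks a decision tree depth-first and emits one rule per
root-to-leaf path. The nested-dictionary representation, the
function name `extract_rules` and the toy tree are assumptions
made for illustration only; the tree is not the one obtained in
this study.

```python
def extract_rules(tree, path=()):
    """Depth-first walk yielding (conditions, leaf_label) per root-to-leaf path.

    An internal node is a dict {attribute: {value: subtree, ...}};
    anything that is not a dict is treated as a leaf (class label).
    """
    if not isinstance(tree, dict):
        yield list(path), tree
        return
    attribute, branches = next(iter(tree.items()))
    for value, subtree in branches.items():
        yield from extract_rules(subtree, path + ((attribute, value),))


# Hypothetical toy tree (NOT the tree from the case study).
toy_tree = {
    "BI-RADS": {
        5: 1,
        4: {"SHAPE": {1: 0, 4: 1}},
    }
}

rules = list(extract_rules(toy_tree))
for conditions, label in rules:
    antecedent = " and ".join(f"{name}={value}" for name, value in conditions)
    print(f"If {antecedent}, then SEVERITY={label}")
```

Each emitted rule is simply the conjunction of the tests met along
one path, which is what makes this kind of model easy for an
expert to inspect.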
In addition, this type of predictive model highlights
the optimum combination of variables and their
degree of correlation with the diagnosis, permitting
the design of more reduced analyses, which can
in turn allow for shorter analysis times and less
invasive or even more economical procedures.
SELECTING THE MOST ACCURATE FORECASTING METHOD FOR MEDICAL DIAGNOSIS. BREAST CANCER
DIAGNOSIS - A Case Study