Data Mining based Methodologies for Cardiac Risk Patterns Identification

V. G. Almeida, J. Borba, T. Pereira, H. C. Pereira, J. Cardoso, C. Correia


Cardiovascular diseases (CVDs) are the leading cause of death in the world. Pulse wave analysis provides new insight into these pathologies, while data mining techniques can contribute to an efficient diagnostic method. Among the various available techniques, artificial neural networks (ANNs) are well established in biomedical applications and have numerous successful classification applications. Clustering procedures have also proven very useful in assessing different risk groups in terms of cardiovascular function in healthy populations. In this paper, a robust data mining approach was applied to the identification of cardiac risk patterns. Eight classifiers were tested: C4.5, Random Forest, RIPPER, Naïve Bayes, Bayesian Network, multi-layer perceptron (MLP) (with 1 and 2 hidden layers) and radial basis function (RBF). For clustering, k-means (using Euclidean distance) and expectation-maximization (EM) were the chosen algorithms. Two datasets were used as case studies for the classification and clustering analyses. The accuracy values ranged from 88.05% to 97.15%. The clustering techniques were essential in the analysis of a dataset for which little information was available, allowing the identification of clusters that represent different risk groups in terms of cardiovascular function. The three-cluster analysis allowed the characterization of distinctive features for each cluster. Reflected wave time (T_RP) and systolic wave time (T_SP) were the features selected for cluster visualization. Data mining methodologies have proven their usefulness in screening studies due to their descriptive and predictive power.
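As a rough illustration of the clustering step described in the abstract, the following minimal sketch runs k-means with Euclidean distance on two features standing in for T_SP and T_RP. It is not the authors' implementation or tooling: the function names (`kmeans`, `euclidean`, `make_synthetic_data`), the synthetic values, and their arbitrary time units are all assumptions made for this example, and the EM variant is not shown.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep as is
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels


def make_synthetic_data(seed=1):
    """Hypothetical (T_SP, T_RP) pairs in arbitrary units; NOT the paper's data."""
    rng = random.Random(seed)
    centers = [(100.0, 180.0), (120.0, 230.0), (140.0, 280.0)]
    return [
        (rng.gauss(cx, 3.0), rng.gauss(cy, 5.0))
        for cx, cy in centers
        for _ in range(30)
    ]


if __name__ == "__main__":
    data = make_synthetic_data()
    centroids, labels = kmeans(data, k=3)
    for j, c in enumerate(centroids):
        print(f"cluster {j}: T_SP={c[0]:.1f}, T_RP={c[1]:.1f}, n={labels.count(j)}")
```

Because k-means depends on its initial centroids, a real analysis would typically repeat the run with several seeds and compare the resulting partitions before interpreting the clusters as risk groups.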



Paper Citation

in Harvard Style

Almeida, V. G., Borba, J., Pereira, T., Pereira, H. C., Cardoso, J. and Correia, C. (2013). Data Mining based Methodologies for Cardiac Risk Patterns Identification. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2013), ISBN 978-989-8565-35-8, pages 127-133. DOI: 10.5220/0004222701270133
