Development of heart attack prediction model based on ensemble learning

Authors

Omar Shakir Hasan, Ibrahim Ahmed Saleh

DOI:

https://doi.org/10.15587/1729-4061.2021.238528

Keywords:

heart attack prediction, machine learning, ensemble learning, stacking ensemble technique

Abstract

With the advent of the data age, the continuous improvement and widespread application of medical information systems have led to exponential growth in biomedical data, such as medical imaging, electronic medical records, biometric tags, and clinical records, which have substantial research value. However, medical research based on statistical methods is limited by the class and size of the research community, so it cannot effectively mine large-scale medical information. Supervised machine learning techniques can address this problem effectively. Heart attack is one of the most common diseases and one of the leading causes of death, so a system that can accurately and reliably support early diagnosis is an essential step in treating such diseases. Researchers have used various data mining and machine learning techniques to analyze medical data and help professionals predict heart disease. This paper presents various features related to heart disease and proposes a prediction model based on ensemble learning. The proposed system involves preprocessing the data, selecting attributes, and then building the ensemble learning model with a logistic regression algorithm as the meta-classifier. In addition, several machine learning algorithms (Support Vector Machines, Decision Tree, Random Forest, Extreme Gradient Boosting) are applied to the Framingham Heart Study dataset and compared with the proposed methodology. The results demonstrate the feasibility and effectiveness of the proposed ensemble learning method, which offers accuracy useful for medical recommendations and better accuracy than any single traditional machine learning algorithm.
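The stacking idea named in the abstract and keywords (base classifiers whose outputs feed a logistic regression meta-classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic rather than the Framingham Heart Study dataset, one-feature threshold rules stand in for the SVM, Decision Tree, Random Forest, and Extreme Gradient Boosting learners, and every function name (`make_data`, `fit_stump`, `fit_logistic`, `fit_stacking`, `accuracy`) is hypothetical.

```python
import math
import random


def make_data(n, seed):
    """Synthetic stand-in data: 3 numeric features, binary label (not Framingham)."""
    rng = random.Random(seed)
    shifts = [1.2, 0.8, 0.0]  # assumed class separation; third feature is pure noise
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        data.append(([rng.gauss(s * y, 1.0) for s in shifts], y))
    return data


def fit_stump(rows, feature):
    """Base learner: vote 1 when the feature exceeds the midpoint of the class means."""
    pos = [x[feature] for x, y in rows if y == 1]
    neg = [x[feature] for x, y in rows if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1.0 if x[feature] > cut else 0.0


def fit_logistic(X, y, lr=1.0, epochs=500):
    """Meta-classifier: logistic regression trained by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return lambda x: 1.0 / (1.0 + math.exp(-(b + sum(wj * xj for wj, xj in zip(w, x)))))


def fit_stacking(train, k=5):
    """Train base learners, then a meta-classifier on their out-of-fold votes."""
    n_features = len(train[0][0])
    folds = [train[i::k] for i in range(k)]
    meta_X, meta_y = [], []
    for i, held_out in enumerate(folds):
        rest = [row for j, fold in enumerate(folds) if j != i for row in fold]
        bases = [fit_stump(rest, f) for f in range(n_features)]
        # Votes on rows the base learners did not see become meta-features.
        for x, y in held_out:
            meta_X.append([base(x) for base in bases])
            meta_y.append(y)
    meta = fit_logistic(meta_X, meta_y)
    bases = [fit_stump(train, f) for f in range(n_features)]  # refit on all training rows
    return lambda x: meta([base(x) for base in bases])


def accuracy(model, rows):
    """Fraction of rows where the thresholded model output matches the label."""
    return sum((model(x) >= 0.5) == (y == 1) for x, y in rows) / len(rows)


train, test = make_data(300, seed=0), make_data(200, seed=1)
stacked = fit_stacking(train)
stacked_acc = accuracy(stacked, test)
single_acc = accuracy(fit_stump(train, 0), test)
print(f"stacked: {stacked_acc:.3f}  single base learner: {single_acc:.3f}")
```

Training the meta-classifier on out-of-fold votes keeps it from simply trusting base learners that have memorized their own training rows. In practice, a library routine such as scikit-learn's `StackingClassifier` (with a logistic regression passed as `final_estimator`) packages the same procedure for real base models.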

Supporting Agency

  • First of all, I would like to thank Associate Professor Ibrahim Ahmed Saleh for his meticulous care and help in my life and academic work. Teacher Ibrahim has noble morals, kindness to others, rigorous scholarship, and profound knowledge. He not only taught me how to learn but also taught me principles of life that will benefit me for a lifetime. In closing, I would like to extend my sincerest gratitude to Teacher Ibrahim again. Thanks also to the University of Mosul, College of Computer Science and Mathematics, for their care and support during my daily experiments and life.

Author Biographies

Omar Shakir Hasan, University of Mosul

Assistant Teacher

Department of Computer Science

College of Computer Science and Mathematics

Ibrahim Ahmed Saleh, University of Mosul

Assistant Professor

Department of Software Engineering

College of Computer Science and Mathematics


Published

2021-08-31

How to Cite

Hasan, O. S., & Saleh, I. A. (2021). Development of heart attack prediction model based on ensemble learning. Eastern-European Journal of Enterprise Technologies, 4(2(112)), 26–34. https://doi.org/10.15587/1729-4061.2021.238528