Stroke and Diabetes Prediction using Machine Learning Algorithms

Document Type: Original Article

Authors

1 AAST, Egypt.

2 Dean of the Computer Science Department, Arab Academy for Science and Technology, Egypt.

10.21608/iugrc.2021.245570

Abstract

Diabetes is a serious disease characterized by elevated levels of glucose in the blood. It has no permanent cure, so early detection is essential. Machine learning (ML) algorithms can help identify and predict diabetes at an early stage. The main objective of this study is to predict diabetes mellitus with better accuracy using an ensemble of ML algorithms. The algorithms are applied, with K-fold cross-validation, to the Predicting Diabetes (PD) dataset collected from Kaggle. The dataset contains records for 768 patients, with and without diabetes, each described by nine attributes. The proposed ensemble soft voting classifier performs binary classification by combining three ML algorithms: Random Forest, K-Nearest Neighbors (KNN), and Naive Bayes. The proposed methodology has been evaluated empirically against state-of-the-art methodologies and base classifiers such as KNN, taking accuracy, precision, recall, and specificity as the evaluation criteria. The proposed ensemble approach gives the highest accuracy, precision, recall, and specificity, with values of 77.922%, 83.006%, 83.552%, and 67.088% respectively on the PD dataset. Further, the efficiency of the proposed methodology has also been analyzed on a Stroke Prediction dataset. On the Stroke Prediction dataset, the proposed ensemble soft voting classifier achieved accuracy, precision, recall, and specificity of 93.83%, 92.59%, 96.12%, and 91.91% respectively, using the Random Forest algorithm.
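As an illustration of the two ideas the abstract relies on, the following minimal Python sketch shows how soft voting can combine per-model class probabilities and how the four reported metrics are computed from a binary confusion matrix. It is not the authors' implementation. The function names `soft_vote` and `binary_metrics`, the 0.5 decision threshold, the equal model weights, and the example probabilities are all assumptions made for illustration; none of them are results from the study.

```python
from typing import Dict, List, Sequence


def soft_vote(probas: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Average each model's positive-class probability and threshold the mean.

    `probas` holds one sequence per model (assumed here to stand in for
    Random Forest, KNN, and Naive Bayes), each with one probability per sample.
    Equal weights and a 0.5 threshold are assumptions of this sketch.
    """
    n_models = len(probas)
    n_samples = len(probas[0])
    labels = []
    for i in range(n_samples):
        mean_p = sum(model[i] for model in probas) / n_models
        labels.append(1 if mean_p >= threshold else 0)
    return labels


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and specificity for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical positive-class probabilities for six patients (not study data).
    rf_p = [0.9, 0.4, 0.2, 0.7, 0.6, 0.1]
    knn_p = [0.8, 0.6, 0.4, 0.4, 0.2, 0.3]
    nb_p = [0.7, 0.8, 0.3, 0.6, 0.4, 0.2]
    y_true = [1, 0, 0, 1, 1, 0]

    y_pred = soft_vote([rf_p, knn_p, nb_p])
    for name, value in binary_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

Soft voting differs from hard (majority) voting in that a single confident model can outweigh two uncertain ones, because probabilities rather than class labels are averaged. Specificity is reported alongside recall because, on a medical dataset with unequal class sizes, accuracy alone can hide poor performance on the negative class.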

Keywords