Abstract
Heart disease remains a major global health concern, and early, accurate detection is crucial to improving patient outcomes. This study evaluates seven machine learning models for identifying heart disease: K-Nearest Neighbour (KNN), Extra Trees (ET), Decision Tree (DT), Extreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), Random Forest (RF), and LightGBM. Their performance was compared with results from a previously published study. To improve model accuracy, we normalized the data, selected key features using the Variance Inflation Factor (VIF), and introduced engineered features such as exercise_angina_severity (ES) and poor_heart_function_risk_factors (PHR) to improve recall. These refinements improved model performance, with KNN and SVM achieving accuracy scores of 0.89 and 0.91, respectively, substantially higher than their previous scores of 0.64 and 0.57. KNN also achieved the highest recall scores (0.90 and 0.88), highlighting its potential for early disease detection. These results underscore the value of machine learning in heart disease diagnosis and show how feature selection and feature engineering can improve classification accuracy, which may in turn support better clinical decision-making and patient care.
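
For readers unfamiliar with VIF-based feature selection, the idea can be illustrated with a minimal sketch. This is not the pipeline used in this study: the synthetic data, the feature names, the cutoff of 10 (a common rule of thumb), and the iterative drop-the-worst strategy are all assumptions made for illustration. The VIF of a feature is 1 / (1 - R²), where R² comes from regressing that feature on all the others.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X (rows = samples, columns = features)."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    vifs = np.empty(n_features)
    for j in range(n_features):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress feature j on the remaining features plus an intercept.
        design = np.column_stack([np.ones(n_samples), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residual = y - design @ coef
        ss_res = float(residual @ residual)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        # VIF = 1 / (1 - R^2) = ss_tot / ss_res
        vifs[j] = ss_tot / ss_res if ss_res > 0.0 else np.inf
    return vifs


def drop_high_vif(X, names, threshold=10.0):
    """Repeatedly drop the feature with the largest VIF until all are <= threshold.

    The threshold of 10 is an assumed rule-of-thumb value, not one taken
    from the study.
    """
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        vifs = variance_inflation_factors(X)
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return X, names


# Synthetic demonstration data (hypothetical feature names):
# feature_c is almost exactly feature_a + feature_b, so it is collinear.
rng = np.random.default_rng(0)
feature_a = rng.normal(size=200)
feature_b = rng.normal(size=200)
feature_c = feature_a + feature_b + rng.normal(scale=0.01, size=200)
X_demo = np.column_stack([feature_a, feature_b, feature_c])

kept_X, kept_names = drop_high_vif(X_demo, ["feature_a", "feature_b", "feature_c"])
print(kept_names, variance_inflation_factors(kept_X))
```

In this sketch, one of the three collinear columns is removed and the two that remain have low VIFs. In practice, the same check would be applied to the normalized clinical features before model training.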
