A Robust Framework for Early Heart Disease Detection: Enhancing Machine Learning Models with Feature Selection and Engineering to Improve Prediction Accuracy

Muna Tabuni

Date

2026-1

Type

Conference paper

Author(s)

Muna Tabuni

Pages

22 - 37

Abstract

Heart disease remains a major global health concern, and early, accurate detection is crucial to improving patient outcomes. This study evaluates seven machine learning models for identifying heart disease: K-Nearest Neighbour (KNN), Extra Trees (ET), Decision Tree (DT), Extreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), Random Forest (RF), and LightGBM. Their performance was compared with results from a previously published study. To improve accuracy, we normalized the data, selected features using the Variance Inflation Factor (VIF), and introduced engineered features such as exercise_angina_severity (ES) and poor_heart_function_risk_factors (PHR) to improve recall. These refinements substantially improved performance: KNN and SVM reached accuracy scores of 0.89 and 0.91, respectively, up from their previous scores of 0.64 and 0.57. KNN also achieved the highest recall scores (0.90 and 0.88), highlighting its potential for early disease detection. The results underscore the value of machine learning in heart disease diagnosis and show how deliberate feature selection and engineering can improve classification accuracy, supporting better clinical decision-making and patient care.
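The normalization and VIF-based selection steps named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the min-max scaling choice, the threshold of 5.0, the drop-one-at-a-time strategy, the toy data, and the column names (feature_a, feature_b, feature_sum) are all assumptions made for illustration. The sketch relies on the standard identity that the VIF of each feature equals the corresponding diagonal entry of the inverse correlation matrix.

```python
from math import sqrt


def min_max_normalize(rows):
    """Rescale every column of a row-major table to the range [0, 1]."""
    scaled = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column has zero span; map it to all zeros.
        scaled.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(r) for r in zip(*scaled)]


def correlation_matrix(rows):
    """Pearson correlation matrix of the columns (assumes no constant column)."""
    n = len(rows)
    centered = []
    for col in zip(*rows):
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [sqrt(sum(v * v for v in c)) for c in centered]
    k = len(centered)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: some features are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif_scores(rows):
    """VIF of each column: the diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(rows))
    return [inv[i][i] for i in range(len(inv))]


def drop_high_vif(rows, names, threshold=5.0):
    """Drop the column with the largest VIF until every VIF is <= threshold."""
    rows = [list(r) for r in rows]  # work on copies; inputs stay untouched
    names = list(names)
    while len(names) > 1:
        scores = vif_scores(rows)
        worst = max(range(len(scores)), key=scores.__getitem__)
        if scores[worst] <= threshold:
            break
        del names[worst]
        for r in rows:
            del r[worst]
    return names, rows


# Hypothetical toy data: feature_sum is almost feature_a + feature_b,
# so the three columns are strongly multicollinear.
names = ["feature_a", "feature_b", "feature_sum"]
rows = [
    [1.0, 3.0, 4.10],
    [2.0, 1.0, 2.90],
    [3.0, 4.0, 7.05],
    [4.0, 1.0, 4.95],
    [5.0, 5.0, 10.10],
    [6.0, 9.0, 14.90],
]

normalized = min_max_normalize(rows)
kept, pruned = drop_high_vif(normalized, names, threshold=5.0)
print("VIF before pruning:", vif_scores(normalized))
print("features kept:", kept)
```

Because VIF is unchanged by rescaling a column, normalizing first does not affect which features are dropped; it is included here only to mirror the order of steps in the abstract. In practice a library routine such as the VIF function in statsmodels would likely replace the hand-written matrix inversion.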
