DISEASE CLASSIFICATION USING LOGISTIC REGRESSION AND MACHINE LEARNING TECHNIQUES
Keywords:
Disease Classification, Logistic Regression, Machine Learning, Heart Disease Prediction, Random Forest, Predictive Modeling, Clinical Risk Assessment
Abstract
Accurate, early disease classification is critical to improving clinical decision-making and reducing mortality from cardiovascular disorders. The growing availability of medical datasets and computational tools has made it possible to build robust predictive models for disease diagnosis using statistical and machine learning approaches. This study develops a classification framework for heart disease prediction that combines Logistic Regression with other machine learning techniques, based on 303 patient observations and 13 clinical predictors. The analysis comprised descriptive statistics, correlation analysis, predictor ranking, logistic regression coefficient estimation, and a comparative evaluation of machine learning models. Six classification algorithms (Random Forest, Support Vector Machine, K-Nearest Neighbors, Gradient Boosting, Decision Tree, and Logistic Regression) were compared on accuracy, precision, recall, F1-score, and ROC-AUC. Random Forest achieved the highest predictive performance, with an accuracy of 83.6% and a ROC-AUC of 0.904, while Logistic Regression offered strong interpretability and the highest cross-validation stability. Significant predictors included chest pain type, maximum heart rate, exercise-induced angina, oldpeak, and vessel count. The results indicate that combining statistical inference with machine learning improves disease classification accuracy and supports reliable clinical risk assessment systems.
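The performance metrics named above can be illustrated with a minimal sketch, given here in plain Python for a binary outcome (1 = disease present, 0 = absent). The function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration; they are not the study's code, data, or results. ROC-AUC is computed here through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1-score from hard 0/1 predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example only: invented labels and predicted probabilities.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(classification_metrics(y_true, y_pred))
    print("roc_auc:", roc_auc(y_true, scores))
```

Accuracy, precision, recall, and F1-score depend on the chosen threshold, whereas ROC-AUC depends only on how the scores rank the cases. This difference is one reason the two kinds of metric are commonly reported together when comparing classifiers.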













