MACHINE LEARNING MODELS FOR PREDICTING AT-RISK STUDENTS: A COMPARATIVE STUDY OF CLASSIFICATION TECHNIQUES
Keywords:
Machine Learning, At-Risk Students, Classification Techniques, Educational Data Mining, Predictive Analytics, Student Retention
Abstract
The increasing use of data analytics in education has led to a large number of machine learning (ML) models that predict which students are at risk of academic failure or dropout. This paper presents a comparative analysis of the classification-based ML models applied in previous studies to identify at-risk students. It assesses the relative efficacy, strengths, and weaknesses of popular methods, including Logistic Regression, Decision Trees, Random Forest, Support Vector Machines (SVM), Naive Bayes, and Neural Networks. Drawing on the results of past empirical research, the comparison uses accuracy, precision, recall, F1-score, and interpretability as its key evaluation criteria. The review finds that more complex models, such as Random Forest and Gradient Boosting, tend to achieve higher predictive accuracy, whereas simpler models, such as Logistic Regression and Decision Trees, remain in use because they are more transparent and easier to apply in educational settings. The reviewed studies also stress that high-quality data and relevant feature selection are important for improving model reliability. On balance, no single model is best in every setting; the choice depends on institutional goals, the nature of the available data, and the required trade-off between accuracy and interpretability. The review also highlights the potential of ML-based early warning systems to support timely academic interventions and improve student retention.
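The comparison rests on accuracy, precision, recall, and F1-score. As a minimal illustrative sketch (not the evaluation code used in the reviewed studies), the Python function below computes these four metrics for a binary classifier, assuming label 1 means "at risk" and label 0 means "not at risk". The function name `binary_metrics` and the ten-student label vectors are hypothetical. The example also shows why accuracy alone can mislead when at-risk students are a minority: a model that flags no one still reaches 80% accuracy on the hypothetical data, while its recall is zero.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> BinaryMetrics:
    """Compute accuracy, precision, recall and F1, treating `positive` (assumed: at-risk) as the positive class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # at-risk, flagged
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # not at risk, flagged
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # at-risk, missed
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Precision: of the students flagged as at risk, the share who really were.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of the truly at-risk students, the share the model flagged.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall (0 when both are 0).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Hypothetical cohort of ten students, two of whom are actually at risk.
    y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
    predictions = {
        "never_flags": [0] * 10,                         # predicts "not at risk" for everyone
        "model_b": [0, 0, 0, 0, 0, 0, 1, 0, 1, 1],       # one false alarm, no misses
    }
    for name, y_pred in predictions.items():
        m = binary_metrics(y_true, y_pred)
        print(f"{name}: acc={m.accuracy:.2f} prec={m.precision:.2f} rec={m.recall:.2f} f1={m.f1:.2f}")
```

Treating the at-risk group as the positive class keeps recall focused on the outcome an early warning system most needs to catch, namely students who would otherwise be missed.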