Department of Computer Science, Nnamdi Azikiwe University, Nigeria.
World Journal of Advanced Engineering Technology and Sciences, 2026, 19(01), 010-021
Article DOI: 10.30574/wjaets.2026.19.1.0192
Received on 24 February 2026; revised on 29 March 2026; accepted on 01 April 2026
Student dropout is a persistent and costly challenge in higher education, undermining institutional effectiveness and limiting individual socioeconomic mobility. This study developed and evaluated a machine learning-based framework for early dropout prediction using a dataset of 10,000 university students described by 19 demographic, academic, and psychosocial variables. Four classification algorithms (Logistic Regression, Random Forest, XGBoost, and an Artificial Neural Network) were trained and evaluated under two conditions: on the original class-imbalanced data, and on training data balanced with the Synthetic Minority Over-sampling Technique (SMOTE). A stratified 80-20 train-test split was applied, and model performance was assessed using F1-Score as the primary metric, supplemented by precision, recall, specificity, ROC-AUC, and PR-AUC. Exploratory data analysis revealed that academic performance variables, particularly GPA (r = −0.460), Semester_GPA (r = −0.445), and CGPA (r = −0.445), were the strongest predictors of dropout, while demographic variables including gender and age exhibited negligible predictive value. Logistic Regression trained on SMOTE-balanced data achieved the highest overall performance (F1 = 0.5791, Recall = 0.7537, ROC-AUC = 0.8188), outperforming the more complex ensemble and deep learning models on the primary metrics. SMOTE improved minority class detection for three of the four algorithms; XGBoost was the notable exception, with balancing marginally degrading its performance. These findings indicate that interpretable linear models can match or exceed complex architectures on structured educational datasets, and that academically focused early warning systems offer greater intervention utility than demographically targeted approaches.
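As a rough illustration of two ideas named in the abstract, the sketch below shows SMOTE-style interpolation between minority-class neighbours and the computation of precision, recall, specificity, and F1 from confusion-matrix counts. It is a minimal standard-library sketch, not the study's implementation: the abstract does not state which libraries were used, the function names `smote_sketch` and `classification_metrics` are hypothetical, and the sample points and counts are invented for illustration rather than taken from the study's data. In line with the abstract, oversampling of this kind would be applied to the training portion only.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating between a randomly
    chosen minority point and one of its k nearest minority neighbours.
    Simplified illustration of the SMOTE idea; not a library implementation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def classification_metrics(tp, fp, fn, tn):
    """Compute metrics for the positive (dropout) class from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical minority-class points, e.g. (GPA, attendance rate); invented values.
minority_points = [(1.0, 0.55), (1.5, 0.60), (2.0, 0.50), (1.2, 0.70)]
new_points = smote_sketch(minority_points, n_synthetic=6, k=2)

# Hypothetical confusion-matrix counts for an imbalanced test set.
metrics = classification_metrics(tp=60, fp=40, fn=20, tn=880)
```

With imbalanced classes, accuracy and specificity are dominated by the majority (non-dropout) class, which is one reason a metric built from precision and recall, such as F1, is a natural primary criterion for this kind of task.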
Student dropout prediction; Machine learning; Class imbalance; SMOTE; Logistic regression; Random forest; XGBoost; Artificial neural network; F1-score; Academic performance; Early warning systems; Higher education retention
Belonwu Tochukwu Sunday, Chukwuogo Okwuchukwu Ejike, Ezuruka, elyn Ogochukwu and Okechukwu Ogochukwu Patience. Beyond demographics: Identifying academically at-risk students through interpretable machine learning and synthetic minority oversampling. World Journal of Advanced Engineering Technology and Sciences, 2026, 19(01), 010-021