Bias-Aware Machine Learning for Student Dropout Prediction: Balancing Accuracy and Fairness

Authors

  • Olufunke Catherine Olayemi, Department of Computer Science, Teesside University, Middlesbrough, United Kingdom
  • Olayemi Oladimeji Olasehinde, Department of Computer Science, University of Huddersfield, Huddersfield, United Kingdom
  • Olugbenga Olawale Akinade, Department of Computer Science, Teesside University, Middlesbrough, United Kingdom

DOI:

https://doi.org/10.70112/ajeat-2025.14.2.4330

Keywords:

Student Dropout, Machine Learning, Fairness, Bias Mitigation, Early Warning Systems

Abstract

Student dropout remains a persistent global challenge with serious social and economic consequences. Early identification of at-risk learners enables timely support, which can improve retention while promoting fairness in educational outcomes. This study presents a bias-aware machine learning framework for student dropout prediction that jointly evaluates predictive performance and fairness across demographic subgroups. Six machine learning models are benchmarked using academic, demographic, and socioeconomic features. Model performance is assessed using Accuracy, F1-score, Precision, and Matthews Correlation Coefficient, while fairness is evaluated across gender, marital status, and displacement groups. Initial results show that CatBoost achieves the strongest overall performance before class balancing; however, subgroup analysis reveals systematic disparities affecting vulnerable populations. To address these biases, the Synthetic Minority Oversampling Technique (SMOTE) is applied. After rebalancing, XGBoost delivers the best performance, achieving substantial improvements in predictive accuracy alongside marked reductions in subgroup disparities. In particular, dropout detection for displaced students improves significantly, narrowing fairness gaps across all evaluated groups. The findings demonstrate that data-level bias mitigation can enhance both accuracy and equity in educational predictive systems. This work provides empirical evidence that fairness-aware machine learning can support more reliable and inclusive early warning systems for student retention.
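
The sketch below illustrates, in broad strokes, the kind of pipeline the abstract describes: SMOTE rebalancing of the training split, an XGBoost classifier, the four reported metrics, and a per-subgroup recall gap as a simple disparity measure. It is a minimal sketch under stated assumptions, not the authors' implementation: the file name, the column names (dropout, gender, marital_status, displaced), and the recall-gap disparity measure are illustrative placeholders.

    # Minimal sketch: SMOTE rebalancing, XGBoost training, the reported
    # metrics, and a per-subgroup recall gap. File and column names are
    # illustrative placeholders, not the authors' actual schema.
    import pandas as pd
    from imblearn.over_sampling import SMOTE
    from sklearn.metrics import (accuracy_score, f1_score, matthews_corrcoef,
                                 precision_score, recall_score)
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    df = pd.read_csv("students.csv")   # hypothetical input; features assumed
    X = df.drop(columns=["dropout"])   # numerically encoded (SMOTE needs this)
    y = df["dropout"]                  # 1 = dropped out, 0 = retained

    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    # Data-level bias mitigation: oversample only the training split so
    # no synthetic records leak into the evaluation data.
    X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_tr, y_tr)

    model = XGBClassifier(eval_metric="logloss", random_state=42)
    model.fit(X_bal, y_bal)
    pred = model.predict(X_te)

    print("Accuracy :", accuracy_score(y_te, pred))
    print("Precision:", precision_score(y_te, pred))
    print("F1-score :", f1_score(y_te, pred))
    print("MCC      :", matthews_corrcoef(y_te, pred))

    # Fairness check: dropout-detection recall per subgroup; the spread
    # between best- and worst-served groups is one simple disparity measure.
    for attr in ["gender", "marital_status", "displaced"]:
        recalls = {
            g: recall_score(y_te[X_te[attr] == g],
                            pred[X_te[attr] == g], zero_division=0)
            for g in X_te[attr].unique()
        }
        gap = max(recalls.values()) - min(recalls.values())
        print(f"{attr}: recall by group {recalls}, gap = {gap:.3f}")

Resampling only the training split is the standard safeguard when pairing SMOTE with held-out evaluation; reporting per-group recall alongside the aggregate metrics makes the fairness gaps described in the abstract directly visible.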

Published

10-12-2025

How to Cite

Olayemi, O. C., Olasehinde, O. O., & Akinade, O. O. (2025). Bias-Aware Machine Learning for Student Dropout Prediction: Balancing Accuracy and Fairness. Asian Journal of Engineering and Applied Technology, 14(2), 51–57. https://doi.org/10.70112/ajeat-2025.14.2.4330