A Comparative Analysis of Boosting and Transformers Models For Loan Default Risk Prediction

Authors

  • Sandi Salvan Nuraliyudin, Master of Informatic, President University, Indonesia
  • Wiranto Herry Utomo, Master of Informatic, President University, Indonesia

DOI:

https://doi.org/10.15575/jp.v10i1.394

Keywords:

Loan default risk; SMOTE; TomekLinks; Boosting; Transformer.

Abstract

The rapid growth of global and national credit markets has widened access to financing for consumers and MSMEs, but it has also increased the default risk that can threaten financial stability. In Indonesia, the non-performing loan ratio rose from 2.5% (2022) to 3.1% (2024), while the growth of fintech lending has further broadened this risk. This study compares the performance of three boosting algorithms (XGBoost, LightGBM, CatBoost) and an attention-based deep learning model (Transformer) in predicting loan default risk. The dataset consists of 255,347 rows and 18 variables and goes through pre-processing steps comprising data cleaning, missing-value handling, outlier detection, and class balancing using SMOTE, TomekLinks, and a combination of the two. Evaluation uses the Accuracy, Precision, Recall, F1-Score, and ROC-AUC metrics. The results show that the boosting models achieve high accuracy (up to 88.68% for CatBoost with TomekLinks) but are biased toward the majority class. In contrast, the Transformer performs better on imbalanced data, with a Recall of 70.22% and an F1-Score of 31.49% under SMOTE-TomekLinks. SHAP analysis identifies age, interest rate, length of employment, income, and loan amount as the most influential features. In conclusion, the Transformer with SMOTE-TomekLinks is the most effective model for detecting borrowers at risk of default.
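The abstract's contrast between high accuracy and weak minority-class detection can be illustrated with a small worked example. The sketch below computes Accuracy, Precision, Recall, and F1-Score from confusion-matrix counts. The two sets of counts are hypothetical values chosen for illustration (1,000 loans with a 12% default rate); they are not taken from the study, and the names `Confusion`, `metrics`, `majority_biased`, and `recall_oriented` are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # defaulters correctly flagged
    fp: int  # non-defaulters wrongly flagged
    fn: int  # defaulters missed
    tn: int  # non-defaulters correctly passed


def metrics(c: Confusion) -> dict:
    """Return accuracy, precision, recall, and F1 for the positive (default) class."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total
    # Guard against empty denominators when a model never predicts the positive class.
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts (NOT from the paper): 1,000 loans, 120 of them defaults.
# A model that mostly predicts "no default" scores well on accuracy alone.
majority_biased = Confusion(tp=10, fp=5, fn=110, tn=875)
# A model tuned to catch defaulters trades accuracy for recall.
recall_oriented = Confusion(tp=84, fp=330, fn=36, tn=550)

for name, cm in [("majority-biased", majority_biased), ("recall-oriented", recall_oriented)]:
    scores = metrics(cm)
    print(name, {k: round(v, 4) for k, v in scores.items()})
```

In this illustrative setup the first model has the higher accuracy yet misses most defaulters, while the second has lower accuracy but a higher Recall and F1-Score. This is the kind of trade-off that makes Recall and F1-Score more informative than Accuracy on imbalanced loan data.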

Published

2026-01-20

How to Cite

Nuraliyudin, S. S., & Utomo, W. H. (2026). A Comparative Analysis of Boosting and Transformers Models For Loan Default Risk Prediction. Jurnal Perspektif, 10(1), 40–54. https://doi.org/10.15575/jp.v10i1.394

Section

Jurnal Perspektif
