A Comparative Analysis of Boosting and Transformers Models For Loan Default Risk Prediction
DOI:
https://doi.org/10.15575/jp.v10i1.394
Keywords:
Loan default risk; SMOTE; TomekLinks; Boosting; Transformer.
Abstract
The rapid growth of global and national credit markets has widened access to financing for consumers and micro, small, and medium enterprises (MSMEs), but it has also increased the risk of default, which can threaten financial stability. In Indonesia, the non-performing loan ratio rose from 2.5% (2022) to 3.1% (2024), while the growth of fintech lending has further widened that risk. This study compares the performance of three boosting algorithms (XGBoost, LightGBM, CatBoost) and an attention-based deep learning model (Transformer) in predicting loan default risk. The dataset consists of 255,347 rows and 18 variables and went through preprocessing steps comprising data cleaning, missing-value handling, outlier detection, and class balancing using SMOTE, TomekLinks, and a combination of the two. Evaluation used Accuracy, Precision, Recall, F1-Score, and ROC-AUC. The results show that the boosting models achieve high accuracy (up to 88.68% for CatBoost with TomekLinks) but are biased toward the majority class. In contrast, the Transformer performs better on imbalanced data, with a Recall of 70.22% and an F1-Score of 31.49% under SMOTE-TomekLinks. SHAP analysis identifies age, interest rate, length of employment, income, and loan amount as the most influential features. In conclusion, the Transformer with SMOTE-TomekLinks is the most effective model for detecting borrowers at risk of default.
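The contrast the abstract draws between high accuracy and weak minority-class detection can be illustrated with a small worked example. The sketch below computes Accuracy, Precision, Recall, and F1-Score from confusion-matrix counts for the positive (default) class. The function name `classification_metrics` and all counts are hypothetical and chosen only for illustration; they are not the study's data or results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 for the positive (default) class.

    tp/fp/fn/tn are confusion-matrix counts; ratios with a zero
    denominator are reported as 0.0 instead of raising an error.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical test set: 1,000 loans, 100 of which default (10% positives).

# A majority-biased model labels almost every loan as non-default.
majority_biased = classification_metrics(tp=5, fp=5, fn=95, tn=895)

# A recall-oriented model flags many more loans as risky.
recall_oriented = classification_metrics(tp=70, fp=250, fn=30, tn=650)

for name, m in [("majority-biased", majority_biased),
                ("recall-oriented", recall_oriented)]:
    print(name, {k: round(v, 4) for k, v in m.items()})
```

With these made-up counts, the majority-biased model reaches 90% accuracy while catching only 5% of defaulters, whereas the recall-oriented model has lower accuracy (72%) but catches 70% of them. This is why a ranking by Accuracy alone can differ from a ranking by Recall or F1-Score on imbalanced data.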
License
Copyright (c) 2026 Sandi Salvan Nuraliyudin, Wiranto Herry Utomo

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish in Journal Perspektif agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under an Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
