Comparing Five Machine Learning-Based Regression Models for Predicting the Study Period of Mathematics Students at IPB University

Authors

  • Sri Nurdiati, Department of Mathematics, IPB University
  • Mohamad Khoirun Najib, Department of Mathematics, IPB University

DOI:

https://doi.org/10.31764/jtam.v6i3.8408

Keywords:

Cross-Validation, Hyperparameter, Machine Learning, Multiple Linear Regression, Tuning Constant

Abstract

A student's grade point average (GPA) gives supervisors initial information for characterizing the students they supervise. A machine learning-based regression model can predict a student's study period from GPA, allowing supervisors to apply the right strategy for each student. This study therefore implements and compares machine learning-based regression models that predict a student's study period from the GPAs of semesters 1-6. Five regression models are used: least-squares regression, ridge regression, Huber regression, quantile regression, and quantile regression with l_2-regularization, all provided by Machine Learning in Julia (MLJ). The models are evaluated and selected using several criteria: maximum error, RMSE, and MAPE. The results show that the least-squares regression model gives the worst evaluation results, although it is easy and fast to compute, while the quantile regression models give the best. Quantile regression without regularization yields the smallest RMSE (2.31 months) and MAPE (3.56%), while quantile regression with l_2-regularization has a better maximum error (4.9 months). Supervisors can use the resulting model to predict the study period of their supervised students, characterize them, and design appropriate strategies, so that the study period is shortened while the final project remains of high quality.

Published

2022-07-16

Issue

Section

Articles