Comparing Five Machine Learning-Based Regression Models for Predicting the Study Period of Mathematics Students at IPB University
DOI: https://doi.org/10.31764/jtam.v6i3.8408

Keywords: Cross-Validation, Hyperparameter, Machine Learning, Multiple Linear Regression, Tuning Constant

Abstract
Grade point average (GPA) provides initial information that supervisors can use to characterize the students they supervise. A machine learning-based regression model can predict a student's study period from GPA, so that supervisors can apply the right strategy for each student. This study therefore aims to implement and select a machine learning-based regression model that predicts a student's study period from the GPA of semesters 1-6. The regression models considered are least-squares regression, ridge regression, Huber regression, quantile regression, and quantile regression with l_2-regularization, all provided by Machine Learning in Julia (MLJ). The models are evaluated and selected using several criteria: maximum error, root mean square error (RMSE), and mean absolute percentage error (MAPE). The results show that the least-squares regression model gives the worst evaluation results, although it is simple and fast to compute. The quantile regression models give the best evaluation results: the model without regularization yields the smallest RMSE (2.31 months) and MAPE (3.56%), while the model with l_2-regularization has a better maximum error (4.9 months). Supervisors can use the resulting model to predict the study period of the students they supervise, characterize them, and design appropriate strategies. The students' study period is thus expected to be shortened while maintaining a high-quality final project.
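As a rough illustration (not taken from the article), the following Julia sketch shows how the five regression models could be fitted with MLJ and compared on the three criteria reported in the abstract. The feature table X (semester 1-6 GPAs), the target y (study period in months), the 80/20 split, and hyperparameter names such as lambda are assumptions; check them against the MLJLinearModels documentation and the authors' actual setup.

```julia
using MLJ, Statistics

# Load the linear-model types from MLJLinearModels (the package behind MLJ's
# least-squares, ridge, Huber, and quantile regressors).
LinearRegressor   = @load LinearRegressor   pkg=MLJLinearModels verbosity=0
RidgeRegressor    = @load RidgeRegressor    pkg=MLJLinearModels verbosity=0
HuberRegressor    = @load HuberRegressor    pkg=MLJLinearModels verbosity=0
QuantileRegressor = @load QuantileRegressor pkg=MLJLinearModels verbosity=0

# X: table whose columns are the semester 1-6 GPAs; y: study period in months.
# Both are assumed to be loaded already and are not reproduced here.
models = Dict(
    "least squares"         => LinearRegressor(),
    "ridge"                 => RidgeRegressor(lambda=1.0),
    "Huber"                 => HuberRegressor(),
    "quantile (no penalty)" => QuantileRegressor(lambda=0.0),  # assumption: lambda = 0 disables the penalty
    "quantile + l2 penalty" => QuantileRegressor(lambda=1.0),
)

# Hold out 20% of the data and score each model on the three criteria
# mentioned in the abstract: RMSE, MAPE, and maximum absolute error.
train, test = partition(eachindex(y), 0.8; shuffle=true, rng=42)

for (name, model) in models
    mach = machine(model, X, y)
    fit!(mach, rows=train, verbosity=0)
    yhat = predict(mach, rows=test)

    rmse   = sqrt(mean((yhat .- y[test]).^2))                # root mean square error
    mape   = 100 * mean(abs.((yhat .- y[test]) ./ y[test]))  # mean absolute percentage error
    maxerr = maximum(abs.(yhat .- y[test]))                  # maximum absolute error

    println(rpad(name, 24), " RMSE = ", round(rmse, digits=2),
            "  MAPE = ", round(mape, digits=2), "%",
            "  max error = ", round(maxerr, digits=2))
end
```

In practice the lambda values and the quantile level would be tuned (e.g., with MLJ's cross-validation tooling) rather than fixed as above; the hard-coded values here only keep the comparison sketch self-contained.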