Automatic Piecewise Linear Regression for Predicting Student Learning Satisfaction

Choi, Nadarajan
Although student learning satisfaction has been widely studied, modern techniques such as interpretable machine learning and neural networks have not been sufficiently explored. This study demonstrates that a recent model that combines boosting with interpretability, automatic piecewise linear regression (APLR), offers the best fit for predicting learning satisfaction among several state-of-the-art approaches. Through the analysis of APLR's numerical and visual interpretations, students' time management and concentration abilities, perceived helpfulness to classmates, and participation in offline courses have the most significant positive impact on learning satisfaction. Surprisingly, involvement in creative activities did not positively affect learning satisfaction. Moreover, the contributing factors can be interpreted on an individual level, allowing educators to customize instructions according to student profiles.

Basic Information

  • Paper ID: 2510.10639
  • Title: Automatic Piecewise Linear Regression for Predicting Student Learning Satisfaction
  • Authors: Haemin Choi, Gayathri Nadarajan (Department of Data Science, Sungkyunkwan University)
  • Classification: cs.AI cs.LG
  • Publication Date: October 12, 2025
  • Paper Link: https://arxiv.org/abs/2510.10639

Abstract

This study explores the application of Automatic Piecewise Linear Regression (APLR) in predicting student learning satisfaction. While student learning satisfaction has been extensively studied, modern interpretable machine learning and neural network techniques remain underexplored in this domain. The research demonstrates that the APLR model, combining boosting algorithms with interpretability, outperforms numerous state-of-the-art methods. Through numerical and visual interpretation analysis via APLR, the study identifies that students' time management ability, concentration, perceived helpfulness to classmates, and offline course participation have the most significant positive impacts on learning satisfaction. Surprisingly, participation in creative activities did not yield positive effects on learning satisfaction.

Research Background and Motivation

Problem Definition

This study addresses the prediction of student learning satisfaction and the identification of its influencing factors during the COVID-19 pandemic. After two years of online learning, students' preferences for different learning modalities have shifted, which calls for a closer look at the key factors affecting learning satisfaction.

Research Significance

  1. Educational Practice Guidance: Assists educators and institutions in customizing better teaching methods to enhance overall learning experience
  2. Personalized Learning: Provides scientific evidence for personalized instruction
  3. Pandemic Impact Analysis: Offers deep insights into how special teaching environments during the pandemic affect learning satisfaction

Limitations of Existing Methods

  1. Traditional Statistical Approaches: Primarily employ Structural Equation Modeling (SEM) and statistical hypothesis testing, lacking predictive capability
  2. Incomplete Feature Consideration: Existing research rarely considers emotional states and learning environment factors
  3. Insufficient Interpretability: Lacks application of modern interpretable machine learning techniques

Core Contributions

  1. Superior Method Performance: APLR achieves best performance on 4 out of 5 evaluation metrics compared to representative bagging and boosting tree models, interpretable additive models, and Transformer-based deep learning models
  2. Comprehensive Interpretability Analysis: Provides both global and local explanations, offering valuable insights into learning satisfaction factors for both overall populations and individual students
  3. Support for Personalized Learning: Paves the way for personalized learning, enabling educators to customize instruction based on student profiles
  4. Open-Source Dataset and Code: Provides complete implementation code and dataset for research community use

Methodology Details

Task Definition

  • Input: 47 features including demographic information, learning methods, perceived performance, self-efficacy, motivation, engagement, emotional states, stress coping mechanisms, and learning environment
  • Output: Binary classification task predicting student learning satisfaction (satisfied/dissatisfied)
  • Constraints: Target variable constructed from 7 core features; total score ≥4 indicates satisfaction, otherwise dissatisfaction
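The thresholding rule for the target can be sketched as follows. This is only an illustration: the summary does not give the weights applied to the 7 core features, so the equal weights, the function name satisfaction_label, and the example responses are all assumptions.

```python
def satisfaction_label(core_items, weights=None, threshold=4):
    """Return 1 (satisfied) if the weighted score reaches the threshold, else 0.

    core_items: the 7 core responses, coded on the -2..2 Likert scale.
    weights:    hypothetical; the paper's actual weights are not in this summary.
    """
    if len(core_items) != 7:
        raise ValueError("expected exactly 7 core items")
    if weights is None:
        weights = [1.0] * 7  # assumption: equal weights
    score = sum(w * x for w, x in zip(weights, core_items))
    return 1 if score >= threshold else 0


# Example with made-up responses: the total score is 4, so the label is 1.
print(satisfaction_label([1, 1, 1, 1, 0, 0, 0]))
```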

Model Architecture

APLR Core Mechanism

APLR combines the advantages of gradient boosting and Multivariate Adaptive Regression Splines (MARS):

  1. Component-wise Gradient Boosting: Each simple base learner fits a single predictor, selecting the learner that best minimizes the loss function
  2. Boosting Steps (m = 1 to M):
    Negative Gradient Calculation: u_m = y - f̂_{m-1}(C_{m-1})
    Intercept Update: Using weighted mean of u_m multiplied by learning rate v
    Base Function Selection: Find optimal APLR base function h_m(u_m, e_j) for each candidate e_j
    Term Selection: Select term with minimum loss as candidate
    Coefficient Update: Update regression coefficients β
    
  3. Regression Coefficient Estimation: \beta = v \cdot \frac{\sum_{i=1}^{n_{eff}} f(x_i) \cdot w_i \cdot u_{m,i}}{\sum_{i=1}^{n_{eff}} f(x_i)^2 \cdot w_i}
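The boosting steps and the coefficient formula can be sketched in plain Python. This is a simplified illustration, not the APLR library's implementation. It assumes a squared-error loss (so the negative gradient is the residual), uses only MARS-style hinge terms on single predictors, omits interactions, and refreshes the residual after the intercept update. The function names are hypothetical.

```python
def hinge(x, knot, direction):
    """Hinge basis: max(x - knot, 0) for direction=+1, max(knot - x, 0) for -1."""
    return max(direction * (x - knot), 0.0)


def term_coefficient(f_vals, u, w, v):
    """beta = v * sum(f * w * u) / sum(f^2 * w): a shrunken weighted least-squares fit."""
    den = sum(f * f * wi for f, wi in zip(f_vals, w))
    if den == 0.0:
        return 0.0  # the term is zero for every observation, so it cannot help
    num = sum(f * wi * ui for f, wi, ui in zip(f_vals, w, u))
    return v * num / den


def weighted_sse(y, pred, w):
    return sum(wi * (yi - pi) ** 2 for yi, pi, wi in zip(y, pred, w))


def boosting_step(X, y, w, pred, v=0.5):
    """One step: update the intercept, then add the single best hinge term."""
    # Negative gradient of the squared-error loss is the residual.
    u = [yi - pi for yi, pi in zip(y, pred)]
    # Intercept update: learning rate times the weighted mean of u.
    shift = v * sum(wi * ui for wi, ui in zip(w, u)) / sum(w)
    pred = [p + shift for p in pred]
    u = [yi - pi for yi, pi in zip(y, pred)]  # assumption: residual refreshed here
    best = None
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        for knot in sorted(set(col)):
            for direction in (1, -1):
                f_vals = [hinge(x, knot, direction) for x in col]
                beta = term_coefficient(f_vals, u, w, v)
                loss = sum(wi * (ui - beta * f) ** 2
                           for wi, ui, f in zip(w, u, f_vals))
                if best is None or loss < best[0]:
                    best = (loss, j, knot, direction, beta, f_vals)
    _, j, knot, direction, beta, f_vals = best
    pred = [p + beta * f for p, f in zip(pred, f_vals)]
    return pred, shift, (j, knot, direction, beta)
```

Calling boosting_step repeatedly, feeding each returned prediction back in, plays the role of the loop over m = 1 to M. For the binary satisfaction target, a classification loss would replace the squared error used here.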

Technical Innovations

  1. Piecewise Linear Processing: Unlike EBM, whose additive shape functions are built from boosted shallow trees, APLR segments each predictor's range and fits linear pieces to the segments
  2. Interaction Term Consideration: Automatically identifies and models feature interactions
  3. Computational Efficiency: More efficient than EBM and more user-friendly than random forests and boosting trees
  4. Dual Interpretability: Provides both global feature importance and local contribution explanations

Experimental Setup

Dataset

  • Scale: 302 students from Sungkyunkwan University
  • Time Period: Late 2021 to late 2022 (after 4 semesters of online learning)
  • Composition: 88% full-time students, 12% exchange students
  • Disciplinary Distribution: STEM (41.4%), Humanities and Social Sciences (40.6%), Mixed categories (18%)
  • Course Modality: 76.82% online courses, 23.18% offline courses

Data Preprocessing

  • Encoding Method: 5-point Likert scale converted to numerical values (-2 to 2)
  • Target Variable Construction: Weighted sum of 7 core features
  • Data Splitting: 241 training samples, 61 test samples (8:2 ratio)
  • Imbalance Handling: SMOTE technique applied for class imbalance
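The preprocessing steps can be sketched with the standard library alone. Several details are assumptions: the response labels in LIKERT_CODES are placeholders for the survey's actual wording, smote_like is a simplified stand-in for SMOTE (it interpolates between a minority sample and one of its nearest minority neighbours), and the summary does not say whether oversampling was applied to the training split only.

```python
import math
import random

# Assumption: placeholder labels; the survey's exact wording is not given here.
LIKERT_CODES = {
    "Strongly disagree": -2,
    "Disagree": -1,
    "Neutral": 0,
    "Agree": 1,
    "Strongly agree": 2,
}


def encode_likert(responses):
    """Map 5-point Likert labels to the numerical range -2..2."""
    return [LIKERT_CODES[r] for r in responses]


def split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle row indices and reserve ceil(n * test_fraction) of them for testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = math.ceil(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        others = [p for p in minority if p is not a]
        others.sort(key=lambda p: sum((x - z) ** 2 for x, z in zip(a, p)))
        b = rng.choice(others[:k])
        t = rng.random()  # position along the segment from a to b
        synthetic.append([x + t * (z - x) for x, z in zip(a, b)])
    return synthetic


train_idx, test_idx = split_indices(302)
print(len(train_idx), len(test_idx))
```

Rounding the test share up gives 61 test rows and 241 training rows for 302 students, which matches the split reported above.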

Evaluation Metrics

  • Accuracy
  • F1 Score
  • Precision
  • Recall
  • AUC (Area Under ROC Curve)
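For reference, the five metrics can be computed from scratch as below. This is a minimal sketch with hypothetical function names. It treats the satisfied class as label 1 and computes AUC as the share of (positive, negative) pairs that the scores rank correctly, counting ties as half.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = satisfied)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Since F1 is the harmonic mean of precision and recall, the reported APLR precision of 0.921 and recall of 0.897 imply an F1 of about 0.909, in line with the APLR row of the results.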

Comparison Methods

  1. Random Forest: Representative bagging algorithm
  2. LightGBM: Efficient gradient boosting algorithm
  3. Explainable Boosting Machine (EBM): Interpretable machine learning benchmark
  4. TabNet: Transformer-based deep learning model

Hyperparameter Tuning

  • Random Forest: Grid search + 5-fold cross-validation
  • LightGBM: Bayesian optimization (Optuna package)
  • APLR: Built-in APLRTuner with 5-fold cross-validation grid search
  • EBM and TabNet: Default recommended parameters
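Grid search with 5-fold cross-validation can be sketched generically as below. This does not call APLRTuner or any other library. The function names are hypothetical, and fit_and_score is a user-supplied callable that trains on the training indices and returns a validation score, where higher is better.

```python
from itertools import product


def k_fold_indices(n, k=5):
    """Return k (train, validation) index pairs that partition range(n)."""
    splits = []
    for i in range(k):
        val = [j for j in range(n) if j % k == i]
        train = [j for j in range(n) if j % k != i]
        splits.append((train, val))
    return splits


def grid_search(param_grid, fit_and_score, n, k=5):
    """Try every parameter combination and keep the best mean validation score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [fit_and_score(params, train, val)
                  for train, val in k_fold_indices(n, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

In practice the rows would be shuffled (and ideally stratified by class) before folding; the modulo-based folds here only keep the sketch short.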

Experimental Results

Main Results

Model          Accuracy  F1 Score  Precision  Recall  AUC
APLR           0.885     0.909     0.921      0.897   0.926
Random Forest  0.820     0.853     0.889      0.820   0.947
LightGBM       0.803     0.846     0.846      0.846   0.889
EBM            0.820     0.853     0.889      0.821   0.918
TabNet         0.836     0.872     0.872      0.872   0.818

Key Findings:

  • APLR achieves best performance on 4 out of 5 metrics
  • Slightly lower than Random Forest only on AUC (0.926 vs 0.947)
  • Outperforms EBM, the other interpretable model in the comparison, on all five metrics (for example, accuracy 0.885 vs 0.820)
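The "4 out of 5" claim can be checked directly from the reported numbers. The snippet below copies the values from the results and picks the top model per metric; the variable and function names are illustrative.

```python
METRICS = ["Accuracy", "F1 Score", "Precision", "Recall", "AUC"]

# Values copied from the reported results, in the order given by METRICS.
RESULTS = {
    "APLR": [0.885, 0.909, 0.921, 0.897, 0.926],
    "Random Forest": [0.820, 0.853, 0.889, 0.820, 0.947],
    "LightGBM": [0.803, 0.846, 0.846, 0.846, 0.889],
    "EBM": [0.820, 0.853, 0.889, 0.821, 0.918],
    "TabNet": [0.836, 0.872, 0.872, 0.872, 0.818],
}


def best_by_metric(results):
    """Map each metric to the model with the highest reported value."""
    return {
        metric: max(results, key=lambda model: results[model][i])
        for i, metric in enumerate(METRICS)
    }


winners = best_by_metric(RESULTS)
aplr_wins = sum(1 for model in winners.values() if model == "APLR")
print(winners)
print(aplr_wins)
```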

Model Interpretation Analysis

Global Feature Importance (Top 5)

  1. Time Management Ability (m_timeManage): 0.534
  2. Concentration Ability (m_concentrate): 0.516
  3. Perceived Helpfulness to Classmates (m_helpful): 0.365
  4. Course Boredom and Time Management Interaction: 0.297
  5. Offline Course Participation (mode_Offline): 0.297

Key Findings

  • Positive Factors: Time management, concentration, sense of helping others, offline learning participation
  • Negative Factors: Creative activity participation (coefficient -0.15)
  • Interaction Effects: Significant interactions exist among multiple features

Case Analysis

Satisfied Student Case

  • Largest Contributing Factors: Sense of helpfulness (0.681), absence of boredom (0.553)
  • Supporting Factors: Time management (0.447), concentration (0.444)
  • Negative Factors: Creative activity participation (-0.390)

Dissatisfied Student Case

  • Main Issues: Poor time management (1.255), inability to help others (0.681)
  • Mitigating Factors: Adequate concentration (-0.444, negative contribution indicates mitigation of dissatisfaction)
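Because the model is additive, a local explanation amounts to ranking and summing per-term contributions. The sketch below uses the contributions listed for the satisfied student. The dictionary keys are descriptive labels rather than the dataset's column names, and the intercept and any remaining terms are not reported in this summary, so the sum is only a partial linear predictor.

```python
# Contributions reported for the satisfied student case.
# Keys are descriptive labels (assumption), not the dataset's column names.
contributions = {
    "helpfulness to classmates": 0.681,
    "absence of boredom": 0.553,
    "time management": 0.447,
    "concentration": 0.444,
    "creative activity participation": -0.390,
}


def rank_contributions(contribs):
    """Order terms by the absolute size of their contribution, largest first."""
    return sorted(contribs.items(), key=lambda kv: abs(kv[1]), reverse=True)


def partial_linear_predictor(contribs, intercept=0.0):
    """Sum of the listed contributions; the intercept default is a placeholder."""
    return intercept + sum(contribs.values())


for name, value in rank_contributions(contributions):
    print(f"{name:35s} {value:+.3f}")
print(round(partial_linear_predictor(contributions), 3))
```

An educator reading such a ranking would see which factors push this student's prediction up or down, which is the basis for the individual-level customization described in the paper.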

Related Work

Learning Satisfaction Research

  1. Self-Efficacy Research: Multiple studies find positive correlation between self-efficacy and online learning satisfaction
  2. Student Engagement: Engagement positively impacts online learning satisfaction
  3. Interaction Relationships: Learner-to-learner and teacher-student interactions positively affect satisfaction

Technical Method Evolution

  1. Traditional Methods: Primarily employ Structural Equation Modeling (SEM)
  2. Statistical Testing: Hypothesis testing as main analytical component
  3. Modern AI: Limited application of interpretable machine learning and deep learning techniques

Conclusions and Discussion

Main Conclusions

  1. Method Effectiveness: APLR demonstrates excellent performance on student learning satisfaction prediction tasks
  2. Key Influencing Factors: Time management, concentration, sense of helping others, and offline participation are core positive factors
  3. Unexpected Finding: Creative activity participation does not positively impact learning satisfaction
  4. Personalization Potential: Local explanations support development of personalized teaching strategies

Limitations

  1. Data Scale: Only 302 samples, potentially affecting result generalizability
  2. Geographic Restriction: Limited to students from a single Korean university
  3. Temporal Specificity: Specifically targeted pandemic period; applicability in post-pandemic era remains to be verified
  4. Classification Task Validation: APLR has so far seen relatively little rigorous testing on classification tasks

Future Directions

  1. Post-Pandemic Comparative Research: Compare changes in key factors before and after pandemic
  2. Multi-Dimensional Extension: Investigate other dimensions such as learning motivation and academic performance
  3. Cross-Geographic Validation: Verify model effectiveness across different cultural backgrounds
  4. Real-Time Application: Develop real-time learning satisfaction monitoring systems

In-Depth Evaluation

Strengths

  1. Method Innovation: First application of APLR to educational data mining, demonstrating value of interpretable AI
  2. Rigorous Experimental Design: Comprehensive hyperparameter tuning and multi-model comparison
  3. Rich Interpretability: Provides dual global and local explanations with practical application value
  4. Valuable Unexpected Finding: Negative correlation between creative activities and learning satisfaction warrants deeper investigation

Weaknesses

  1. Sample Representativeness: Single university sample may harbor selection bias
  2. Causality: Cross-sectional study cannot establish causal relationships
  3. Feature Engineering: Rationality of target variable construction method requires further validation
  4. Insufficient In-Depth Analysis: Lacks thorough exploration of unexpected findings (e.g., negative impact of creative activities)

Impact

  1. Academic Contribution: Introduces new interpretable AI methods to educational data mining field
  2. Practical Value: Provides scientific evidence for educators' personalized instruction
  3. Reproducibility: Open-source code and dataset facilitate research reproduction and extension
  4. Cross-Domain Potential: APLR method may apply to other small-scale structured data scenarios

Applicable Scenarios

  1. Small-Scale Educational Data: Particularly suitable for education research with limited samples
  2. Prediction Tasks Requiring Interpretability: Educational decision-making requires interpretable AI support
  3. Personalized Education: Supports customized teaching strategies based on student characteristics
  4. Policy Formulation: Provides data-driven decision support for educational policy

References

The paper cites 35 relevant references covering multiple fields including learning satisfaction research, interpretable machine learning, and educational technology, providing solid theoretical foundation for the research.


Overall Assessment: This is a high-quality research paper applying interpretable AI in the educational data mining field, featuring methodological innovation, rigorous experimentation, and valuable results, though with certain limitations in sample scale and generalizability. The research provides valuable technological tools and empirical insights for personalized education.