Recent advances in deep forecasting models have achieved remarkable performance, yet most approaches still struggle to provide both accurate predictions and interpretable insights into temporal dynamics. This paper proposes CaReTS, a novel multi-task learning framework that combines classification and regression tasks for multi-step time series forecasting problems. The framework adopts a dual-stream architecture, where a classification branch learns the stepwise trend into the future, while a regression branch estimates the corresponding deviations from the latest observation of the target variable. The dual-stream design provides more interpretable predictions by disentangling macro-level trends from micro-level deviations in the target variable. To enable effective learning in output prediction, deviation estimation, and trend classification, we design a multi-task loss with uncertainty-aware weighting to adaptively balance the contribution of each task. Furthermore, four variants (CaReTS1--4) are instantiated under this framework to incorporate mainstream temporal modelling encoders, including convolutional neural networks (CNNs), long short-term memory networks (LSTMs), and Transformers. Experiments on real-world datasets demonstrate that CaReTS outperforms state-of-the-art (SOTA) algorithms in forecasting accuracy, while achieving higher trend classification performance.
CaReTS: A Multi-Task Framework Unifying Classification and Regression for Time Series Forecasting
- Paper ID: 2511.09789
- Title: CaReTS: A Multi-Task Framework Unifying Classification and Regression for Time Series Forecasting
- Authors: Fulong Yao (Cardiff University), Wanqing Zhao (Newcastle University), Chao Zheng (Newcastle University), Xiaofei Han (University of Leeds)
- Category: cs.LG (Machine Learning)
- Publication Date: November 12, 2025 (arXiv preprint)
- Paper Link: https://arxiv.org/abs/2511.09789
Deep learning has achieved significant progress in time series forecasting, yet existing methods often struggle to provide interpretable insights into temporal dynamics while delivering accurate predictions. This paper proposes CaReTS, a multi-task learning framework combining classification and regression tasks for multi-step time series forecasting. The framework employs a dual-stream architecture: the classification branch learns stepwise future trends, while the regression branch estimates deviations relative to the most recent observation. This design provides more interpretable forecasts by decoupling macroscopic trends from microscopic deviations. To enable effective learning, an uncertainty-aware multi-task loss function is designed to adaptively balance task contributions. The paper instantiates four variants (CaReTS1-4) combined with mainstream temporal encoding architectures (CNN, LSTM, Transformer). Experiments demonstrate that CaReTS surpasses state-of-the-art algorithms in both prediction accuracy and trend classification performance.
Time series forecasting is a fundamental problem in energy management, financial analysis, medical monitoring, and climate modeling. Multi-step forecasting is particularly critical but faces two major challenges:
- Accuracy Degradation: Prediction precision typically decreases as the forecasting horizon increases
- Insufficient Interpretability: In high-risk scenarios, model opacity reduces trustworthiness
Multi-step forecasting is crucial for capturing both short-term and long-term temporal dynamics of systems, enabling informed decision-making. However, while existing deep learning models have improved accuracy, they remain significantly deficient in interpretability, limiting their reliability in practical applications.
- Single Regression Paradigm: Most deep forecasting models formulate prediction as a single regression task, focusing solely on numerical prediction
- Coupled Trends and Deviations: Difficulty in decoupling macroscopic trends (e.g., upward/downward trajectories) from microscopic deviations
- Lack of Explicit Trend Modeling: While models like Autoformer and FEDformer introduce decomposition mechanisms, they primarily operate at input or representation layers rather than explicitly separating trends and magnitudes at the output layer
The core insight of this work is that decomposing time series forecasting into two complementary tasks—trend classification (direction) and deviation regression (magnitude)—can simultaneously enhance both prediction accuracy and interpretability. This output-layer decoupling provides a novel multi-task learning perspective.
- Dual-Stream Architecture Design: Proposes the CaReTS framework with a dual-stream architecture where the classification branch predicts stepwise macroscopic trends and the regression branch estimates fine-grained deviations relative to the most recent observation
- Uncertainty-Aware Multi-Task Learning: Designs an uncertainty-based multi-task loss function that jointly optimizes classification and regression tasks through adaptive weighting, removing the need to hand-tune the task weights
- Framework Generality: Instantiates four variants (CaReTS1-4) compatible with mainstream temporal encoders (CNN, LSTM, Transformer), demonstrating broad applicability
- Performance Enhancement and Interpretability Improvement: Achieves state-of-the-art prediction accuracy on real datasets with trend classification accuracy exceeding 91% and manageable computational overhead
Input: Time series $x=\{x_1,x_2,\dots,x_n\}$, where $x_n$ is the most recent observation of the target variable
Output: $K$-step-ahead forecasts $\hat{y}=\{\hat{y}_1,\hat{y}_2,\dots,\hat{y}_K\}$
Core Idea: Decompose the prediction at each step $k$ into a trend direction $d(k)$ and a deviation magnitude $\delta(k)$
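The decomposition can be illustrated with a minimal sketch. It assumes the direction at step k is the sign of the target minus the latest observation (which is what the CaReTS1 fusion rule implies) and that a zero deviation is labelled "up"; the function names `decompose_targets` and `recompose` are hypothetical and not taken from the paper.

```python
def decompose_targets(x_n, future):
    """Split each future value into a direction label and a non-negative magnitude.

    Assumptions (not from the paper): direction is measured against the
    latest observation x_n, and a zero deviation is labelled "up" (+1).
    """
    directions, magnitudes = [], []
    for y_k in future:
        diff = y_k - x_n
        directions.append(1 if diff >= 0 else -1)
        magnitudes.append(abs(diff))
    return directions, magnitudes


def recompose(x_n, directions, magnitudes):
    """Rebuild the values from the pairs: y(k) = x_n + d(k) * delta(k)."""
    return [x_n + d * m for d, m in zip(directions, magnitudes)]


x_n = 10.0
future = [10.5, 9.0, 10.0, 12.0]
d, delta = decompose_targets(x_n, future)
print(d)      # [1, -1, 1, 1]
print(delta)  # [0.5, 1.0, 0.0, 2.0]
print(recompose(x_n, d, delta) == future)  # True
```

The round trip through `recompose` shows that the (direction, magnitude) pair carries the same information as the raw target, so the split changes what the model is asked to learn, not what it must predict.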
Architecture (a): Parallel Dual-Stream
- Temporal encoder (CNN/LSTM/Transformer) extracts temporal features
- Features are fed in parallel to two independent fully-connected streams:
  - Classification Stream: Predicts stepwise trends (up/down)
  - Regression Stream: Estimates deviations relative to $x_n$
- Residual Fusion: $\hat{y}(k)=x_n+\mathrm{Fusion}(d(k),\delta(k))$
Architecture (b): Sequential Dual-Stream
- First infers trends through the classification stream
- Concatenates classification output with original temporal features
- Feeds into regression stream for deviation estimation
- Direct fusion: $\hat{y}(k)=x_n+\hat{\delta}(k)$
| Model | Architecture | Trend Representation | Deviation Representation | Fusion Method |
|---|---|---|---|---|
| CaReTS1 | (a) | Binary label $\hat{d}(k)\in\{+1,-1\}$ | Single non-negative deviation $\hat{\delta}(k)$ | $\hat{y}(k)=x_n+\hat{d}(k)\cdot\hat{\delta}(k)$ |
| CaReTS2 | (a) | Binary label $\hat{d}(k)\in\{+1,-1\}$ | Direction-specific deviations $(\hat{\delta}_{up}(k),\hat{\delta}_{down}(k))$ | Select the deviation that matches the predicted trend |
| CaReTS3 | (a) | Probabilities $(p_{up}(k),p_{down}(k))$ | Direction-specific deviations $(\hat{\delta}_{up}(k),\hat{\delta}_{down}(k))$ | $\hat{y}(k)=x_n+p_{up}(k)\hat{\delta}_{up}(k)-p_{down}(k)\hat{\delta}_{down}(k)$ |
| CaReTS4 | (b) | Probability $p(k)$ | Signed deviation $\hat{\delta}(k)$ | $\hat{y}(k)=x_n+\hat{\delta}(k)$ |
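The four fusion rules in the table can be written side by side as a rough sketch for a single forecast step. The function names are hypothetical. For CaReTS2 the code assumes the "down" deviation is non-negative and is subtracted, by analogy with the CaReTS3 formula; the table itself only says the matching deviation is selected.

```python
def fuse_v1(x_n, d_hat, delta_hat):
    """CaReTS1: hard label (+1 or -1) times a single non-negative deviation."""
    return x_n + d_hat * delta_hat


def fuse_v2(x_n, d_hat, delta_up, delta_down):
    """CaReTS2: pick the deviation that matches the hard label.

    Assumption: delta_down is non-negative and is subtracted.
    """
    return x_n + delta_up if d_hat > 0 else x_n - delta_down


def fuse_v3(x_n, p_up, p_down, delta_up, delta_down):
    """CaReTS3: probability-weighted blend of the two directional deviations."""
    return x_n + p_up * delta_up - p_down * delta_down


def fuse_v4(x_n, delta_signed):
    """CaReTS4: the regression stream already outputs a signed deviation."""
    return x_n + delta_signed


x_n = 1.0
print(fuse_v1(x_n, +1, 0.5))                # 1.5
print(fuse_v2(x_n, -1, 0.5, 0.25))          # 0.75
print(fuse_v3(x_n, 0.75, 0.25, 0.5, 0.25))  # 1.3125
print(fuse_v4(x_n, -0.25))                  # 0.75
```

With a confident "up" probability of 0.75, the soft fusion lands between the two hard choices (1.5 and 0.75), which is the smoothing behaviour attributed to CaReTS3.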
$L^{(a)}=\alpha_{ca}L_{ca}+\alpha_{de}L_{de}+\alpha_{op}L_{op}$
Where:
- $L_{ca}$: Trend classification loss (binary or categorical cross-entropy)
- $L_{de}$: Deviation estimation loss (MSE)
- $L_{op}$: Output prediction loss (MSE)
$L^{(b)}=\alpha_{ca}L_{ca}+\alpha_{op}L_{op}$
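Core innovation: Task weights are modeled as learnable parameters and adjusted adaptively based on prediction uncertainty:
$\alpha_i=\frac{1}{2\sigma_i^2},\quad i\in\{ca,de,op\}$
The implementation uses the log-variance $\log\sigma_i^2$ as the learnable parameter, giving the final loss:
$L^{(a)}=\sum_{i\in\{ca,de,op\}}\left(\frac{1}{2}e^{-\log\sigma_i^2}L_i+\frac{1}{2}\log\sigma_i^2\right)$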
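A short worked step, added as a reading aid and not taken from the paper, shows how the weight and the log-variance term interact. Here $s_i=\log\sigma_i^2$ is shorthand introduced for this note, $\ell_i$ is the contribution of task $i$ to the total loss, and $L_i>0$ is treated as fixed while differentiating with respect to $s_i$.

```latex
\begin{aligned}
\alpha_i &= \frac{1}{2\sigma_i^2} = \tfrac{1}{2}e^{-s_i}, \qquad s_i = \log\sigma_i^2,\\
\ell_i(s_i) &= \tfrac{1}{2}e^{-s_i}L_i + \tfrac{1}{2}s_i,\\
\frac{\partial \ell_i}{\partial s_i} &= -\tfrac{1}{2}e^{-s_i}L_i + \tfrac{1}{2} = 0
\;\Longrightarrow\; e^{s_i} = L_i
\;\Longrightarrow\; \sigma_i^2 = L_i,\quad \alpha_i = \frac{1}{2L_i}.
\end{aligned}
```

Under this simplification, a task with a larger loss settles at a smaller weight, while the $\tfrac{1}{2}s_i$ term keeps the model from driving every weight to zero by inflating the variances.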
Stabilization Strategies:
- Soft regularization: Add penalty terms to log-variance parameters
- Value range constraint: Restrict $\log\sigma_i^2$ to $[-10,10]$
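The weighted loss with the value range constraint could look roughly like the sketch below. It uses plain floats so that it runs without a deep learning framework; in a real model the log-variances would be trainable parameters. The function name and the example loss values are made up, and the soft regularization penalty is left out because its form is not given here.

```python
import math

# Clamp range for the log-variance parameters, as stated in the summary.
LOG_VAR_MIN, LOG_VAR_MAX = -10.0, 10.0


def uncertainty_weighted_loss(task_losses, log_vars):
    """Sum over tasks of 0.5 * exp(-s_i) * L_i + 0.5 * s_i, with s_i = log(sigma_i^2).

    task_losses and log_vars are dicts keyed by task name ("ca", "de", "op").
    """
    total = 0.0
    for name, loss in task_losses.items():
        s = min(max(log_vars[name], LOG_VAR_MIN), LOG_VAR_MAX)  # value range constraint
        total += 0.5 * math.exp(-s) * loss + 0.5 * s
    return total


# Illustrative (made-up) per-task losses for one batch.
losses = {"ca": 0.30, "de": 0.02, "op": 0.01}

# With all log-variances at 0, every task gets weight 0.5.
equal = uncertainty_weighted_loss(losses, {"ca": 0.0, "de": 0.0, "op": 0.0})

# Raising the classification log-variance down-weights that task.
shifted = uncertainty_weighted_loss(losses, {"ca": 1.0, "de": 0.0, "op": 0.0})
print(round(equal, 4), round(shifted, 4))
```

Keeping the parameter in log space means the effective weight `0.5 * exp(-s)` is always positive, and the clamp bounds it between `0.5 * exp(-10)` and `0.5 * exp(10)`.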
- Output-Layer Decoupling: Unlike Autoformer and similar models, which decompose at the input or representation layers, CaReTS explicitly separates trends and deviations at the output layer, providing more direct interpretability
- Soft Fusion Mechanism (CaReTS3): Fuses deviations from both directions via probability weighting, enabling smooth transitions when trend uncertainty is high
- Adaptive Task Balancing: Uncertainty-based weight learning removes the need to hand-tune task weights, allowing the model to automatically focus on more reliable tasks
- Progressive Complexity Design: From CaReTS1 to CaReTS4, gradually increases modeling capacity, systematically exploring the design space
Two real-world time series forecasting tasks:
- Electricity Price Forecasting: 8,784 hourly observations (one year)
- Unmet Electricity Demand Forecasting: 8,784 hourly observations
Forecasting Configuration: 15-to-6 scheme
- Input (15 features): Month, day-of-week, and hour of the current timestep, plus the past 12 observations of the target variable
- Output: Next 6 steps of target variable forecasts
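The 15-to-6 configuration can be sketched as a sliding-window builder. This is an illustration under assumptions: the calendar fields are passed as raw integers (the paper's encoding is not described here), the window includes the current observation as the last of the 12 history values, and the start date and toy series are arbitrary.

```python
from datetime import datetime, timedelta

LOOKBACK, HORIZON = 12, 6  # 3 calendar features + 12 past values -> 6 future values


def make_samples(timestamps, values, lookback=LOOKBACK, horizon=HORIZON):
    """Build (features, target) pairs from an hourly series.

    features = [month, day_of_week, hour] of the current step + last `lookback` values
    target   = the next `horizon` values
    """
    samples = []
    for t in range(lookback - 1, len(values) - horizon):
        now = timestamps[t]
        calendar = [now.month, now.weekday(), now.hour]
        history = list(values[t - lookback + 1 : t + 1])
        target = list(values[t + 1 : t + 1 + horizon])
        samples.append((calendar + history, target))
    return samples


# Toy hourly series: 48 points starting at an arbitrary date.
start = datetime(2020, 1, 1, 0)
timestamps = [start + timedelta(hours=h) for h in range(48)]
values = [float(h) for h in range(48)]

samples = make_samples(timestamps, values)
features, target = samples[0]
print(len(samples), len(features), len(target))  # 31 15 6
```

Each sample ends with the latest observation as the last history value, which is the reference point the deviation targets are measured against.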
Data Split:
- Training set: 6,048 points
- Test set: 2,736 points
- Evaluation method: 10-fold cross-validation
- RMSE (Root Mean Square Error): Measures prediction accuracy
- Trend Classification Accuracy: Measures correctness of trend direction prediction
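Both metrics are simple to compute; a minimal sketch follows. The trend accuracy here derives the direction from the sign of each value relative to the latest observation `x_n`, with ties counted as "up". That is an assumption made for illustration, and the paper may define the trend label differently.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error over all forecast steps."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def trend_accuracy(x_n, y_true, y_pred):
    """Fraction of steps where predicted and actual directions agree.

    Assumption: direction is the sign of (value - x_n), with ties counted as "up".
    """
    def direction(v):
        return 1 if v - x_n >= 0 else -1

    hits = sum(direction(t) == direction(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


x_n = 1.0
y_true = [1.5, 0.5, 2.0, 0.0]
y_pred = [1.0, 1.0, 1.5, 0.5]
print(rmse(y_true, y_pred))                 # 0.5
print(trend_accuracy(x_n, y_true, y_pred))  # 0.75
```

In the example every step has the same absolute error, yet one of the four directions is wrong, which shows how the two metrics capture different aspects of forecast quality.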
Design Baselines (3):
- Baseline1: Traditional encoder-decoder architecture
- Baseline2: Simplified version without residual connections
- Baseline3: Single FC layer replacing fusion module
SOTA Algorithms (10):
- Transformer series: Autoformer, FEDformer, Non-stationary Transformer, Informer
- Hybrid models: TimesNet, TimeXer, D-CNN-LSTM
- Lightweight models: DLinear, NLinear, TimeMixer
- Fuzzy neural network: SOIT2FNN-MO
- Platform: Google Colab with T4 GPU
- Encoder: 2 layers, 64 hidden units
- CNN: Kernel size 3, padding 1
- Transformer: 4 attention heads
- Classification/Regression Branches: 2-layer FC, 64 hidden units
- Optimizer: Adam, learning rate 0.001
- Batch Size: 64
- Training Epochs: Up to 600, early stopping (50 epochs without improvement)
- Activation Function: ReLU
- Normalization: Min-Max normalization
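Two of the settings above, Min-Max normalization and early stopping with a patience of 50 epochs, are sketched below in framework-free Python. Fitting the scaler on training data only and monitoring a validation loss are assumptions; the class name and the simulated loss curve are illustrative.

```python
MAX_EPOCHS = 600
PATIENCE = 50


def fit_min_max(train_values):
    """Return (min, max) computed on the training data only (an assumption)."""
    return min(train_values), max(train_values)


def min_max_scale(values, lo, hi):
    """Map values to [0, 1] using the fitted range."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


class EarlyStopping:
    """Stop when the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience=PATIENCE):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


lo, hi = fit_min_max([2.0, 4.0, 6.0])
print(min_max_scale([2.0, 4.0, 6.0], lo, hi))  # [0.0, 0.5, 1.0]

# Simulated validation losses: improve for four epochs, then plateau.
simulated_val_losses = [1.0, 0.8, 0.6, 0.5] + [0.5] * (MAX_EPOCHS - 4)
stopper = EarlyStopping()
stopped_at = None
for epoch, val_loss in enumerate(simulated_val_losses):
    if stopper.step(val_loss):
        stopped_at = epoch
        break
print(stopped_at)  # 53
```

The last improvement happens at epoch 3 (zero-based), so the 50th epoch without improvement is epoch 53, well short of the 600-epoch cap.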
Unmet Electricity Demand Forecasting (Test Set RMSE):
- Best: CaReTS2-Transformer (0.0691 ± 0.0018)
- Second best: CaReTS3-CNN (0.0692 ± 0.0010)
- All CaReTS2-4 variants outperform baselines
Electricity Price Forecasting (Test Set RMSE):
- Best: CaReTS2-Transformer (0.0465 ± 0.0012)
- CaReTS1-4 outperform baselines across all encoder configurations (except CaReTS1-LSTM)
Key Findings:
- CaReTS2 shows most consistent performance, best in 4 of 6 configurations, second best in 2
- Transformer encoder generally outperforms CNN and LSTM
- CaReTS1 shows less advantage due to simplified deviation branch
All variants achieve >90% accuracy:
- Unmet electricity: CaReTS2-Transformer highest (0.9192 ± 0.0022)
- Electricity price: CaReTS2-Transformer highest (0.9146 ± 0.0019)
Cross-Step Analysis (Figure 5):
- Trend classification accuracy remains stable across 6-step forecasting, even slightly improving
- Contrasts with increasing RMSE, demonstrating framework's robustness in maintaining trend consistency for long-term forecasting
Using Transformer encoder as example:
Unmet Electricity:
- CaReTS2 multi-task: RMSE 0.0691, trend accuracy 0.9192
- CaReTS2 single-task: RMSE 0.0704, trend accuracy 0.9060
- Improvement: RMSE reduced by 1.8%; trend accuracy improved by 1.3 percentage points (about 1.5% relative)
Electricity Price:
- CaReTS1 multi-task: RMSE 0.0473, trend accuracy 0.9142
- CaReTS1 single-task: RMSE 0.0539, trend accuracy 0.8663
- Improvement: RMSE reduced by 12.2%; trend accuracy improved by 4.8 percentage points (about 5.5% relative)
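The improvement figures can be recomputed from the reported RMSE and accuracy values. The sketch below computes both the relative change and the percentage-point difference, because the two conventions give different numbers for accuracy; the helper names are hypothetical.

```python
def relative_reduction(baseline, improved):
    """Percent decrease relative to the baseline (used for RMSE)."""
    return (baseline - improved) / baseline * 100.0


def relative_gain(baseline, improved):
    """Percent increase relative to the baseline (used for accuracy)."""
    return (improved - baseline) / baseline * 100.0


def point_gain(baseline, improved):
    """Difference in percentage points for values given as fractions."""
    return (improved - baseline) * 100.0


# Unmet electricity demand, CaReTS2-Transformer: single-task -> multi-task.
print(round(relative_reduction(0.0704, 0.0691), 1))  # 1.8
print(round(relative_gain(0.9060, 0.9192), 1))       # 1.5
print(round(point_gain(0.9060, 0.9192), 1))          # 1.3

# Electricity price, CaReTS1-Transformer: single-task -> multi-task.
print(round(relative_reduction(0.0539, 0.0473), 1))  # 12.2
print(round(relative_gain(0.8663, 0.9142), 1))       # 5.5
print(round(point_gain(0.8663, 0.9142), 1))          # 4.8
```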
Computational Overhead:
- Additional parameters: only 3 task weight scalars
- Runtime increase is modest (253-401 seconds with multi-task learning vs. 216-386 seconds without)
Unmet Electricity:
- CaReTS2: RMSE 0.0691, trend accuracy 0.9192
- TimeXer (second-best SOTA): RMSE 0.0700, trend accuracy 0.9066
- Advantage: RMSE reduced by 1.3%, trend accuracy improved by 1.4%
Electricity Price:
- CaReTS2: RMSE 0.0465, trend accuracy 0.9146
- TimeXer (best SOTA): RMSE 0.0463, trend accuracy 0.9013
- Comparison: RMSE is 0.4% higher (slightly worse) than TimeXer, but trend accuracy is 1.5% higher
Efficiency Comparison:
- CaReTS runtime: 200-400 seconds
- Lightweight models (DLinear/NLinear): <70 seconds
- Heavy models (Autoformer/TimeXer): >460 seconds
- Conclusion: CaReTS achieves good balance between accuracy and efficiency
Under 15-4 and 15-8 forecasting configurations:
- CaReTS2 consistently ranks in top three for both RMSE and trend accuracy
- Validates framework stability across different forecasting horizons
- Trend Stability: Trend classification accuracy does not decrease with forecasting steps, demonstrating robustness of macroscopic trend modeling
- Complementary Learning: Multi-task learning promotes complementary learning rather than task interference; joint optimization outperforms single-task training
- Encoder Compatibility: The framework works well with different encoders; the Transformer generally performs best
- Direction-Specific Modeling: CaReTS2's direction-specific deviation design captures asymmetric dynamics, outperforming single deviation (CaReTS1)
- Soft Fusion Advantage: CaReTS3's probability weighting provides smooth transitions when trend uncertainty is high
- CNN Methods: Extract local spatiotemporal patterns
- RNN Methods: LSTM, GRU capture sequence dependencies
- Transformer Methods:
  - Informer: ProbSparse attention
  - Autoformer: Seasonal-trend decomposition + autocorrelation attention
  - FEDformer: Frequency-domain filtering
  - PatchTST: Patch-based embedding
  - iTransformer: Inverted modeling focusing on variable dependencies
- Linear Decomposition: DLinear, NLinear achieve competitive results through simple trend-seasonal decomposition
- Transformer Decomposition: ETSformer, Autoformer, FEDformer model components at input/representation layers
- This Work's Distinction: Output-layer decoupling directly separates prediction targets' trends and deviations
- TimeXer: Distinguishes endogenous and exogenous signals
- TimesNet: Multi-period modules capture different temporal scales
- Lightweight MLPs: TimeMixer, LightTS, TSMixer
- This Work's Innovation: Output-layer dual-stream framework with uncertainty-based adaptive task balancing
- CaReTS successfully decouples trend classification and deviation estimation through dual-stream architecture, simultaneously enhancing prediction accuracy and interpretability
- The uncertainty-based multi-task learning mechanism effectively balances the three tasks' contributions, removing the need to hand-tune task weights
- Four variants demonstrate framework flexibility, with CaReTS2-Transformer combination performing best
- Achieves or exceeds SOTA performance on real datasets with trend classification accuracy exceeding 91% and manageable computational overhead
- Insufficient Long-Term Forecasting Validation: Limited by GPU resources, primarily evaluated on 6-step forecasting without thoroughly validating ultra-long-term prediction capability
- Limited Dataset Diversity: Tested only on two power-related datasets, lacking cross-domain validation (e.g., finance, healthcare)
- Limited Encoder Innovation: Employs standard encoders without exploring customized temporal feature extractors
- Simplified Binary Trends: Models only up/down trends without considering stationary trends or finer-grained trend classification
- Missing Interpretability Quantification: While claiming interpretability enhancement, lacks user studies or quantitative interpretability metrics
- Long-Term Forecasting Extension: Validate ultra-long-term (e.g., 100+ steps) forecasting capability with greater computational resources
- Cross-Domain Validation: Test framework generalization across diverse domains (finance, healthcare, climate)
- Multi-Level Trend Classification: Extend to multi-class trends (e.g., strong up, weak up, stationary)
- Customized Encoders: Explore feature extractors optimized for trend-deviation decomposition
- Interpretability Research: Conduct user studies and quantitatively evaluate interpretability enhancement
- Innovative Problem Decomposition: Decomposing time series forecasting into trend classification and deviation regression is intuitive and effective, providing a novel modeling perspective
- Solid Theoretical Foundation: Uncertainty-aware multi-task learning has solid theoretical support (Kendall et al., 2018) with well-designed implementation details
- Systematic Design Exploration: Four variants progressively evolve from simple to complex, clearly showcasing the design space
- Rigorous and Comprehensive Experiments:
  - 10-fold cross-validation provides reliable estimates
  - Comparison with 10 SOTA algorithms
  - Ablation studies validate component contributions
  - Cross-step analysis reveals trend stability
- Strong Reproducibility: Provides anonymous code with detailed implementation details
- Clear Writing: Well-structured with rich figures and accurate technical descriptions
- Insufficient Interpretability Evaluation:
  - Lacks visualization cases demonstrating how trend-deviation decomposition aids understanding
  - No user studies validating interpretability enhancement
  - Interpretability remains largely conceptual
- Dataset Limitations:
  - Only two related-domain datasets
  - Relatively small sample size (8,784 points)
  - Lacks multivariate time series validation
- Missing Long-Term Forecasting Validation:
  - Primarily evaluated on 6-step forecasting
  - While Figure 5 shows trend stability, longer horizons not actually tested
  - Limits judgment on long-term prediction capability
- Coarse Computational Analysis:
  - Only reports total runtime
  - Lacks detailed time and memory complexity analysis
  - No analysis of computational bottlenecks
- Questionable Baseline Design:
  - Three design baselines may be insufficient
  - Lacks comparison with other multi-task learning approaches
- Simplified Trend Definition:
  - Binary trends (up/down) may be overly coarse
  - Doesn't consider stationary states or trend strength
- Academic Contribution:
  - Provides new perspective on output-layer decomposition
  - Application of uncertainty-aware multi-task learning to time series forecasting
  - May inspire more trend-magnitude separation research
- Practical Value:
  - Demonstrates practicality in applications like electricity forecasting
  - Trend classification provides decision support information
  - Manageable computational overhead suitable for deployment
- Reproducibility:
  - Provides code (though anonymized)
  - Complete implementation details
  - Facilitates reproduction and extension
- Limitation Impact:
  - Dataset and long-term forecasting limitations may restrict impact
  - Requires more cross-domain validation for widespread application
Suitable Scenarios:
- Short to Medium-Term Forecasting (4-8 steps): The range covered by the reported experiments (15-4, 15-6, and 15-8 configurations)
- Applications Requiring Trend Explanation: Finance, energy scheduling where trend direction matters more than exact values
- Univariate or Low-Dimensional Time Series: Current experiments are univariate
- Medium Data Volume Scenarios: Training samples ~6,000 points
Less Suitable Scenarios:
- Ultra-Long-Term Forecasting (>10 steps): Unvalidated, effectiveness unknown
- High-Dimensional Multivariate Time Series: Insufficient testing in multivariate settings
- Real-Time Forecasting: Only a total runtime of 200-400 seconds is reported, not per-prediction latency, so suitability for real-time use is not established
- Stationary Series Without Clear Trends: Trend classification may lack significant advantage
- Kendall et al. (2018): Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. CVPR. Theoretical foundation for uncertainty weighting
- Vaswani et al. (2017): Attention is all you need. NeurIPS. Transformer architecture
- Zhou et al. (2021): Informer: Beyond efficient transformer for long sequence time-series forecasting. AAAI. ProbSparse attention
- Wu et al. (2021): Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. NeurIPS. Seasonal-trend decomposition
- Zhou et al. (2022): FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. ICML. Frequency-domain decomposition
- Liu et al. (2023): iTransformer: Inverted transformers are effective for time series forecasting. arXiv. Inverted modeling
- Zeng et al. (2023): Are transformers effective for time series forecasting? AAAI. DLinear/NLinear simple baselines
- Wang et al. (2024c): TimeXer: Empowering transformers for time series forecasting with exogenous variables. NeurIPS. Exogenous variable modeling
Overall Assessment: This is a well-designed and rigorously executed time series forecasting paper. The core innovation—output-layer trend-deviation decomposition—is simple yet effective, providing a novel modeling perspective. The uncertainty-aware multi-task learning implementation is elegant. Experimental results demonstrate method effectiveness with improvements in both accuracy and interpretability. Main limitations include insufficient interpretability evaluation, limited dataset diversity, and missing long-term forecasting validation. Recommended future work includes validation across more domains and longer horizons, plus user studies to quantify interpretability gains. Overall, this represents a valuable contribution providing a new modeling paradigm for time series forecasting.