Lightweight and Interpretable Transformer via Mixed Graph Algorithm Unrolling for Traffic Forecast
Qi, Do, Liu et al.
Unlike conventional "black-box" transformers with classical self-attention mechanism, we build a lightweight and interpretable transformer-like neural net by unrolling a mixed-graph-based optimization algorithm to forecast traffic with spatial and temporal dimensions. We construct two graphs: an undirected graph $\mathcal{G}^u$ capturing spatial correlations across geography, and a directed graph $\mathcal{G}^d$ capturing sequential relationships over time. We predict future samples of signal $\mathbf{x}$, assuming it is "smooth" with respect to both $\mathcal{G}^u$ and $\mathcal{G}^d$, where we design new $\ell_2$ and $\ell_1$-norm variational terms to quantify and promote signal smoothness (low-frequency reconstruction) on a directed graph. We design an iterative algorithm based on alternating direction method of multipliers (ADMM), and unroll it into a feed-forward network for data-driven parameter learning. We insert graph learning modules for $\mathcal{G}^u$ and $\mathcal{G}^d$ that play the role of self-attention. Experiments show that our unrolled networks achieve competitive traffic forecast performance as state-of-the-art prediction schemes, while reducing parameter counts drastically. Our code is available in https://github.com/SingularityUndefined/Unrolling-GSP-STForecast .
academic
Lightweight and Interpretable Transformer via Mixed Graph Algorithm Unrolling for Traffic Forecast
This paper proposes a lightweight and interpretable Transformer model based on mixed graph algorithm unrolling for traffic forecasting. Unlike traditional "black-box" Transformers, this approach constructs an interpretable Transformer-like neural network by unrolling a mixed graph optimization algorithm. The model constructs two graphs: an undirected graph Gu capturing geospatial correlations and a directed graph Gd capturing temporal relationships. Novel ℓ2 and ℓ1 norm variational terms are designed to quantify and promote signal smoothness on the directed graph. Based on the Alternating Direction Method of Multipliers (ADMM), an iterative algorithm is designed and unrolled into a feedforward network for data-driven parameter learning. Experiments demonstrate that the model maintains competitive traffic forecasting performance while significantly reducing the number of parameters.
First Mixed Graph Algorithm Unrolling: Combines undirected graphs (spatial) and directed graphs (temporal) to model complex spatio-temporal relationships
Innovative Directed Graph Regularization Terms: Designs directed graph Laplacian regularizer (DGLR) and directed graph total variation (DGTV)
Lightweight Interpretable Transformer: Achieves significant parameter reduction (only 6.4% of PDFormer) through ADMM algorithm unrolling
Theoretical Contribution: Proves that directed graph frequency definition degenerates to classical Fourier frequency in the unweighted directed line graph case
Given observations from N monitoring stations over T+1 past time steps, predict traffic states for S future time steps. Input is partially observed spatio-temporal signal y∈RM, output is complete spatio-temporal signal x∈RN(T+S+1).
Theorem 3.1 proves that for unweighted directed line graphs, the symmetrized directed graph Laplacian Lrd=(Lrd)TLrd is equivalent to the undirected line graph Laplacian, validating the reasonableness of the frequency definition.
Graph signal processing foundations (Ortega et al., 2018)
Traffic forecasting related work (Li et al., 2017; Yu et al., 2018)
Overall Assessment: This is an innovative work in the traffic forecasting domain that successfully extends algorithm unrolling ideas to mixed graph settings, achieving significant parameter reduction while maintaining performance. Although there remains room for improvement on certain metrics, its lightweight and interpretable characteristics make it of considerable practical value and academic significance.