
Self-Exploring Language Models for Explainable Link Forecasting on Temporal Graphs via Reinforcement Learning

Ding, Huang, Cao et al.

Forecasting future links is a central task in temporal graph (TG) reasoning, requiring models to leverage historical interactions to predict upcoming ones. Traditional neural approaches, such as temporal graph neural networks, achieve strong performance but lack explainability and cannot be applied to unseen graphs without retraining. Recent studies have begun to explore using large language models (LLMs) for graph reasoning, but most of them are constrained to static graphs or small synthetic TGs and lack the evaluation of the quality of reasoning traces generated by LLMs. In this work, we present Reasoning-Enhanced Learning for Temporal Graphs (ReaL-TG), a reinforcement learning framework that fine-tunes LLMs to perform explainable link forecasting on real-world TGs. ReaL-TG uses outcome-based reward to encourage models to self-explore reasoning strategies from graph structure and to produce explanations that directly justify their predictions. To enable evaluation on LLM-generated reasoning traces, we propose a new evaluation protocol combining ranking metrics with an LLM-as-a-Judge system that assesses both the quality of reasoning and the impact of hallucinations. Experiments with ReaL-TG-4B, obtained by fine-tuning Qwen3-4B under our framework, show that it outperforms much larger frontier LLMs, including GPT-5 mini, on ranking metrics, while producing high-quality explanations confirmed by both the LLM judge and human evaluation.

Basic Information

Paper ID: 2509.00975
Title: Self-Exploring Language Models for Explainable Link Forecasting on Temporal Graphs via Reinforcement Learning
Authors: Zifeng Ding, Shenyang Huang, Zeyu Cao, Emma Kondrup, Zachary Yang, Xingyue Huang, Yuan Sui, Zhangdie Yuan, Yuqicheng Zhu, Xianglong Hu, Yuan He, Farimah Poursafaei, Michael Bronstein, Andreas Vlachos
Categories: cs.AI cs.CL cs.LG
Published: October 13, 2025 (arXiv preprint)
Paper link: https://arxiv.org/abs/2509.00975v2

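Abstract

Link forecasting on temporal graphs (TGs) is a core task that requires models to use historical interactions to predict future connections. Traditional neural approaches perform strongly but lack explainability and cannot be applied to unseen graphs without retraining. This paper proposes ReaL-TG (Reasoning-Enhanced Learning for Temporal Graphs), a reinforcement learning framework that fine-tunes large language models to perform explainable link forecasting on temporal graphs. ReaL-TG uses an outcome-based reward to encourage the model to explore reasoning strategies on its own from graph structure and to generate explanations that directly support its predictions. Experiments show that ReaL-TG-4B outperforms larger frontier LLMs, including GPT-5 mini, on ranking metrics while producing high-quality explanations.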

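Background and Motivation

Problem Definition

Link forecasting on temporal graphs aims to predict future connections from historical node interactions. It has practical value in applications such as recommender systems, community discovery, and financial analysis.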

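Limitations of Existing Methods

Traditional neural methods: approaches such as temporal graph neural networks (TGNNs) and memory networks perform well but have two key problems:
- They lack human-readable explanations, which makes it hard to assess how trustworthy a result is
- They must be retrained before being applied to a new graph and do not generalize seamlessly
Existing LLM methods:
- Most are limited to static graphs or small synthetic temporal graphs
- They carry a risk of data leakage (text attributes may have been seen during pretraining)
- They do not evaluate the quality of the reasoning traces generated by the LLM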

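Research Motivation

The paper aims to develop a temporal graph link forecasting method that delivers high-quality predictions together with explainable reasoning, while avoiding data leakage and generalizing to unseen graphs.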

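Core Contributions

The ReaL-TG framework: the first framework that uses reinforcement learning to enable LLMs to perform explainable and effective link forecasting on real-world temporal graphs
A new evaluation protocol: combines ranking metrics with an LLM-as-a-Judge system, so that it evaluates not only prediction accuracy but also reasoning quality and the impact of hallucinations
Strong experimental results: ReaL-TG-4B outperforms larger frontier LLMs on both seen and unseen graphs and produces high-quality explanations, as confirmed by the LLM judge and by human evaluation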

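Method Details

Task Definition

Temporal graph definition: a temporal graph G is represented as a chronologically ordered sequence of interactions, G = {(u_i, v_i, t_i)}, where u_i and v_i are the source and destination nodes and t_i is the timestamp.

QA-style link forecasting: given a query q = (u_q, ?, t_q) and the history H_{t_q}, the LLM must generate a textual answer A that specifies the predicted set of destination nodes v_q.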
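The task definition above can be made concrete with a small data-structure sketch. The names `Interaction` and `history_before` are hypothetical, and the strict inequality t_i < t_q used to build the history is an assumption, since the summary does not spell out how H_{t_q} is delimited.

```python
from typing import List, NamedTuple


class Interaction(NamedTuple):
    u: str  # source node
    v: str  # destination node
    t: int  # timestamp


def history_before(graph: List[Interaction], t_q: int) -> List[Interaction]:
    """Return interactions with timestamp strictly before t_q, oldest first.

    Assumption: the history H_{t_q} excludes interactions at time t_q itself.
    """
    return sorted((e for e in graph if e.t < t_q), key=lambda e: e.t)


# A toy temporal graph G = {(u_i, v_i, t_i)}.
graph = [
    Interaction("a", "b", 1),
    Interaction("a", "c", 3),
    Interaction("b", "c", 5),
    Interaction("a", "b", 7),
]

# Query q = (u_q, ?, t_q): which nodes will "a" interact with at time 6?
query = ("a", None, 6)
history = history_before(graph, query[2])
```

Here `history` holds the three interactions with timestamps 1, 3, and 5; the interaction at time 7 lies in the future of the query and is excluded.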

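Model Architecture

1. Temporal Context Graph Selection (T-CGS)

An α-temporal random walk builds the subgraph G_c that is most relevant to the query
Starting from the query node (u_q, t_q), the walk terminates with probability α and continues to a historical neighbor with probability 1-α
Transition probabilities apply temporal decay, $P_{(e,t)}(e', t') = \beta^{|\{...\}|} / \sum_z \beta^{z}$, which favors temporally closer neighbors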
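The walk described above can be sketched as follows. This is a minimal illustration under several assumptions, not the paper's implementation: the exponent of β is elided in the summary, so the sketch substitutes the recency rank of each neighbor (newest neighbor gets β^1, next gets β^2, and so on); edges are treated as undirected when collecting neighbors; and the context is taken to be the most frequently visited (node, time) states. All function names and default values are hypothetical.

```python
import random
from collections import Counter


def historical_neighbors(graph, node, t):
    """Neighbors of `node` via interactions strictly before time t, newest first."""
    nbrs = []
    for (u, v, ts) in graph:
        if ts < t:
            if u == node:
                nbrs.append((v, ts))
            elif v == node:
                nbrs.append((u, ts))
    return sorted(nbrs, key=lambda x: -x[1])


def alpha_temporal_walk(graph, u_q, t_q, alpha, beta, rng):
    """Run one walk from (u_q, t_q); return the (node, time) states it visits."""
    visited = []
    node, t = u_q, t_q
    while True:
        if rng.random() < alpha:  # terminate with probability alpha
            break
        nbrs = historical_neighbors(graph, node, t)
        if not nbrs:  # no earlier interaction to follow
            break
        # Assumed decay: weight beta^(rank+1), so more recent neighbors weigh more
        # whenever 0 < beta < 1. rng.choices normalizes the weights.
        weights = [beta ** (rank + 1) for rank in range(len(nbrs))]
        node, t = rng.choices(nbrs, weights=weights, k=1)[0]
        visited.append((node, t))
    return visited


def select_context(graph, u_q, t_q, alpha=0.3, beta=0.6, walks=200, top_k=3, seed=0):
    """Approximate relevance by visit counts over many walks; keep the top_k states."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(walks):
        counts.update(alpha_temporal_walk(graph, u_q, t_q, alpha, beta, rng))
    return [state for state, _ in counts.most_common(top_k)]
```

Because each step moves to a strictly earlier timestamp, every walk terminates even when α is 0. Calling `select_context(graph, "a", 8)` on a list of `(u, v, t)` tuples returns up to three (node, time) states, all earlier than the query time.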

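2. Prompt Construction

The selected context graph G_c and the query q are combined into a prompt Q, which requires the LLM to generate its reasoning inside designated reasoning tags and to give its prediction inside designated answer tags.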
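A prompt of this kind might be assembled as in the sketch below. The tag names are not preserved in this summary, so `<think>` and `<answer>` are placeholders chosen for illustration; the prompt wording, the comma-separated answer format, and the helper names `build_prompt` and `parse_answer` are likewise assumptions.

```python
import re


def build_prompt(context_edges, query):
    """Serialize the context graph G_c and query q into a text prompt Q."""
    u_q, _, t_q = query
    # List interactions in chronological order, one per line.
    lines = [f"({u}, {v}, {t})" for (u, v, t) in sorted(context_edges, key=lambda e: e[2])]
    return (
        "Temporal context graph, one interaction (source, destination, time) per line:\n"
        + "\n".join(lines)
        + f"\n\nQuery: ({u_q}, ?, {t_q})\n"
        # Placeholder tag names; the original tags are an assumption here.
        "Reason inside <think></think>, then list the predicted destination "
        "nodes, comma-separated, inside <answer></answer>."
    )


def parse_answer(output):
    """Extract the predicted node set from the answer tags; empty set if missing."""
    match = re.search(r"<answer>(.*?)</answer>", output, flags=re.DOTALL)
    if match is None:
        return set()
    return {tok.strip() for tok in match.group(1).split(",") if tok.strip()}
```

Parsing the answer into a set keeps the prediction in the same form as the set of ground-truth destination nodes, which is convenient for a set-based outcome reward.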

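3. Reinforcement Learning Training

Reward function: an outcome reward based on the F1 score, r(O) = F1({a}, {v_q}), which balances precision and recall
Optimization objective: GRPO (Group Relative Policy Optimization) is used to maximize the objective: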

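$$
J_{\mathrm{GRPO}}(\theta) = \mathbb{E}\left[ \frac{1}{g} \sum \left( \min\left( \frac{\pi_\theta(O_{i,j} \mid Q, O_{i,<j})}{\pi_{\theta_{\mathrm{old}}}(O_{i,j} \mid Q, O_{i,<j})} \, \mathrm{Adv}_{i,j},\ \mathrm{clip}\left( \frac{\pi_\theta(O_{i,j} \mid Q, O_{i,<j})}{\pi_{\theta_{\mathrm{old}}}(O_{i,j} \mid Q, O_{i,<j})},\, 1-\epsilon,\, 1+\epsilon \right) \mathrm{Adv}_{i,j} \right) - \gamma \, D_{\mathrm{KL}}(\pi_\theta \,\|\, \pi_{\mathrm{ref}}) \right) \right]
$$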
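The reward and the clipped term of the objective can be illustrated numerically. The sketch below is not the training code: it omits the KL penalty and the expectation, works on a single token's log-probabilities, and assumes the common GRPO convention of normalizing each reward by the mean and standard deviation of its sampled group, which this summary does not state. The function names and the default ε = 0.2 are hypothetical.

```python
import math


def f1_reward(predicted, truth):
    """Outcome reward: F1 between the predicted and ground-truth node sets."""
    predicted, truth = set(predicted), set(truth)
    hits = len(predicted & truth)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(truth)
    return 2 * precision * recall / (precision + recall)


def group_advantages(rewards, eps=1e-8):
    """Assumed advantage: reward standardized within its group of g outputs."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_term(logp_new, logp_old, adv, epsilon=0.2):
    """min(ratio * Adv, clip(ratio, 1 - eps, 1 + eps) * Adv) for one token."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1 - epsilon, min(ratio, 1 + epsilon))
    return min(ratio * adv, clipped * adv)


# Two sampled answers for a query whose true destination set is {"b"}.
rewards = [f1_reward({"b"}, {"b"}), f1_reward({"x"}, {"b"})]
advantages = group_advantages(rewards)
```

In this example the correct answer receives a positive advantage and the wrong one a negative advantage. Predicting an extra node, as in `f1_reward({"b", "c"}, {"b"})`, lowers precision and therefore the reward, which is how the F1 reward discourages listing many candidates.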