Discursive Circuits: How Do Language Models Understand Discourse Relations?

Miao, Kan

Which components in transformer language models are responsible for discourse understanding? We hypothesize that sparse computational graphs, termed discursive circuits, control how models process discourse relations. Unlike simpler tasks, discourse relations involve longer spans and complex reasoning. To make circuit discovery feasible, we introduce a task called Completion under Discourse Relation (CuDR), where a model completes a discourse given a specified relation. To support this task, we construct a corpus of minimal contrastive pairs tailored for activation patching in circuit discovery. Experiments show that sparse circuits ($\approx 0.2\%$ of a full GPT-2 model) recover discourse understanding in the English PDTB-based CuDR task. These circuits generalize well to unseen discourse frameworks such as RST and SDRT. Further analysis shows that lower layers capture linguistic features such as lexical semantics and coreference, while upper layers encode discourse-level abstractions. Feature utility is consistent across frameworks (e.g., coreference supports Expansion-like relations).

Basic Information

Paper ID: 2510.11210
Title: Discursive Circuits: How Do Language Models Understand Discourse Relations?
Authors: Yisong Miao, Min-Yen Kan (National University of Singapore)
Categories: cs.CL (Computation and Language), cs.LG (Machine Learning)
Published: October 13, 2025 (arXiv preprint)
Paper link: https://arxiv.org/abs/2510.11210

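Summary

This paper investigates which components of transformer language models are responsible for discourse understanding. The authors hypothesize that sparse computational graphs, termed discursive circuits, control how models process discourse relations. Unlike simpler tasks, discourse relations involve longer text spans and complex reasoning. To make circuit discovery feasible, the authors introduce the Completion under Discourse Relation (CuDR) task, in which a model completes a discourse under a specified relation. Experiments show that sparse circuits (about 0.2% of a GPT-2 model) recover discourse understanding on the PDTB-based CuDR task and generalize well to unseen discourse frameworks such as RST and SDRT.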

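Background and Motivation

Problem Definition

Discourse structure is essential for ensuring that language models behave safely and ethically, yet little is known about how language models process discourse internally. This gap limits our ability to guarantee reliable and harmless model outputs.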

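Why the Research Matters

Safety needs: discourse understanding is essential to safe and ethical model behavior
Missing interpretability: existing methods offer no in-depth account of the mechanisms behind discourse processing
Complexity challenge: discourse relations involve longer contexts and more complex reasoning than simple tasks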

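Limitations of Existing Methods

Methods such as attention visualization and rationale generation lack mechanistic explanations
Existing circuit discovery methods focus mainly on simple tasks (e.g., numerical comparison) and are hard to adapt directly to discourse relations
No unified cross-framework understanding: different discourse frameworks have not been compared at the mechanistic level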

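Research Motivation

By bridging the linguistic structure of discourse with the requirements of circuit discovery, the work opens a new path toward understanding the mechanisms behind complex language tasks.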

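Core Contributions

Proposes the CuDR task: a discourse relation completion task designed to suit circuit discovery
Builds a multi-framework dataset: covers major discourse frameworks including PDTB, RST, and SDRT, with 27,754 instances in total
Discovers discursive circuits: identifies sparse circuits that use only 0.2% of the model's connections yet reach 90% faithfulness
Cross-framework generalization: shows that circuits learned from PDTB generalize well to other discourse frameworks
Builds a circuit hierarchy: constructs a discourse hierarchy from neural circuit components for the first time
Linguistic feature analysis: reveals which linguistic features are captured at different layers and how consistent they are across frameworks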
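
These notes do not define faithfulness. A common convention in circuit-discovery work, assumed here and not confirmed as the paper's own definition, is to normalize the circuit's task score between a corrupted baseline and the full model. The function name, argument names, and numbers below are hypothetical and only illustrate what a value such as 90% would mean under that convention.

```python
def normalized_faithfulness(circuit: float, full: float, corrupted: float) -> float:
    """Fraction of the full model's effect recovered by the circuit.

    Assumed convention: (circuit - corrupted) / (full - corrupted),
    where each argument is a task score such as a logit difference.
    """
    gap = full - corrupted
    if gap == 0:
        raise ValueError("full and corrupted scores are equal; ratio is undefined")
    return (circuit - corrupted) / gap


# Made-up scores for illustration only; these are not results from the paper.
example = normalized_faithfulness(circuit=3.6, full=4.0, corrupted=0.0)
```

Under this reading, a value near 1 means the sparse circuit alone reproduces almost all of the full model's behavior on the task.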

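Method Details

Task Definition: CuDR (Completion under Discourse Relation)

The CuDR task creates a controlled setting for testing a model's discourse behavior:

Input format:

Original discourse: $d_{ori} = (Arg_1, Arg_2, R, Conn)$
Counterfactual discourse: $d_{cf} = (Arg_1, Arg'_2, R', Conn')$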
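
The two tuples can be represented as a small data structure. The sketch below is a minimal one under stated assumptions: the class name `Discourse`, the helper `is_minimal_pair`, and the relation labels are hypothetical and not taken from the paper. It only encodes what the tuples state, namely that the counterfactual keeps the first argument and changes the second argument, the relation, and the connective.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discourse:
    """One discourse instance (Arg1, Arg2, R, Conn); names are illustrative."""
    arg1: str
    arg2: str
    relation: str
    connective: str

    def text(self) -> str:
        # Linearize the instance as "Arg1 Conn Arg2".
        return f"{self.arg1} {self.connective} {self.arg2}"


def is_minimal_pair(ori: Discourse, cf: Discourse) -> bool:
    """True if cf keeps Arg1 but changes Arg2, the relation, and the connective."""
    return (
        ori.arg1 == cf.arg1
        and ori.arg2 != cf.arg2
        and ori.relation != cf.relation
        and ori.connective != cf.connective
    )


# Relation labels are assumed for this example, not quoted from the dataset.
d_ori = Discourse("Bob is hungry", "he goes to the canteen", "Cause", "so")
d_cf = Discourse("Bob is hungry", "the canteen is closed", "Concession", "but")
```

Keeping the first argument fixed is what makes the pair contrastive: any change in model behavior can be attributed to the relation and connective, not to the context.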

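Task setup:

Choose one of the following two options to complete the discourse:
Option 1: "he goes to the canteen"
Option 2: "the canteen is closed"

To complete: [Bob is hungry]_{Arg1} [so]_{Conn} → [he goes to the canteen]_{Arg2}

Changing the discourse connective (from "so" to "but") should change the model's prediction accordingly: with "but", the expected completion becomes option 2, "the canteen is closed".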
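
One way to check this expectation is to compare a model's log-probabilities for the two options under each connective. The sketch below is an assumption about how such a check could be coded, not the paper's implementation: `score` stands for any function that returns the log-probability of a completion given a prompt, and `toy_score` is a hand-written stand-in with made-up values so the example runs without a model.

```python
from typing import Callable

# score(prompt, completion) -> log-probability of the completion given the prompt.
Scorer = Callable[[str, str], float]


def preference(score: Scorer, arg1: str, conn: str, opt1: str, opt2: str) -> float:
    """Positive if opt1 is preferred after 'arg1 conn', negative if opt2 is."""
    prompt = f"{arg1} {conn}"
    return score(prompt, opt1) - score(prompt, opt2)


def flips_with_connective(score: Scorer, arg1: str, conn_ori: str,
                          conn_cf: str, opt1: str, opt2: str) -> bool:
    """True if opt1 wins under conn_ori and opt2 wins under conn_cf."""
    ori = preference(score, arg1, conn_ori, opt1, opt2)
    cf = preference(score, arg1, conn_cf, opt1, opt2)
    return ori > 0 and cf < 0


def toy_score(prompt: str, completion: str) -> float:
    """Hand-written stand-in for a language model (illustrative values only)."""
    if prompt.endswith(" so"):
        expected = "he goes to the canteen"
    else:
        expected = "the canteen is closed"
    return -1.0 if completion == expected else -5.0


result = flips_with_connective(
    toy_score, "Bob is hungry", "so", "but",
    "he goes to the canteen", "the canteen is closed",
)
```

With a real model, `toy_score` would be replaced by a function that sums token log-probabilities of the completion; the difference returned by `preference` is the kind of scalar that circuit-discovery methods typically track.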