Discursive Circuits: How Do Language Models Understand Discourse Relations?
Miao, Kan
Which components in transformer language models are responsible for discourse understanding? We hypothesize that sparse computational graphs, termed as discursive circuits, control how models process discourse relations. Unlike simpler tasks, discourse relations involve longer spans and complex reasoning. To make circuit discovery feasible, we introduce a task called Completion under Discourse Relation (CuDR), where a model completes a discourse given a specified relation. To support this task, we construct a corpus of minimal contrastive pairs tailored for activation patching in circuit discovery. Experiments show that sparse circuits ($\approx 0.2\%$ of a full GPT-2 model) recover discourse understanding in the English PDTB-based CuDR task. These circuits generalize well to unseen discourse frameworks such as RST and SDRT. Further analysis shows lower layers capture linguistic features such as lexical semantics and coreference, while upper layers encode discourse-level abstractions. Feature utility is consistent across frameworks (e.g., coreference supports Expansion-like relations).
This paper investigates which components of transformer language models are responsible for discourse understanding. The authors hypothesize that sparse computational graphs, which they term discursive circuits, control how models process discourse relations. Unlike simpler tasks, discourse relations involve longer text spans and complex reasoning. To make circuit discovery feasible, the authors introduce the Completion under Discourse Relation (CuDR) task, in which a model completes a discourse given a specified relation. Experiments show that sparse circuits (approximately 0.2% of a full GPT-2 model) recover discourse understanding in the English PDTB-based CuDR task and generalize well to unseen discourse frameworks such as RST and SDRT.
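To make the idea of circuit discovery by activation patching concrete, here is a minimal toy sketch. It is not the authors' implementation: the three-unit model, its weights, and the function names `forward` and `patching_effects` are all made up for illustration. Each hidden unit of a clean run is overwritten with its value from a corrupted run, and only the units whose replacement changes the output are kept as the "circuit".

```python
def forward(x, weights, patch=None):
    """Toy model: hidden[i] = w_in[i] * x, output = sum(w_out[i] * hidden[i]).

    `patch` maps a hidden-unit index to a value that overrides that unit's activation.
    """
    w_in, w_out = weights
    hidden = [w * x for w in w_in]
    if patch:
        for idx, value in patch.items():
            hidden[idx] = value
    output = sum(w * h for w, h in zip(w_out, hidden))
    return output, hidden


def patching_effects(x_clean, x_corrupted, weights):
    """Patch each hidden unit with its corrupted activation; record the output change."""
    clean_out, _ = forward(x_clean, weights)
    _, corrupted_hidden = forward(x_corrupted, weights)
    effects = []
    for idx, value in enumerate(corrupted_hidden):
        patched_out, _ = forward(x_clean, weights, patch={idx: value})
        effects.append(clean_out - patched_out)
    return effects


# Made-up toy parameters: unit 1 has zero output weight, so it cannot matter.
weights = ([1.0, 0.5, 2.0], [3.0, 0.0, 1.0])
effects = patching_effects(x_clean=1.0, x_corrupted=-1.0, weights=weights)

# Keep only units whose patching changes the output: a sparse "circuit".
circuit = [i for i, e in enumerate(effects) if abs(e) > 1e-9]
print(effects, circuit)  # [6.0, 0.0, 4.0] [0, 2]
```

In a real transformer the patched units would be attention heads, MLP outputs, or edges between them, and the output would be a task metric rather than a single scalar. The selection logic is the same in spirit.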
Discourse structure is crucial for ensuring language model safety and ethical behavior, yet little is known about how language models internally process discourse, limiting our ability to guarantee model reliability and harmless outputs.
The paper aims to bridge the linguistic structure of discourse and the requirements of circuit discovery, opening new pathways for understanding the mechanisms behind complex language tasks.
Please select one of the following two options to complete the discourse:
Option 1: "he goes to the canteen"
Option 2: "the canteen is closed"
To complete: [Bob is hungry]_{Arg1} [so]_{Conn} → [he goes to the canteen]_{Arg2}
When the discourse connective is changed from "so" to "but", the model's prediction should change accordingly, here from Option 1 to Option 2.
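The canteen example above can be written down as a minimal contrastive pair, as in the sketch below. It is illustrative only: the `CuDRExample` class, the prompt layout (which drops the Arg1/Conn annotations), and the option scores passed to `logit_diff` are assumptions, not the paper's data format or results. The two prompts differ only in the connective, which is the property that makes such a pair usable for activation patching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CuDRExample:
    """One CuDR item: first argument, connective, two candidate completions."""
    arg1: str
    connective: str
    options: tuple  # (option 1, option 2)
    answer: int     # index into `options` of the expected completion


def render_prompt(ex):
    """Render the item as plain text (this layout is an assumption)."""
    lines = ["Please select one of the following two options to complete the discourse:"]
    for i, opt in enumerate(ex.options, start=1):
        lines.append(f'Option {i}: "{opt}"')
    lines.append(f"To complete: {ex.arg1} {ex.connective}")
    return "\n".join(lines)


def make_counterfactual(ex, new_connective):
    """Swap only the connective and flip the expected answer (two-option items)."""
    return CuDRExample(ex.arg1, new_connective, ex.options, 1 - ex.answer)


def logit_diff(option_scores, ex):
    """Score of the expected option minus the score of the other option."""
    return option_scores[ex.answer] - option_scores[1 - ex.answer]


clean = CuDRExample(
    arg1="Bob is hungry",
    connective="so",
    options=("he goes to the canteen", "the canteen is closed"),
    answer=0,
)
corrupted = make_counterfactual(clean, "but")

# Hypothetical scores a model might assign to the two options.
scores = [2.0, 0.5]
print(logit_diff(scores, clean))      # 1.5
print(logit_diff(scores, corrupted))  # -1.5
```

A positive difference means the model prefers the completion that matches the relation, so the same scores that are correct for "so" are wrong for "but".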
Error case analysis reveals limitations of PDTB circuits in handling interjections ("yay!!") and subject ellipsis, while SDRT circuits handle these phenomena better.
Classical discourse theory literature: Mann & Thompson (1987), Asher & Lascarides (2003)
Circuit discovery methods: Wang et al. (2023), Conmy et al. (2023)
Discourse datasets: Webber et al. (2019), Liu et al. (2024b)
Mechanistic interpretability: Zhang & Nanda (2024), Miller et al. (2024)
Overall Assessment: This is a high-quality research paper that excels in methodological innovation, experimental design, and analytical depth. Through the design of the CuDR task, it successfully applies circuit discovery techniques to complex discourse understanding, offering a new perspective on language models' internal mechanisms. Despite limitations such as the weaker handling of interjections and subject ellipsis by PDTB circuits, its pioneering work and rich findings demonstrate significant academic value and practical potential.