Discursive Circuits: How Do Language Models Understand Discourse Relations?

Miao, Kan

Which components in transformer language models are responsible for discourse understanding? We hypothesize that sparse computational graphs, termed as discursive circuits, control how models process discourse relations. Unlike simpler tasks, discourse relations involve longer spans and complex reasoning. To make circuit discovery feasible, we introduce a task called Completion under Discourse Relation (CuDR), where a model completes a discourse given a specified relation. To support this task, we construct a corpus of minimal contrastive pairs tailored for activation patching in circuit discovery. Experiments show that sparse circuits ($\approx 0.2\%$ of a full GPT-2 model) recover discourse understanding in the English PDTB-based CuDR task. These circuits generalize well to unseen discourse frameworks such as RST and SDRT. Further analysis shows lower layers capture linguistic features such as lexical semantics and coreference, while upper layers encode discourse-level abstractions. Feature utility is consistent across frameworks (e.g., coreference supports Expansion-like relations).

Basic Information

Paper ID: 2510.11210
Title: Discursive Circuits: How Do Language Models Understand Discourse Relations?
Authors: Yisong Miao, Min-Yen Kan (National University of Singapore)
Classification: cs.CL (Computational Linguistics), cs.LG (Machine Learning)
Publication Date: October 13, 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2510.11210

Abstract

This paper investigates which components in transformer language models are responsible for discourse understanding. The authors hypothesize that sparse computational graphs, termed discursive circuits, control how models process discourse relations. Unlike simpler tasks, discourse relations involve longer text spans and complex reasoning. To make circuit discovery feasible, the authors introduce the Completion under Discourse Relation (CuDR) task, in which a model completes a discourse given a specified relation. Experiments show that sparse circuits (approximately 0.2% of a full GPT-2 model) recover discourse understanding in the PDTB-based CuDR task and generalize well to unseen discourse frameworks such as RST and SDRT.

Research Background and Motivation

Problem Definition

Discourse structure is crucial for ensuring language model safety and ethical behavior, yet little is known about how language models internally process discourse, limiting our ability to guarantee model reliability and harmless outputs.

Research Significance

Safety Requirements: Discourse understanding is critical for model safety and ethical behavior
Missing Interpretability: Existing methods lack deep understanding of discourse processing mechanisms
Complexity Challenges: Discourse relations involve longer contexts and complex reasoning compared to simple tasks

Limitations of Existing Approaches

Attention visualization and rationale generation methods lack mechanistic explanations
Existing circuit discovery methods primarily focus on simple tasks (e.g., numerical comparison) and cannot be directly adapted to discourse relations
Lack of unified cross-framework understanding: Mechanistic comparisons across different discourse frameworks are absent

Research Motivation

Bridge the linguistic structure of discourse and the requirements of circuit discovery to open new pathways for understanding mechanisms in complex language tasks.

Core Contributions

Proposes CuDR Task: Designs a discourse relation completion task suitable for circuit discovery
Constructs Multi-Framework Dataset: Covers major discourse frameworks including PDTB, RST, SDRT, with 27,754 instances total
Discovers Discursive Circuits: Identifies sparse circuits comprising only 0.2% of model connections but achieving 90% faithfulness
Cross-Framework Generalization: Demonstrates that circuits learned from PDTB generalize well to other discourse frameworks
Constructs Circuit Hierarchy: First to construct a discourse hierarchy from neural circuit components
Linguistic Feature Analysis: Reveals linguistic features captured at different levels and their cross-framework consistency

Methodology Details

Task Definition: CuDR (Completion under Discourse Relation)

The CuDR task creates a controlled environment for testing a model's discourse behavior:

Input Format:

Original discourse: $d_{ori} = (Arg1, Arg2, R, Conn)$
Counterfactual discourse: $d_{cf} = (Arg1, Arg'_2, R', Conn')$

Task Setup:

Please select one of the following two options to complete the discourse:
Option 1: "he goes to the canteen" 
Option 2: "the canteen is closed"

To complete: [Bob is hungry]_{Arg1} [so]_{Conn} → [he goes to the canteen]_{Arg2}

When the discourse connective changes (from "so" to "but"), the model's prediction should change accordingly, here from Option 1 to Option 2.
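The pairing of $d_{ori}$ and $d_{cf}$ can be sketched as a small data structure. This is an illustrative sketch only: the class name, the helper `make_pair`, and the relation labels "Cause" and "Concession" are assumptions made for the example, not names taken from the paper's corpus.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Discourse:
    arg1: str        # first argument, shared by both discourses
    arg2: str        # completion the model should prefer
    relation: str    # discourse relation label (hypothetical labels below)
    connective: str  # connective that signals the relation

    def prompt(self) -> str:
        """Text shown to the model before it completes Arg2."""
        return f"{self.arg1} {self.connective}"

def make_pair(d_ori, arg2_cf, relation_cf, conn_cf):
    """Build (d_ori, d_cf): Arg1 is kept; Arg2, R, and Conn are swapped."""
    d_cf = Discourse(d_ori.arg1, arg2_cf, relation_cf, conn_cf)
    return d_ori, d_cf

d_ori = Discourse("Bob is hungry", "he goes to the canteen", "Cause", "so")
d_ori, d_cf = make_pair(d_ori, "the canteen is closed", "Concession", "but")
```

The two prompts differ only in the connective, which is what makes the pair minimal and suitable for patching activations from one run into the other.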

Circuit Discovery Method

Activation Patching

Define the impact of edge $e$ as: $g(e) = L(x_{cf}|do(E = e_{ori})) - L(x_{cf})$

where $L$ is the evaluation metric, $x_{cf}$ is the counterfactual input, and $e_{ori}$ is the activation in the original run.
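A minimal sketch of this definition on a toy model, not the paper's GPT-2 setup: the three named edges, their weights, and the tanh-based metric are all hypothetical. The code overwrites one edge's activation in the counterfactual run with its value from the original run and reports the resulting change in $L$.

```python
import math

# Hypothetical toy model: three edges feed one scalar metric L.
WEIGHTS = {"a": 1.5, "b": -0.5, "c": 0.0}

def activations(x):
    """Activation carried by each edge for input x = (x0, x1)."""
    return {"a": x[0], "b": x[1], "c": x[0] * x[1]}

def metric(acts):
    """Evaluation metric L computed from the edge activations."""
    return sum(WEIGHTS[name] * math.tanh(value) for name, value in acts.items())

def patch_effect(edge, x_ori, x_cf):
    """g(e) = L(x_cf | do(E = e_ori)) - L(x_cf) for a single edge."""
    acts_cf = activations(x_cf)
    patched = dict(acts_cf)
    patched[edge] = activations(x_ori)[edge]  # the do(E = e_ori) intervention
    return metric(patched) - metric(acts_cf)

x_ori, x_cf = (1.0, 0.2), (-1.0, 0.2)
effects = {edge: patch_effect(edge, x_ori, x_cf) for edge in WEIGHTS}
```

In this toy, only edge "a" has a nonzero effect: "b" carries the same activation in both runs, and "c" has zero weight, so patching either one leaves $L$ unchanged.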

Edge Attribution Patching

Accelerate computation using a first-order Taylor approximation: $g(e) \approx (z^{ori}_u - z^{cf}_u)^\top \nabla_v L(x_{cf})$

where $e = (u, v)$ is the edge from node $u$ to node $v$, $z^{ori}_u$ and $z^{cf}_u$ are the activations of node $u$ in the original and counterfactual runs respectively, and $\nabla_v L(x_{cf})$ is the gradient of $L$ at node $v$.
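The approximation can be checked against exact patching on a toy example. The sketch below is a simplification under stated assumptions: a single downstream node reads the upstream activation directly, the readout weights `W` are hypothetical, and the gradient is written analytically instead of being obtained from a backward pass through a real model.

```python
import math

W = [1.5, -0.5, 0.8]  # hypothetical readout weights of the downstream node v

def metric_from_input(v_in):
    """Metric L as a function of the activation arriving at node v."""
    return sum(w * math.tanh(a) for w, a in zip(W, v_in))

def grad_at_input(v_in):
    """Analytic gradient of L with respect to the input of node v."""
    return [w * (1.0 - math.tanh(a) ** 2) for w, a in zip(W, v_in)]

def exact_effect(z_ori, z_cf):
    """Exact patching: rerun the metric with the original activation."""
    return metric_from_input(z_ori) - metric_from_input(z_cf)

def eap_score(z_ori, z_cf):
    """First-order estimate: (z_ori - z_cf)^T grad L, evaluated at x_cf."""
    grad = grad_at_input(z_cf)
    return sum((o - c) * g for o, c, g in zip(z_ori, z_cf, grad))

z_cf = [0.10, -0.20, 0.30]
z_ori = [0.12, -0.18, 0.31]
approx = eap_score(z_ori, z_cf)
exact = exact_effect(z_ori, z_cf)
```

The estimate needs only one gradient evaluation at the counterfactual input for all edges, whereas exact patching reruns the metric once per edge. The two agree closely here because the activation difference is small; the gap grows with larger differences, since the error is second order in $z^{ori}_u - z^{cf}_u$.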

Discursive Circuit Construction

Apply attribution patching to a set of samples for a given discourse relation
Compute the average $g(e)$ value for each edge
Select the 1000 edges with the highest absolute average values to form the circuit
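The three steps above can be sketched as follows. The edge names and score values are made up for illustration, and the function `build_circuit` is a hypothetical helper; the sketch also assumes that an edge missing from a sample contributes a score of zero.

```python
from collections import defaultdict

def build_circuit(per_sample_scores, k=1000):
    """Average g(e) over samples, then keep the k edges with largest |mean|."""
    if not per_sample_scores:
        raise ValueError("need at least one sample")
    totals = defaultdict(float)
    for scores in per_sample_scores:
        for edge, g in scores.items():
            totals[edge] += g
    n = len(per_sample_scores)
    mean_scores = {edge: total / n for edge, total in totals.items()}
    ranked = sorted(mean_scores, key=lambda e: abs(mean_scores[e]), reverse=True)
    return ranked[:k], mean_scores

# Hypothetical attribution scores for two samples of one relation.
samples = [
    {"head0.1->mlp2": 0.4, "mlp1->head3.0": -0.9, "head2.2->mlp4": 0.05},
    {"head0.1->mlp2": 0.2, "mlp1->head3.0": -0.7, "head2.2->mlp4": -0.05},
]
circuit, mean_scores = build_circuit(samples, k=2)
```

Ranking by absolute value keeps edges with strongly negative scores, such as the second edge here, which a signed ranking would discard.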

Dataset Construction

Multi-Framework Coverage

Discourse Framework	Number of Relations	CuDR Data Instances
PDTB	13	11,843
GDTB	12	5,253
GUM-RST	17	6,805
SDRT	10	3,853
Total	52	27,754

Counterfactual Generation Strategy

Generate counterfactual $Arg'_2$ using GPT-4o-mini, ensuring:

Consistency with original $Arg1$ and counterfactual connective $Conn'$
Length matching with original $Arg2$
Clear and salient relation expression

Experimental Setup

Model Selection

Primary Model: GPT-2 medium (following standard practice in circuit discovery research)
Extended Validation: GPT-2 large

Evaluation Metrics

Faithfulness Score: $\frac{\Delta L_{patch}}{\Delta L_{full}}$ (normalized faithfulness)
Logit Difference: $\Delta L = L(Arg2) - L(Arg'_2)$
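A small worked example of the two metrics, using invented numbers. It assumes that $\Delta L_{full}$ is the difference measured on the full model and $\Delta L_{patch}$ the difference measured when only the circuit is kept; the scores passed in below are hypothetical and not results from the paper.

```python
def logit_diff(score_arg2, score_arg2_cf):
    """Delta L = L(Arg2) - L(Arg'_2)."""
    return score_arg2 - score_arg2_cf

def faithfulness(delta_patch, delta_full):
    """Normalized faithfulness: Delta L_patch / Delta L_full."""
    if delta_full == 0:
        raise ValueError("the full model shows no preference to recover")
    return delta_patch / delta_full

# Hypothetical scores: the full model prefers Arg2 by 2.0, the circuit by 1.8.
delta_full = logit_diff(3.0, 1.0)
delta_patch = logit_diff(2.5, 0.7)
score = faithfulness(delta_patch, delta_full)
```

With these numbers the circuit recovers about 0.9 of the full model's preference, which is how a "90% faithfulness" figure would be read under this assumption.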

Baseline Methods

Random Circuits: Randomly sampled transformer edges
IOI Circuits: Indirect Object Identification circuits (representing general language modeling capabilities)

Circuit Hierarchy

Construct PDTB-style circuit hierarchy:

L3: Leaf node relations (1000 edges)
L2: Merged multiple L3 circuits (500+ edges)
L1: Top-level category circuits (200-500 edges)
L0: Meta-circuits (137 edges)

Experimental Results

Main Results

RQ1: Faithfulness of Discursive Circuits

Strong Faithfulness: L3 and L1 circuits achieve 90% faithfulness with only ~200 edges
Superior to Baselines: Significantly outperform random and IOI baselines
Hierarchy Effects: Fine-grained circuits (L3) are more effective in early stages but with higher variance

RQ2: Cross-Framework Generalization

Good Generalization: PDTB circuits effectively generalize to GDTB, RST, SDRT
Performance Ranking: Own > L3 > L1 ≈ L0 > IOI > Random (consistent trend), where Own denotes circuits discovered on the target framework itself
Circuit Overlap: Framework overlap correlates positively with performance (e.g., PDTB→GDTB: r=0.44)

RQ3: Linguistic Feature Analysis

Identifies usage patterns of five key linguistic features:

Modality: Most widely used
Synonymy: More common than antonymy
Negation: Consistently used across frameworks
Antonymy: Weaker in causal and temporal relations
Coreference: Most active in expansion-class relations

Hierarchy Analysis

Lower Layers: Capture linguistic features (lexical semantics, coreference)
Higher Layers: Encode discourse-level abstractions
Discourse-Specific Regions: Source layers 8-16, target layers 10-20 contain discourse-specific edges

Case Analysis

Error case analysis reveals limitations of PDTB circuits in handling interjections ("yay!!") and subject ellipsis, while SDRT circuits handle these phenomena better.

Related Work

Discourse Modeling

Framework Development: Three mainstream frameworks (PDTB, RST, SDRT)
Unification Efforts: DISRPT benchmark, automatic framework conversion
Evaluation Methods: Question-answering evaluation, synthetic data generation

Mechanistic Interpretability

Circuit Discovery: Primarily applied to simple tasks (IOI, numerical comparison, subject-verb agreement)
Method Limitations: Existing methods struggle with complex discourse phenomena
This Paper's Contribution: First application of circuit discovery to discourse understanding

Conclusions and Discussion

Main Conclusions

Sparse Effectiveness: Only 0.2% of model connections enable discourse understanding
Cross-Framework Consistency: Language models may encode shared discourse relation representations
Hierarchical Processing: Lower layers process linguistic features, higher layers process discourse abstractions
Feature Consistency: Linguistic feature utility remains consistent across frameworks

Limitations

Language Limitation: Only English corpora studied
Model Scope: Focuses primarily on a single transformer model
Human Brain Comparison: No comparison with human discourse processing mechanisms
Data Quality: Generated counterfactual data is relatively simple and direct

Future Directions

Multilingual Extension: Explore cross-linguistic consistency of discourse circuits
Complex Scenarios: Extend to more complex discourse styles and ambiguous cases
Application-Oriented: Use for bias detection and model steering
Architecture Extension: Adapt to larger-scale language models

In-Depth Evaluation

Strengths

High Innovation: First application of circuit discovery to complex discourse understanding tasks
Rigorous Methodology: The CuDR task design is clever and effectively supports activation patching
Comprehensive Coverage: Encompasses multiple mainstream discourse frameworks with substantial dataset scale
Deep Analysis: Multi-dimensional analysis from circuit hierarchy to linguistic features
Good Generalization: Cross-framework generalization results are convincing

Weaknesses

Computational Complexity: The circuit discovery process is computationally intensive and difficult to scale to larger models
Data Dependency: Relies on LLM-generated counterfactual data, potentially introducing bias
Evaluation Limitations: Primarily based on a single model architecture; generalization to other architectures needs verification
Theoretical Depth: Lacks theoretical explanation for why these circuits are effective

Impact

Academic Value: Opens new directions for mechanistic research in discourse understanding
Practical Potential: Applicable to model debugging, bias detection, and other applications
Methodological Contribution: The CuDR paradigm can be generalized to other complex NLP tasks
Interdisciplinary Significance: Connects computational linguistics and mechanistic interpretability research

Applicable Scenarios

Model Analysis: Understanding discourse processing mechanisms in large language models
Safety Detection: Identifying potential biases in model discourse understanding
Model Improvement: Guiding targeted enhancement of discourse understanding capabilities
Educational Research: Providing computational perspective validation for discourse theory

Overall Assessment: This is a high-quality research paper that excels in methodological innovation, experimental design, and analytical depth. Through its CuDR task design, it applies circuit discovery techniques to complex discourse understanding tasks and offers new perspectives on language models' internal mechanisms. Despite the limitations noted above, its pioneering approach and findings have clear academic value and practical potential.