Discursive Circuits: How Do Language Models Understand Discourse Relations?

Miao, Kan
Which components in transformer language models are responsible for discourse understanding? We hypothesize that sparse computational graphs, termed as discursive circuits, control how models process discourse relations. Unlike simpler tasks, discourse relations involve longer spans and complex reasoning. To make circuit discovery feasible, we introduce a task called Completion under Discourse Relation (CuDR), where a model completes a discourse given a specified relation. To support this task, we construct a corpus of minimal contrastive pairs tailored for activation patching in circuit discovery. Experiments show that sparse circuits ($\approx 0.2\%$ of a full GPT-2 model) recover discourse understanding in the English PDTB-based CuDR task. These circuits generalize well to unseen discourse frameworks such as RST and SDRT. Further analysis shows lower layers capture linguistic features such as lexical semantics and coreference, while upper layers encode discourse-level abstractions. Feature utility is consistent across frameworks (e.g., coreference supports Expansion-like relations).

Basic Information

  • Paper ID: 2510.11210
  • Title: Discursive Circuits: How Do Language Models Understand Discourse Relations?
  • Authors: Yisong Miao, Min-Yen Kan (National University of Singapore)
  • Classification: cs.CL (Computational Linguistics), cs.LG (Machine Learning)
  • Publication Date: October 13, 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2510.11210

Abstract

This paper investigates which components in transformer language models are responsible for discourse understanding. The authors hypothesize that sparse computational graphs (termed discursive circuits) control how models process discourse relations. Unlike simpler tasks, discourse relations involve longer text spans and complex reasoning. To make circuit discovery feasible, the authors introduce the "Completion under Discourse Relation" (CuDR) task, in which a model completes a discourse given a specified relation. Experiments demonstrate that sparse circuits (approximately 0.2% of a full GPT-2 model) recover discourse understanding in the English PDTB-based CuDR task and generalize well to unseen discourse frameworks such as RST and SDRT.

Research Background and Motivation

Problem Definition

Discourse structure is crucial for ensuring language model safety and ethical behavior, yet little is known about how language models internally process discourse, limiting our ability to guarantee model reliability and harmless outputs.

Research Significance

  1. Safety Requirements: Discourse understanding is critical for model safety and ethical behavior
  2. Missing Interpretability: Existing methods lack deep understanding of discourse processing mechanisms
  3. Complexity Challenges: Discourse relations involve longer contexts and complex reasoning compared to simple tasks

Limitations of Existing Approaches

  1. Attention visualization and rationale generation methods lack mechanistic explanations
  2. Existing circuit discovery methods primarily focus on simple tasks (e.g., numerical comparison) and cannot be directly adapted to discourse relations
  3. Lack of unified cross-framework understanding: Mechanistic comparisons across different discourse frameworks are absent

Research Motivation

The work aims to bridge the linguistic structure of discourse and the requirements of circuit discovery, opening new pathways for understanding the mechanisms behind complex language tasks.

Core Contributions

  1. Proposes the CuDR Task: Designs a discourse relation completion task suitable for circuit discovery
  2. Constructs Multi-Framework Dataset: Covers major discourse frameworks including PDTB, RST, SDRT, with 27,754 instances total
  3. Discovers Discursive Circuits: Identifies sparse circuits comprising only 0.2% of model connections but achieving 90% faithfulness
  4. Cross-Framework Generalization: Demonstrates that circuits learned from PDTB generalize well to other discourse frameworks
  5. Constructs Circuit Hierarchy: First to construct discourse hierarchy based on neural circuit components
  6. Linguistic Feature Analysis: Reveals linguistic features captured at different levels and their cross-framework consistency

Methodology Details

Task Definition: CuDR (Completion under Discourse Relation)

The CuDR task creates a controlled environment for testing a model's discourse behavior:

Input Format:

  • Original discourse: $d_{ori} = (Arg1, Arg2, R, Conn)$
  • Counterfactual discourse: $d_{cf} = (Arg1, Arg'_2, R', Conn')$

Task Setup:

Please select one of the following two options to complete the discourse:
Option 1: "he goes to the canteen" 
Option 2: "the canteen is closed"

To complete: [Bob is hungry]_{Arg1} [so]_{Conn} → [he goes to the canteen]_{Arg2}

When the discourse connective changes (from "so" to "but"), the model's preferred completion should change accordingly, here from Option 1 to Option 2.
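
As a minimal sketch of how one such contrastive pair could be represented, the following Python snippet builds the original and counterfactual prompts from the example above. The class name, the function name, and the relation labels are illustrative assumptions, not the authors' data format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Discourse:
    """One discourse d = (Arg1, Arg2, R, Conn)."""
    arg1: str
    arg2: str
    relation: str      # label strings below are assumed for illustration
    connective: str


def build_prompt(d: Discourse, options: Tuple[str, str]) -> str:
    """Render a CuDR-style prompt that stops right after the connective."""
    return (
        "Please select one of the following two options to complete the discourse:\n"
        f'Option 1: "{options[0]}"\n'
        f'Option 2: "{options[1]}"\n'
        f"To complete: {d.arg1} {d.connective}"
    )


# The pair shares Arg1; Arg2, the relation, and the connective differ.
d_ori = Discourse("Bob is hungry", "he goes to the canteen", "Cause", "so")
d_cf = Discourse("Bob is hungry", "the canteen is closed", "Concession", "but")
options = (d_ori.arg2, d_cf.arg2)

prompt_ori = build_prompt(d_ori, options)
prompt_cf = build_prompt(d_cf, options)
```

Because the two prompts differ only in the connective, any change in the model's preference between the two options can be attributed to how it processes the discourse relation, which is the property activation patching relies on.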

Circuit Discovery Method

Activation Patching

Define the impact of edge $e$ as: $g(e) = L(x_{cf} \mid do(E = e_{ori})) - L(x_{cf})$

where $L$ is the evaluation metric, $x_{cf}$ is the counterfactual input, and $e_{ori}$ is the activation in the original run.
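
To make the definition concrete, here is a small sketch on a toy two-edge "model" rather than a transformer. The weights, node names, and scalar inputs are all hypothetical; only the structure of the computation (run on the counterfactual input, overwrite one edge with its original-run activation, compare metrics) follows the formula above.

```python
from typing import Dict, Optional

# Toy "model": two upstream nodes feed one output node along two edges.
WEIGHTS_IN = {"u1": 2.0, "u2": -1.0}   # input -> upstream node (hypothetical)
WEIGHTS_OUT = {"u1": 0.5, "u2": 3.0}   # upstream node -> output (hypothetical)


def upstream_activations(x: float) -> Dict[str, float]:
    """ReLU activation of each upstream node for input x."""
    return {u: max(0.0, w * x) for u, w in WEIGHTS_IN.items()}


def metric(x: float, patch: Optional[Dict[str, float]] = None) -> float:
    """L(x | do(E = patch)): run on x, overriding the listed edge activations."""
    acts = upstream_activations(x)
    if patch:
        acts.update(patch)
    return sum(WEIGHTS_OUT[u] * a for u, a in acts.items())


def patching_effect(edge: str, x_ori: float, x_cf: float) -> float:
    """g(e) = L(x_cf | do(E = e_ori)) - L(x_cf)."""
    e_ori = upstream_activations(x_ori)[edge]
    return metric(x_cf, {edge: e_ori}) - metric(x_cf)
```

With `x_ori = 1.0` and `x_cf = -1.0`, patching `u1` raises the metric while patching `u2` lowers it, so the two edges receive scores of opposite sign. Note that each edge needs its own forward pass under this scheme.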

Edge Attribution Patching

Accelerate computation using a first-order Taylor approximation: $g(e) \approx (z^{ori}_u - z^{cf}_u)^T \nabla_v L(x_{cf})$

where $z^{ori}_u$ and $z^{cf}_u$ are the activations of node $u$ in the original and counterfactual runs respectively, and $\nabla_v L(x_{cf})$ is the gradient at node $v$.
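
The approximation can be checked on a toy example. The sketch below assumes a hypothetical setup in which node $v$'s input is the sum of its upstream activations and the metric is a simple quadratic of that input, so the gradient is available in closed form; in a real model it would come from a single backward pass.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def node_input(acts: Dict[str, Vector]) -> Vector:
    """Input to node v: elementwise sum of all upstream activations."""
    return [sum(column) for column in zip(*acts.values())]


def metric(v_input: Vector) -> float:
    """Toy metric L = 0.5 * ||s||^2 (an assumption made for this sketch)."""
    return 0.5 * dot(v_input, v_input)


def metric_grad(v_input: Vector) -> Vector:
    """Gradient of the toy metric with respect to node v's input."""
    return list(v_input)


def eap_score(u: str, acts_ori: Dict[str, Vector], acts_cf: Dict[str, Vector]) -> float:
    """First-order estimate: (z_ori_u - z_cf_u)^T grad_v L(x_cf)."""
    grad_v = metric_grad(node_input(acts_cf))
    diff = [o - c for o, c in zip(acts_ori[u], acts_cf[u])]
    return dot(diff, grad_v)


def exact_effect(u: str, acts_ori: Dict[str, Vector], acts_cf: Dict[str, Vector]) -> float:
    """Exact activation patching effect for the same edge, for comparison."""
    patched = dict(acts_cf)
    patched[u] = acts_ori[u]
    return metric(node_input(patched)) - metric(node_input(acts_cf))


acts_ori = {"u1": [1.0, 0.0], "u2": [0.0, 2.0]}
acts_cf = {"u1": [0.5, 0.0], "u2": [0.0, 1.0]}
```

For this quadratic metric the gap between the estimate and the exact effect is exactly half the squared norm of the activation difference, which illustrates why the approximation is best when original and counterfactual activations are close. The payoff is that all edge scores can be obtained from one gradient computation instead of one forward pass per edge.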

Discursive Circuit Construction

  1. Apply attribution patching to sample sets for given discourse relations
  2. Compute the average $g(e)$ value for each edge
  3. Select top 1000 edges with highest absolute values to form the circuit
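
The three steps above can be sketched as follows, assuming the per-sample attribution scores are already available as dictionaries keyed by an edge identifier (the string edge names are a hypothetical convention). Scores are averaged first and the absolute value is taken afterwards, matching the order of the steps listed.

```python
from typing import Dict, List, Set


def build_circuit(per_sample_scores: List[Dict[str, float]], k: int = 1000) -> Set[str]:
    """Average g(e) over samples, then keep the k edges with the largest |mean|."""
    if not per_sample_scores:
        return set()
    n = len(per_sample_scores)
    edges = per_sample_scores[0].keys()
    mean_score = {e: sum(s[e] for s in per_sample_scores) / n for e in edges}
    ranked = sorted(mean_score, key=lambda e: abs(mean_score[e]), reverse=True)
    return set(ranked[:k])


# Tiny illustration with three hypothetical edges and k = 2.
samples = [
    {"a": 1.0, "b": -3.0, "c": 0.1},
    {"a": 3.0, "b": -1.0, "c": -0.1},
]
circuit = build_circuit(samples, k=2)
```

Edge `c` shows a consequence of averaging before taking the absolute value: scores with opposite signs across samples cancel, so an edge that is inconsistent across samples is ranked low.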

Dataset Construction

Multi-Framework Coverage

| Discourse Framework | Number of Relations | CuDR Data Instances |
|---|---|---|
| PDTB | 13 | 11,843 |
| GDTB | 12 | 5,253 |
| GUM-RST | 17 | 6,805 |
| SDRT | 10 | 3,853 |
| Total | 52 | 27,754 |

Counterfactual Generation Strategy

Generate the counterfactual $Arg'_2$ using GPT-4o-mini, ensuring:

  1. Consistency with the original $Arg1$ and the counterfactual connective $Conn'$
  2. Length matching with the original $Arg2$
  3. Clear and salient relation expression

Experimental Setup

Model Selection

  • Primary Model: GPT-2 medium (following standard practice in circuit discovery research)
  • Extended Validation: GPT-2 large

Evaluation Metrics

  • Faithfulness Score: $\frac{\Delta L_{patch}}{\Delta L_{full}}$ (normalized faithfulness)
  • Logit Difference: $\Delta L = L(Arg2) - L(Arg'_2)$
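
A short sketch of how the two metrics combine, assuming $L$ is a per-completion score such as a logit and that $\Delta L_{patch}$ is measured on the run restricted to the circuit. The numeric values are hypothetical and are not results from the paper.

```python
def score_difference(score_arg2: float, score_arg2_cf: float) -> float:
    """Delta L = L(Arg2) - L(Arg'_2)."""
    return score_arg2 - score_arg2_cf


def faithfulness(delta_patch: float, delta_full: float) -> float:
    """Normalized faithfulness: Delta L_patch / Delta L_full."""
    if delta_full == 0:
        raise ValueError("full-model difference is zero; the ratio is undefined")
    return delta_patch / delta_full


# Hypothetical scores, for illustration only.
delta_full = score_difference(5.0, 1.0)    # full model
delta_patch = score_difference(4.5, 0.9)   # circuit only
score = faithfulness(delta_patch, delta_full)
```

A value near 1 means the circuit alone reproduces almost all of the full model's preference for the correct completion; a value near 0 means it reproduces almost none of it.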

Baseline Methods

  1. Random Circuits: Randomly sampled transformer edges
  2. IOI Circuits: Indirect Object Identification circuits (representing general language modeling capabilities)

Circuit Hierarchy

Construct PDTB-style circuit hierarchy:

  • L3: Leaf node relations (1000 edges)
  • L2: Merged multiple L3 circuits (500+ edges)
  • L1: Top-level category circuits (200-500 edges)
  • L0: Meta-circuits (137 edges)
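
The summary does not state how child circuits are merged into a parent. Since the edge counts shrink toward the top of the hierarchy, one plausible reading is that a parent keeps only the edges its children share. The sketch below implements that intersection rule purely as an assumption; the edge names are hypothetical.

```python
from typing import Iterable, Set


def merge_circuits(children: Iterable[Set[str]]) -> Set[str]:
    """Assumed merge rule: keep only edges present in every child circuit."""
    child_list = [set(c) for c in children]
    if not child_list:
        return set()
    return child_list[0].intersection(*child_list[1:])


# Two hypothetical sibling L3 circuits, each reduced to three edges.
l3_a = {"a5.h1->m7", "m7->a9.h3", "a9.h3->logits"}
l3_b = {"a5.h1->m7", "m7->a9.h3", "a2.h0->m3"}
l2 = merge_circuits([l3_a, l3_b])
```

Under this rule a parent circuit can never be larger than its smallest child, which is consistent with the decreasing edge counts listed above but does not confirm that the authors used it.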

Experimental Results

Main Results

RQ1: Faithfulness of Discursive Circuits

  • Strong Faithfulness: L3 and L1 circuits achieve 90% faithfulness with only ~200 edges
  • Superior to Baselines: Significantly outperform random and IOI baselines
  • Hierarchy Effects: Fine-grained circuits (L3) are more effective in early stages but with higher variance

RQ2: Cross-Framework Generalization

  • Good Generalization: PDTB circuits effectively generalize to GDTB, RST, SDRT
  • Performance Ranking: Own > L3 > L1 ≈ L0 > IOI > Random (consistent trend)
  • Circuit Overlap: Framework overlap correlates positively with performance (e.g., PDTB→GDTB: r=0.44)
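
As a rough sketch of the overlap analysis, the snippet below computes the share of one circuit's edges found in another and a Pearson correlation between two sequences. The overlap definition is an assumption (the summary does not specify it), and the inputs shown in the usage note are hypothetical.

```python
import math
from typing import Sequence, Set


def edge_overlap(circuit_a: Set[str], circuit_b: Set[str]) -> float:
    """Fraction of circuit_a's edges that also appear in circuit_b (assumed definition)."""
    if not circuit_a:
        return 0.0
    return len(circuit_a & circuit_b) / len(circuit_a)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

For example, `edge_overlap({"e1", "e2", "e3", "e4"}, {"e1", "e2", "x"})` gives 0.5. Pairing such overlap values with transfer faithfulness across relation pairs and passing both lists to `pearson_r` would yield a coefficient comparable in kind to the reported r.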

RQ3: Linguistic Feature Analysis

Identifies usage patterns of five key linguistic features:

  1. Modality: Most widely used
  2. Synonymy: More common than antonymy
  3. Negation: Consistently used across frameworks
  4. Antonymy: Weaker in causal and temporal relations
  5. Coreference: Most active in expansion-class relations

Hierarchy Analysis

  • Lower Layers: Capture linguistic features (lexical semantics, coreference)
  • Higher Layers: Encode discourse-level abstractions
  • Discourse-Specific Regions: Source layers 8-16, target layers 10-20 contain discourse-specific edges

Case Analysis

Error case analysis reveals limitations of PDTB circuits in handling interjections ("yay!!") and subject ellipsis, while SDRT circuits handle these phenomena better.

Related Work

Discourse Modeling

  • Framework Development: Three mainstream frameworks (PDTB, RST, SDRT)
  • Unification Efforts: DISRPT benchmark, automatic framework conversion
  • Evaluation Methods: Question-answering evaluation, synthetic data generation

Mechanistic Interpretability

  • Circuit Discovery: Primarily applied to simple tasks (IOI, numerical comparison, subject-verb agreement)
  • Method Limitations: Existing methods struggle with complex discourse phenomena
  • This Paper's Contribution: First application of circuit discovery to discourse understanding

Conclusions and Discussion

Main Conclusions

  1. Sparse Effectiveness: Circuits covering only about 0.2% of model connections recover most of the model's discourse behavior on CuDR (about 90% faithfulness)
  2. Cross-Framework Consistency: Language models may encode shared discourse relation representations
  3. Hierarchical Processing: Lower layers process linguistic features, higher layers process discourse abstractions
  4. Feature Consistency: Linguistic feature utility remains consistent across frameworks

Limitations

  1. Language Limitation: Only English corpora studied
  2. Model Scope: Focuses primarily on a single transformer model family (GPT-2)
  3. Human Brain Comparison: No comparison with human discourse processing mechanisms
  4. Data Quality: Generated counterfactual data is relatively simple and direct

Future Directions

  1. Multilingual Extension: Explore cross-linguistic consistency of discourse circuits
  2. Complex Scenarios: Extend to more complex discourse styles and ambiguous cases
  3. Application-Oriented: Use for bias detection and model steering
  4. Architecture Extension: Adapt to larger-scale language models

In-Depth Evaluation

Strengths

  1. High Innovation: First application of circuit discovery to complex discourse understanding tasks
  2. Rigorous Methodology: The CuDR task design is clever and effectively supports activation patching
  3. Comprehensive Coverage: Encompasses multiple mainstream discourse frameworks with substantial dataset scale
  4. Deep Analysis: Multi-dimensional analysis from circuit hierarchy to linguistic features
  5. Good Generalization: Cross-framework generalization results are convincing

Weaknesses

  1. Computational Complexity: Circuit discovery process is computationally intensive, difficult to scale to larger models
  2. Data Dependency: Relies on LLM-generated counterfactual data, potentially introducing bias
  3. Evaluation Limitations: Primarily based on single model architecture, generalization needs verification
  4. Theoretical Depth: Lacks theoretical explanation for why these circuits are effective

Impact

  1. Academic Value: Opens new directions for mechanistic research in discourse understanding
  2. Practical Potential: Applicable to model debugging, bias detection, and other applications
  3. Methodological Contribution: The CuDR paradigm can be generalized to other complex NLP tasks
  4. Interdisciplinary Significance: Connects computational linguistics and mechanistic interpretability research

Applicable Scenarios

  1. Model Analysis: Understanding discourse processing mechanisms in large language models
  2. Safety Detection: Identifying potential biases in model discourse understanding
  3. Model Improvement: Guiding targeted enhancement of discourse understanding capabilities
  4. Educational Research: Providing computational perspective validation for discourse theory

References

The paper cites rich related work including:

  • Classical discourse theory literature: Mann & Thompson (1987), Asher & Lascarides (2003)
  • Circuit discovery methods: Wang et al. (2023), Conmy et al. (2023)
  • Discourse datasets: Webber et al. (2019), Liu et al. (2024b)
  • Mechanistic interpretability: Zhang & Nanda (2024), Miller et al. (2024)

Overall Assessment: This is a high-quality research paper that excels in methodological innovation, experimental design, and analytical depth. Through its clever CuDR task design, it successfully applies circuit discovery techniques to complex discourse understanding tasks, providing new perspectives on language models' internal mechanisms. Despite certain limitations, its pioneering work and rich findings demonstrate significant academic value and practical potential.