The Mechanistic Emergence of Symbol Grounding in Language Models
Wu, Ma, Luo et al.
Symbol grounding (Harnad, 1990) describes how symbols such as words acquire their meanings by connecting to real-world sensorimotor experiences. Recent work has shown preliminary evidence that grounding may emerge in (vision-)language models trained at scale without using explicit grounding objectives. Yet, the specific loci of this emergence and the mechanisms that drive it remain largely unexplored. To address this problem, we introduce a controlled evaluation framework that systematically traces how symbol grounding arises within the internal computations through mechanistic and causal analysis. Our findings show that grounding concentrates in middle-layer computations and is implemented through the aggregate mechanism, where attention heads aggregate the environmental ground to support the prediction of linguistic forms. This phenomenon replicates in multimodal dialogue and across architectures (Transformers and state-space models), but not in unidirectional LSTMs. Our results provide behavioral and mechanistic evidence that symbol grounding can emerge in language models, with practical implications for predicting and potentially controlling the reliability of generation.
Symbol grounding describes how symbols such as words acquire meaning by connecting to sensorimotor experiences in the real world. Recent research suggests that grounding may emerge spontaneously in (vision-)language models trained at scale, without explicit grounding objectives. However, where this emergence occurs and which mechanisms drive it remain largely unexplored. To address this gap, the paper introduces a controlled evaluation framework that uses mechanistic and causal analysis to trace systematically how symbol grounding arises in the model's internal computations. The study finds that grounding concentrates in middle-layer computations and is implemented through an aggregation mechanism, in which attention heads aggregate the environmental ground to support the prediction of language tokens. This phenomenon replicates in multimodal dialogue and across architectures (Transformers and state-space models), but not in unidirectional LSTMs.
Symbol grounding is a fundamental problem in cognitive science and artificial intelligence. Understanding how language models learn to connect abstract symbols to the real world matters both for theory and for practice, for example for predicting and potentially controlling the reliability of generation.
Existing research has three main limitations:
Lack of mechanistic analysis: Most studies perform correlational analyses of final performance without examining internal mechanisms
Neglect of training dynamics: How grounding capabilities develop during training has not been investigated systematically
Vague definitions: Grounding is often equated with statistical correlation between visual and textual signals, which departs from Harnad's (1990) classical definition of grounding as a causal linkage
This paper systematically investigates how symbol grounding emerges, combining a minimalist testbed with causal interventions and mechanistic analysis.
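To make the logic of a causal intervention concrete, the sketch below zero-ablates one attention head at a time and measures how much the log-probability of the target ⟨LAN⟩ token drops. It is a toy under loudly stated assumptions, not the authors' procedure: per-head outputs are assumed to add linearly into the residual stream at the prediction position and to be read out directly by an unembedding matrix, with MLPs and LayerNorm ignored. All function names and numbers are hypothetical.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of floats.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def target_logprob(head_outputs, unembed, target, ablate=None):
    # Toy residual stream: the sum of per-head output vectors, optionally
    # skipping (zero-ablating) the head at index `ablate`.
    dim = len(unembed[0])
    resid = [0.0] * dim
    for h, out in enumerate(head_outputs):
        if h != ablate:
            resid = [r + o for r, o in zip(resid, out)]
    # Logits are dot products of the residual with each unembedding row.
    logits = [sum(u * r for u, r in zip(row, resid)) for row in unembed]
    return log_softmax(logits)[target]

def ablation_effects(head_outputs, unembed, target):
    # Positive effect: removing the head lowers the target's log-probability.
    full = target_logprob(head_outputs, unembed, target)
    return [full - target_logprob(head_outputs, unembed, target, ablate=h)
            for h in range(len(head_outputs))]

# Hypothetical 2-d toy: vocabulary row 0 stands for book⟨LAN⟩.
unembed = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
head_outputs = [[2.0, 0.0],   # head 0 writes toward book⟨LAN⟩
                [0.0, 0.5]]   # head 1 writes toward a competing token
effects = ablation_effects(head_outputs, unembed, target=0)
```

In this toy, ablating head 0 produces a large positive effect and ablating head 1 a small negative one, the pattern one would read as head 0 supporting the prediction. In a real model the same comparison would be run by hooking the actual attention outputs rather than summing hand-written vectors.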
Constructed a controlled evaluation framework: Designed a testbed with separate environment tokens (⟨ENV⟩) and language tokens (⟨LAN⟩), so that the correspondences between them must be learned
Identified the mechanistic implementation of grounding: Showed that symbol grounding is implemented through an aggregation mechanism in middle layers
Provided cross-architecture evidence: Observed grounding emerging in Transformers and state-space models, but not in unidirectional LSTMs
Verified the mechanism causally: Confirmed the critical role of aggregation heads in symbol grounding through attention-head intervention experiments
Revealed learning beyond co-occurrence statistics: Demonstrated that learned grounding relationships cannot be fully explained by surface co-occurrence statistics
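One way to shortlist candidate aggregation heads, sketched here as an assumption rather than the paper's exact criterion, is to measure how much attention each head sends from the prediction position to the ground ⟨ENV⟩ token (for example book⟨ENV⟩ when predicting book⟨LAN⟩), averaged over test items. The input layout, function names, and threshold below are hypothetical, and heads flagged this way would still need a causal test such as head ablation before being called aggregation heads.

```python
def aggregation_scores(attn, env_positions):
    """Mean attention mass each head places on the ground ⟨ENV⟩ token.

    attn[i][h] is the list of attention weights from the prediction position
    of item i to every context position, for head h (each list sums to 1).
    env_positions[i] is the context index of the ground ⟨ENV⟩ token in item i.
    """
    n_heads = len(attn[0])
    totals = [0.0] * n_heads
    for heads, pos in zip(attn, env_positions):
        for h, weights in enumerate(heads):
            totals[h] += weights[pos]
    return [t / len(attn) for t in totals]

def candidate_heads(scores, threshold=0.4):
    # Hypothetical cutoff; in practice it would need calibrating, e.g. against
    # the attention a head gives to non-ground tokens.
    return [h for h, s in enumerate(scores) if s > threshold]

# Two toy items, two heads, three context positions each.
attn = [
    [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]],
    [[0.2, 0.5, 0.3], [0.2, 0.2, 0.6]],
]
env_positions = [0, 1]
scores = aggregation_scores(attn, env_positions)
```

Here head 0 attends strongly to the ground token in both items and is the only one flagged, while head 1 spends most of its attention elsewhere.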
Input: Sequences containing environment tokens (⟨ENV⟩) and language tokens (⟨LAN⟩)
Output: Predict the corresponding language tokens given the environmental context
Constraint: Environment and language tokens use different vocabulary indices; the model must learn their correspondences
Language token source: Spoken utterance transcriptions
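The constraint that the two forms of a word receive different vocabulary indices can be illustrated with a training sequence in this format. The sketch assumes simple whitespace tokenization in which a tagged word such as book⟨ENV⟩ is a single token; `build_vocab` and `grounding_pairs` are hypothetical helpers, and the ⟨ENV⟩ to ⟨LAN⟩ mapping they recover is meant for evaluation only, since the model itself sees nothing but integer ids.

```python
ENV_TAG, LAN_TAG = "⟨ENV⟩", "⟨LAN⟩"

def build_vocab(sequences):
    # Assign integer ids in order of first appearance (whitespace tokenization).
    vocab = {}
    for seq in sequences:
        for tok in seq.split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def grounding_pairs(vocab):
    # Map each ⟨ENV⟩ id to the ⟨LAN⟩ id that shares the same surface word.
    lan_ids = {tok[:-len(LAN_TAG)]: i for tok, i in vocab.items()
               if tok.endswith(LAN_TAG)}
    return {i: lan_ids[tok[:-len(ENV_TAG)]] for tok, i in vocab.items()
            if tok.endswith(ENV_TAG) and tok[:-len(ENV_TAG)] in lan_ids}

train = "⟨CHI⟩ takes book⟨ENV⟩ from mother ⟨CHI⟩ what's that ⟨MOT⟩ a book⟨LAN⟩ in it"
vocab = build_vocab([train])
ids = [vocab[tok] for tok in train.split()]
pairs = grounding_pairs(vocab)
```

Because book⟨ENV⟩ and book⟨LAN⟩ map to unrelated integers, a model cannot solve the task by copying a token id; it has to learn the correspondence from how the two ids are distributed in context.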
Example:
Training: ⟨CHI⟩ takes book⟨ENV⟩ from mother ⟨CHI⟩ what's that ⟨MOT⟩ a book⟨LAN⟩ in it
Testing: ⟨CHI⟩ asked for a new book⟨ENV⟩ ⟨CHI⟩ I love this [predict: book⟨LAN⟩]
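Given a model's scores at the prediction position of a test sequence like the one above (right after "I love this"), a simple way to score grounding, assumed here for illustration and not taken from the paper, is the rank of the correct ⟨LAN⟩ token among all candidate ⟨LAN⟩ nouns. The function names are hypothetical, and the scores in the usage lines are made-up placeholders, not results.

```python
def candidate_rank(scores, target, candidates):
    """1-based rank of `target` among candidate ⟨LAN⟩ tokens (1 = best).

    scores maps each token to the model's score (e.g. log-probability)
    at the prediction position; ties are resolved in the target's favor.
    """
    t = scores[target]
    return 1 + sum(1 for c in candidates if c != target and scores[c] > t)

def top1_accuracy(items, candidates):
    # items: list of (scores, target) pairs, one per test sequence.
    hits = sum(candidate_rank(s, tgt, candidates) == 1 for s, tgt in items)
    return hits / len(items)

candidates = ["book⟨LAN⟩", "ball⟨LAN⟩", "cup⟨LAN⟩"]
# Placeholder scores for two hypothetical test items.
item_a = ({"book⟨LAN⟩": -0.4, "ball⟨LAN⟩": -2.1, "cup⟨LAN⟩": -1.7}, "book⟨LAN⟩")
item_b = ({"book⟨LAN⟩": -1.2, "ball⟨LAN⟩": -0.8, "cup⟨LAN⟩": -0.9}, "cup⟨LAN⟩")
```

Restricting the ranking to the candidate nouns, rather than the whole vocabulary, keeps the metric from being dominated by function words that are merely likely in context.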
Target words: 100 high-frequency nouns selected from the MacArthur-Bates Communicative Development Inventory (CDI), each appearing at least 100 times in both its ⟨ENV⟩ and its ⟨LAN⟩ form in the corpus.
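The vocabulary filter described above can be sketched with `collections.Counter`. The CDI word list is an input the caller must supply, `select_nouns` is a hypothetical helper, and ordering eligible nouns by their combined ⟨ENV⟩ plus ⟨LAN⟩ frequency is an assumption about how "high-frequency" is operationalized; the source only states the 100-noun and 100-occurrence criteria.

```python
from collections import Counter

ENV_TAG, LAN_TAG = "⟨ENV⟩", "⟨LAN⟩"

def select_nouns(tokens, cdi_nouns, k=100, min_count=100):
    """Return up to k CDI nouns occurring >= min_count times in BOTH tagged forms."""
    counts = Counter(tokens)  # missing keys count as 0
    eligible = [w for w in cdi_nouns
                if counts[w + ENV_TAG] >= min_count and counts[w + LAN_TAG] >= min_count]
    # Assumed ranking: most frequent first, by combined count of both forms.
    eligible.sort(key=lambda w: counts[w + ENV_TAG] + counts[w + LAN_TAG], reverse=True)
    return eligible[:k]

# Toy corpus counts (hypothetical).
tokens = (["book⟨ENV⟩"] * 150 + ["book⟨LAN⟩"] * 120
          + ["ball⟨ENV⟩"] * 300 + ["ball⟨LAN⟩"] * 50
          + ["cup⟨ENV⟩"] * 100 + ["cup⟨LAN⟩"] * 100)
selected = select_nouns(tokens, ["book", "ball", "cup", "dog"])
```

In this toy corpus, ball is excluded even though it is the most frequent ⟨ENV⟩ word, because its ⟨LAN⟩ form falls below the threshold; requiring both forms ensures every target word can be tested in both directions of the correspondence.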
The paper revisits the philosophical roots of symbol grounding and provides mechanistic evidence that goes beyond correlation to causation, challenging the view that "connectionist systems lack intrinsic symbolic structure."
Harnad, S. (1990). The symbol grounding problem. Physica D, 42(1-3), 335-346.
Bick, A., Xing, E. P., & Gu, A. (2025). Understanding the skill gap in recurrent models: The role of the gather-and-aggregate mechanism.
Wang, L., et al. (2023). Label words are anchors: An information flow perspective for understanding in-context learning.
Belrose, N., et al. (2023). Eliciting latent predictions from transformers with the tuned lens.
Through rigorous experimental design and in-depth mechanistic analysis, this paper makes an important contribution to understanding how symbol grounding emerges in language models. Its findings are of theoretical value and also offer practical guidance for building more reliable AI systems.