Semantic, Orthographic, and Phonological Biases in Humans' Wordle Gameplay

Liang, Kabbara, Liu et al.
We show that human players' gameplay in the game of Wordle is influenced by the semantics, orthography, and phonology of the player's previous guesses. We compare actual human players' guesses with near-optimal guesses using NLP techniques. We study human language use in the constrained environment of Wordle, which is situated between natural language use and the artificial word association task.

Basic Information

  • Paper ID: 2411.18634
  • Title: Semantic, Orthographic, and Phonological Biases in Humans' Wordle Gameplay
  • Authors: Jiadong Liang, Adam Kabbara, Jiaying Liu, Ronaldo Luo, Kina Kim, Michael Guerzhoy (University of Toronto)
  • Classification: cs.CL (Computational Linguistics)
  • Publication Date: November 13, 2025 (arXiv v2)
  • Paper Link: https://arxiv.org/abs/2411.18634

Abstract

This study analyzes human player behavior in Wordle to reveal systematic influences of semantic, orthographic, and phonological features of previous guesses on the word-guessing process. The research contrasts real human player guesses with near-optimal guesses produced by a maximum-entropy heuristic, demonstrating patterns of cognitive bias in human language use within a constrained environment situated between natural language use and artificial word association tasks.

Research Background and Motivation

1. Research Questions

This study investigates whether human guessing behavior in Wordle systematically deviates from optimal strategies, and whether these deviations are influenced by cognitive biases, particularly priming effects.

2. Significance of the Problem

  • Cognitive Science Value: Wordle provides a unique research environment situated between completely free natural language use and highly controlled word association tasks, offering a new ecologically valid setting for studying human language cognition
  • Theoretical Significance: Validates the applicability of psychological priming theory in real game scenarios
  • Methodological Contribution: Demonstrates how to utilize NLP techniques to quantify human cognitive biases

3. Limitations of Existing Research

  • Traditional word association studies are predominantly conducted in artificial laboratory tasks, lacking ecological validity
  • Natural language use scenarios are too complex to control variables effectively
  • Systematic research on cognitive biases in constrained vocabulary generation tasks is lacking

4. Research Motivation

The researchers hypothesized that:

  • Priming effects influence word choice in Wordle gameplay
  • Humans tend to select words similar to previous guesses to reduce cognitive load
  • These biases can be quantified through comparison with near-optimal strategies

Core Contributions

  1. First Systematic Demonstration: Human cognitive biases in Wordle exist across three dimensions—semantic, orthographic, and phonological
  2. Quantification Methodology: Proposes a comprehensive methodology using multiple NLP techniques (GloVe embeddings, edit distance, phonetic transcription) to quantify human deviation from optimal strategies
  3. Large-Scale Data Analysis: Empirical study based on 83,000 real game records collected from Reddit
  4. Context-Dependent Findings: Reveals the relationship between cognitive bias intensity and game state constraints—greater freedom leads to more pronounced biases
  5. Interdisciplinary Contribution: Provides a cross-disciplinary research paradigm for cognitive psychology, computational linguistics, and game studies

Methodology Details

Task Definition

Input: Sequential guess sequences in Wordle gameplay
Output: Quantification of differences between human guesses and near-optimal strategies across multiple dimensions
Constraints:

  • Each guess must be a valid 5-letter English word
  • Players adjust subsequent guesses based on feedback (green/yellow/gray)
  • Goal is to guess the target word within 6 attempts

Near-Optimal Strategy Baseline

The study uses the entropy-based heuristic solver from Doddle as the near-optimal strategy:

  • Optimal Solution (Bertsimas & Paskov 2024): Dynamic programming approach, average 3.421 guesses
  • Depth-1 Minimax Heuristic: Worst case 5 attempts, average 3.482 guesses
  • Entropy Heuristic (adopted in this study): Guarantees completion within 6 attempts, average 3.432 guesses

The heuristic is chosen over exact optimal solutions for computational efficiency, with minimal performance difference (only 0.011 guesses).
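
The idea behind an entropy heuristic can be sketched as follows. This is a minimal illustration, not Doddle's actual code or API: the function names `feedback`, `expected_information`, and `best_guess` are hypothetical, and the tiny word lists are chosen only for the example. The sketch scores each allowed guess by the Shannon entropy of the feedback patterns it would produce over the remaining candidate targets and picks the highest-scoring one.

```python
import math
from collections import Counter


def feedback(guess: str, target: str) -> str:
    """Return Wordle feedback as a string of G (green), Y (yellow), B (gray)."""
    result = ["B"] * len(guess)
    remaining = Counter()
    # First pass: mark greens and count target letters that were not matched.
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            result[i] = "G"
        else:
            remaining[t] += 1
    # Second pass: mark yellows while unmatched copies of the letter remain.
    for i, g in enumerate(guess):
        if result[i] == "B" and remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1
    return "".join(result)


def expected_information(guess: str, candidates: list[str]) -> float:
    """Shannon entropy (in bits) of the feedback distribution over candidates."""
    counts = Counter(feedback(guess, t) for t in candidates)
    total = len(candidates)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def best_guess(allowed: list[str], candidates: list[str]) -> str:
    """Pick the allowed word whose feedback is expected to be most informative."""
    return max(allowed, key=lambda w: expected_information(w, candidates))


# Hypothetical example: three remaining targets, two allowed guesses.
candidates = ["CRANE", "CRATE", "CRAZE"]
choice = best_guess(["CRANE", "ZESTY"], candidates)
```

In this toy case the guess that is not itself a candidate separates all three remaining targets, which is why an entropy-based solver often prefers exploratory guesses that differ sharply from the previous one.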

Measurement Metrics System

1. Levenshtein Distance (Orthographic Similarity)

  • Definition: Minimum number of edit operations (insertion, deletion, substitution) required to transform one word into another
  • Cognitive Significance: Smaller distances indicate players tend to select structurally similar words, potentially reflecting a tendency to reduce cognitive effort
  • Calculation: Compares edit distance between consecutive guesses
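
A minimal sketch of the edit distance described above, using the standard dynamic-programming recurrence. The function name `levenshtein` is chosen for illustration; the summary only states that a standard Levenshtein algorithm was used, not how it was implemented.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the first i-1 letters of a
    # and the first j letters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the letter
            ))
        prev = curr
    return prev[-1]


# Consecutive guesses that differ in one letter have distance 1;
# guesses that share no letters have distance 5.
close_pair = levenshtein("CRANE", "CRATE")
far_pair = levenshtein("CRANE", "SLOTH")
```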

2. Semantic Distance (GloVe)

  • Definition: Cosine distance (one minus the cosine similarity) between GloVe word embeddings
  • Formula: $d_{semantic} = 1 - \cos(v_a, v_b)$, where $v_a, v_b$ are word vectors
  • Cognitive Significance: Tests whether humans tend to guess semantically related words (e.g., "TOAST" after "BREAD")
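
The distance of one minus the cosine of the angle between two word vectors can be sketched as below. The three-dimensional vectors in `TOY_VECTORS` are invented for illustration and are not real GloVe embeddings; an actual analysis would load pre-trained GloVe vectors instead.

```python
import math


def cosine_distance(u: list[float], v: list[float]) -> float:
    """Return 1 - cos(u, v) for two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical toy embeddings (NOT real GloVe values).
TOY_VECTORS = {
    "BREAD": [0.9, 0.1, 0.0],
    "TOAST": [0.8, 0.2, 0.1],
    "CRANE": [0.0, 0.3, 0.9],
}

related = cosine_distance(TOY_VECTORS["BREAD"], TOY_VECTORS["TOAST"])
unrelated = cosine_distance(TOY_VECTORS["BREAD"], TOY_VECTORS["CRANE"])
```

With these toy values the related pair gets a much smaller distance than the unrelated pair, which is the pattern a semantic bias toward the previous guess would produce.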

3. Hamming Distance (Position-Specific Differences)

  • Definition: Number of positions where corresponding characters differ in two equal-length strings
  • Cognitive Significance: More stringent than Levenshtein, focusing only on fixed-position differences, better aligned with Wordle's feedback mechanism
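
A short sketch of the position-by-position comparison, with an illustrative function name `hamming`. The example pair is chosen to show where Hamming and Levenshtein distance diverge: a word shifted by one position differs at every position even though only two edits separate the two words.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


one_letter_changed = hamming("CRANE", "CRATE")  # only position 4 differs
shifted_word = hamming("STALE", "TALES")        # every position differs
```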

4. Rhyme Matching (Phonological Similarity)

  • Implementation: Uses CMU Pronouncing Dictionary for phonetic transcription
  • Judgment Criteria: Perfect rhyme—phonetic endings match and contain stressed vowels
  • Cognitive Significance: Tests whether phonological similarity influences word choice
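
One way to operationalize the perfect-rhyme criterion is sketched below, under the assumption that two different words rhyme when their phoneme sequences match from the last primary-stressed vowel to the end. The small `TOY_PRON` table is hand-written in the ARPAbet style used by the CMU Pronouncing Dictionary (stress marked by a trailing digit); a real analysis would look pronunciations up in the full dictionary, and the helper names are hypothetical.

```python
# Hand-written ARPAbet-style pronunciations for illustration only.
TOY_PRON = {
    "LIGHT": ["L", "AY1", "T"],
    "FIGHT": ["F", "AY1", "T"],
    "CRANE": ["K", "R", "EY1", "N"],
    "TRAIN": ["T", "R", "EY1", "N"],
    "BREAD": ["B", "R", "EH1", "D"],
}


def rhyme_part(phones: list[str]):
    """Phonemes from the last primary-stressed vowel onward, or None if absent."""
    for i in range(len(phones) - 1, -1, -1):
        if phones[i].endswith("1"):  # trailing "1" marks primary stress
            return tuple(phones[i:])
    return None


def is_perfect_rhyme(a: str, b: str, pron: dict = TOY_PRON) -> bool:
    """True if two distinct words share the same stressed-vowel-to-end phonemes."""
    if a == b or a not in pron or b not in pron:
        return False
    part_a, part_b = rhyme_part(pron[a]), rhyme_part(pron[b])
    return part_a is not None and part_a == part_b
```

Note that rhyme is decided on sound rather than spelling, so "CRANE" and "TRAIN" count as rhyming here even though their endings are written differently.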

Game State Encoding

Uses notation (cg, cy, cb) to represent game state:

  • cg: Number of green squares (correct letter in correct position)
  • cy: Number of yellow squares (correct letter in wrong position)
  • cb: Number of gray squares (incorrect letters)

Example: (2, 0, 3) represents 2 green, 0 yellow, 3 gray squares.
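
The (cg, cy, cb) encoding can be computed from a guess and the target word as in the sketch below. The function name `game_state` is illustrative, and the handling of repeated letters follows Wordle's usual rule that a letter is marked yellow only as many times as unmatched copies of it remain in the target.

```python
from collections import Counter


def game_state(guess: str, target: str) -> tuple[int, int, int]:
    """Return (cg, cy, cb): counts of green, yellow, and gray squares."""
    # Greens: same letter in the same position.
    cg = sum(g == t for g, t in zip(guess, target))
    # Letters left over on each side once the greens are set aside.
    left_guess = Counter(g for g, t in zip(guess, target) if g != t)
    left_target = Counter(t for g, t in zip(guess, target) if g != t)
    # Yellows: multiset intersection of the leftover letters.
    cy = sum((left_guess & left_target).values())
    cb = len(guess) - cg - cy
    return (cg, cy, cb)


# Hypothetical example reproducing the (2, 0, 3) state.
state = game_state("CRANE", "CRISP")
```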

Statistical Analysis Methods

  1. Effect Size: Uses Cohen's d to measure differences between human and near-optimal strategy distributions: $d = \frac{\mu_{human} - \mu_{optimal}}{\sigma_{pooled}}$
  2. Significance Testing: Calculates p-values based on t-statistics
  3. Stratified Analysis: Analyzes by game state separately to reveal how constraint levels affect bias
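
A minimal sketch of the effect-size computation: Cohen's d is the difference between the human and near-optimal means divided by the pooled standard deviation. The summary does not say which t-test variant was used, so the pooled-variance (Student) two-sample statistic below is an assumption, and the sample values are invented.

```python
import math
import statistics


def pooled_sd(x: list[float], y: list[float]) -> float:
    """Pooled standard deviation of two samples (sample variances, n - 1)."""
    n1, n2 = len(x), len(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)
    return math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))


def cohens_d(human: list[float], optimal: list[float]) -> float:
    """(mean of human - mean of optimal) / pooled standard deviation."""
    diff = statistics.mean(human) - statistics.mean(optimal)
    return diff / pooled_sd(human, optimal)


def t_statistic(human: list[float], optimal: list[float]) -> float:
    """Pooled-variance two-sample t statistic (assumed variant)."""
    n1, n2 = len(human), len(optimal)
    diff = statistics.mean(human) - statistics.mean(optimal)
    return diff / (pooled_sd(human, optimal) * math.sqrt(1 / n1 + 1 / n2))


# Invented distances between consecutive guesses, for illustration only.
human_dist = [1, 2, 3]
optimal_dist = [2, 3, 4]
d = cohens_d(human_dist, optimal_dist)
```

A negative d, as in this toy example, means human guesses sit closer to the previous guess than the near-optimal guesses do. Because the t statistic grows with sample size while d does not, d is the more informative number for a dataset of this scale.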

Experimental Setup

Dataset

Source: Reddit's r/Wordle subreddit
Scale: 83,000 game records
Collection Method: Uses regular expressions to extract game data shared by users in standard format
Data Provider: Watchful1 (2023) Reddit data dump
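Time Range: June 2005 to December 2023 (presumably the span of the data dump; Wordle was released publicly in October 2021, so game records can only come from the last part of this range)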
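
The summary does not specify what the "standard format" looks like. As a hedged illustration only, the sketch below assumes Wordle's familiar share text (a header such as "Wordle 512 3/6" followed by rows of colored squares) and extracts the puzzle number, the score, and one (green, yellow, gray) count per row. The pattern names and the `parse_share` function are hypothetical. The share grid carries feedback colors but not the guessed words, so the words themselves would have to be recovered from elsewhere in a post.

```python
import re

# Header like "Wordle 512 3/6" or "Wordle 1,024 X/6" (X marks a failed game).
HEADER = re.compile(r"Wordle\s+([\d,]+)\s+([1-6X])/6")
# A grid row: exactly five squares on a line of their own.
ROW = re.compile(r"^[🟩🟨⬛⬜]{5}$", re.MULTILINE)


def parse_share(text: str):
    """Parse an assumed share-format post; return None if no header is found."""
    match = HEADER.search(text)
    if match is None:
        return None
    states = [
        # Gray squares appear as black (dark mode) or white (light mode).
        (row.count("🟩"), row.count("🟨"), row.count("⬛") + row.count("⬜"))
        for row in ROW.findall(text)
    ]
    return {
        "puzzle": int(match.group(1).replace(",", "")),
        "score": match.group(2),
        "states": states,
    }


sample = "Wordle 512 3/6\n\n⬛🟨⬛⬛⬛\n🟩🟩⬛⬛⬛\n🟩🟩🟩🟩🟩"
parsed = parse_share(sample)
```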

Dataset Characteristics:

  • Real player behavior in natural game environments
  • Voluntarily shared, potentially subject to selection bias
  • Limited to English Wordle games

Evaluation Metrics

  1. Cohen's d: Quantifies effect size
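    • |d| < 0.2: Negligible effect
    • 0.2 ≤ |d| < 0.5: Small effect
    • 0.5 ≤ |d| < 0.8: Medium effect
    • |d| ≥ 0.8: Large effect (following Cohen's conventional benchmarks of 0.2, 0.5, and 0.8)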
  2. p-value: Statistical significance (threshold p < 0.001)
  3. Distribution Visualization: Histograms, violin plots, box plots

Comparison Method

Sole Baseline: Doddle's entropy-based heuristic solver

  • Represents near-optimal strategy
  • Performance close to theoretical optimum (differs by only 0.011 guesses)
  • Computationally feasible, can generate near-optimal guesses for all 83,000 data points

Implementation Details

  • GloVe Model: Pre-trained word vectors (Pennington et al. 2014)
  • Pronunciation Library: CMU Pronouncing Dictionary
  • Edit Distance: Standard Levenshtein algorithm
  • Correlation Analysis: Pearson correlation coefficient
  • Visualization: Python's matplotlib and seaborn

Experimental Results

Main Findings

1. Phonological Bias (Global Statistics)

  • Optimal Strategy: 7.3% of guesses rhyme with previous guess
  • Human Players: 9.3% of guesses rhyme with previous guess
  • Significance: p < 0.001
  • Interpretation: Humans significantly tend to select phonologically similar words

2. Orthographic Bias (State-Dependent)

Case 1: (0, 0, 5) - Completely Unconstrained State

  • Cohen's d = -0.0854 (Levenshtein)
  • Both humans and optimal strategy tend to select words with distance 5 (completely different)
  • However, humans suboptimally reuse known incorrect letters (see Figure 1a)

Case 2: (2, 0, 3) - Partially Constrained State

  • Cohen's d = -1.13 (Levenshtein, large effect)
  • p < 10^-12
  • Humans substantially under-explore: they tend to select words orthographically similar to the previous guess (see Figure 1b)
  • This is one of the strongest bias signals

3. Semantic Bias (State-Dependent)

Case 1: (0, 0, 5) - Unconstrained

  • Cohen's d = -0.437 (GloVe distance)
  • p = 1.07×10^-189
  • Humans tend to select semantically closer words (see Figure 1c)

Case 2: (3, 2, 0) - Highly Constrained

  • Cohen's d = 0.00451
  • p = 0.318 (not significant)
  • Semantic bias disappears when constraints are strong (see Figure 1d)

4. Hamming Distance Bias

Case 1: (0, 0, 5)

  • Cohen's d = 0.157
  • Humans suboptimally reuse known incorrect characters (see Figure 1e)

Case 2: (2, 2, 1)

  • Cohen's d = 0.289
  • Humans suboptimally introduce new characters rather than exploiting known information (see Figure 1f)

Systematic Patterns

Relationship Between Constraints and Bias (Figures 3 and 4)

Green Squares and Bias:

  • More green squares (stronger constraints) correlate with smaller semantic bias
  • 0 green squares: Cohen's d approximately -0.4 to -0.6
  • 4 green squares: Cohen's d approaches 0

Gray Squares and Bias:

  • More gray squares (more exclusion information) weaken bias
  • Indicates that increased constraints bring humans closer to optimal strategy

Key Finding:

"Humans exhibit stronger cognitive biases when degrees of freedom are large, while approaching optimal strategy under high constraint"

Cross-Metric Correlation Analysis

Levenshtein vs. Hamming:

  • All word pairs: Pearson r = 0.95 (strong correlation)
  • Character differences < 5: Pearson r = 0.81
  • Interpretation: Both measure orthographic similarity, highly correlated

Levenshtein vs. GloVe Semantic Distance:

  • Pearson r = 0.06 (weak correlation)
  • Interpretation: Orthographic similarity and semantic similarity are essentially independent
  • Significance: Suggests that semantic and orthographic biases can be measured largely independently and may reflect separate cognitive mechanisms (see Figure 2)
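
The cross-metric comparison amounts to computing both distances for the same set of word pairs and correlating the two lists. The sketch below does this for Levenshtein and Hamming distance on a small invented word list; the helper names are illustrative, and the resulting coefficient on six words is not comparable to the values reported for the full dataset.

```python
import math
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def hamming(a: str, b: str) -> int:
    """Number of differing positions; both words are five letters here."""
    return sum(ca != cb for ca, cb in zip(a, b))


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented word list; every unordered pair is scored with both metrics.
WORDS = ["CRANE", "CRATE", "SLATE", "STALE", "TALES", "LIGHT"]
pairs = list(combinations(WORDS, 2))
lev = [levenshtein(a, b) for a, b in pairs]
ham = [hamming(a, b) for a, b in pairs]
r = pearson_r(lev, ham)
```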

Case Analysis

While the paper does not provide specific word pair examples, the following hypothetical sequences illustrate the kinds of patterns the results suggest:

Semantic Bias Examples:

  • Guess sequences might include: "BREAD" → "TOAST" → "ROAST"
  • Semantic field remains in food/cooking domain

Orthographic Bias Examples:

  • In (2,0,3) state: "CRANE" → "CRATE" → "CRAZE"
  • Preserves prefix, gradually adjusts

Phonological Bias Examples:

  • Rhyming sequences: "LIGHT" → "FIGHT" → "SIGHT"

Related Work

1. Priming Effects in Cognitive Psychology

Schacter & Buckner (1998):

  • Defines priming as the phenomenon where past experiences unconsciously influence behavior
  • This study applies this theory to game scenarios

Nelson et al. (1987):

  • Studies rhyme effects on memory and word association
  • Finding: Rhyming effects appear only when participants actively attend to rhyme
  • Resonates with this study's 9.3% vs 7.3% rhyme bias

Deese (1962), De Deyne & Storms (2008):

  • Study grammatical category effects on word association
  • Provides theoretical foundation for this study's semantic bias

2. Lexical Networks and Semantic Structure

Steyvers & Tenenbaum (2005):

  • Analyzes sparsity of word association networks (each word connects to only 0.44% of other words)
  • Word association networks exhibit small-world properties and power-law distributions
  • Supports this study's hypothesis about semantic bias

3. Wordle Optimal Strategy Research

Bertsimas & Paskov (2024):

  • Uses dynamic programming to find exact optimal solution
  • Best starting word: "SALET"
  • Minimum average guesses: 3.421

Cross (2022) - Doddle:

  • Depth-1 minimax heuristic: average 3.482 guesses
  • Entropy heuristic: average 3.432 guesses
  • Baseline method adopted in this study

4. Lexical Puzzle Solving

Underwood et al. (1994):

  • Studies vocabulary retrieval ability of crossword puzzle experts
  • Finds experts stronger in word puzzles and morpheme manipulation
  • Indicates that vocabulary retrieval and phonological awareness are crucial for constrained vocabulary generation tasks
  • Provides evidence for similar mechanisms in Wordle

5. Computational Models of Word Association

Matusevych & Stevenson (2018):

  • Studies human word association based on lexical properties
  • This study extends to game scenarios

Luo et al. (2025):

  • Predicts entertainment responses in Wordle gameplay
  • Uses similar features but focuses on emotion rather than cognitive bias

Unique Contributions of This Study

Distinctions from related work:

  1. Ecological Validity: Real game data vs. laboratory tasks
  2. Multi-Dimensional: Simultaneously examines semantic, orthographic, and phonological dimensions
  3. Context-Dependent: Reveals how constraint levels moderate bias effects
  4. Computational Methods: Uses NLP techniques to quantify cognitive biases

Conclusions and Discussion

Main Conclusions

  1. Systematic Bias Exists: Human guesses in Wordle systematically deviate from optimal strategy, manifesting in:
    • Semantic dimension: Tendency to select semantically related words to previous guesses
    • Orthographic dimension: Tendency to select words with smaller edit distance
    • Phonological dimension: More frequent selection of rhyming words (9.3% vs 7.3%)
  2. Biases Are Non-Random: These biases are not random errors but reflect regularities in cognitive processing
  3. Moderating Role of Constraints:
    • High freedom (e.g., 0g0y5b) shows most pronounced bias
    • High constraint (e.g., 3g2y0b) brings humans close to optimal strategy
    • Indicates cognitive biases are more apparent when the game state leaves more freedom of choice
  4. Independent Mechanisms: Weak correlation between the orthographic and semantic distance metrics (r=0.06) suggests the two biases reflect largely independent cognitive processes
  5. Research Paradigm Value: Wordle provides an ideal research environment between natural language use and artificial experimental tasks

Limitations

The paper explicitly discusses the following limitations in Section 8:

  1. Data Source Bias:
    • Relies on voluntarily shared Reddit data
    • May suffer from selection effects (better-performing players more likely to share)
    • Reddit user population may not represent general population
  2. Demographic Factors:
    • Lacks information on player age, education, language background
    • Cannot control for these confounding variables
  3. Language Limitations:
    • Only studies English Wordle
    • Results may not generalize to other languages
  4. Computational Approximation:
    • Uses heuristic rather than exact optimal solution (though difference is minimal)
  5. Causal Inference:
    • Observational study cannot fully establish causality
    • Cannot rule out alternative explanations (e.g., players intentionally choosing interesting words)

Future Directions

While not explicitly listed, inferrable research directions include:

  1. Cross-Linguistic Studies: Validate findings in Wordle for other languages
  2. Experimental Validation: Design controlled experiments to directly manipulate priming stimuli
  3. Individual Differences: Study differences across players of varying skill levels and cognitive styles
  4. Temporal Dynamics: Analyze how biases evolve across game progression
  5. Application Extension: Apply methodology to other constrained creative tasks

In-Depth Evaluation

Strengths

1. Methodological Innovation

  • Interdisciplinary Integration: Skillfully combines cognitive psychology theory with NLP techniques
  • High Ecological Validity: Uses real game data rather than laboratory tasks
  • Multi-Dimensional Measurement: Simultaneously examines three independent dimensions—semantic, orthographic, and phonological
  • Context Sensitivity: Discovers moderating role of constraint levels, enhancing explanatory power

2. Empirical Rigor

  • Large Sample: 83,000 data points provide sufficient statistical power
  • Effect Size Reporting: Reports not only p-values but also Cohen's d
  • Systematic Analysis: Stratified analysis by game state (Figures 3, 4)
  • Correlation Verification: Validates metric independence (r=0.06)

3. Theoretical Contribution

  • New Evidence for Priming: Validates classical theory in natural game scenarios
  • Constraints and Creativity: Reveals phenomenon that constraints reduce cognitive bias
  • Independent Mechanisms: Demonstrates semantic and orthographic biases operate independently

4. Writing Clarity

  • Clear structure, logical flow from background to methods to results
  • Effective visualization (Figure 1 comparisons are intuitive)
  • Clear symbol system (cg, cy, cb)

Weaknesses

1. Causal Inference Limitations

  • Observational study cannot establish causality
  • Cannot rule out alternative explanations:
    • Players may intentionally choose interesting/rhyming words to increase game enjoyment
    • Vocabulary availability (certain words more easily recalled) may confound priming effects

2. Data Representativeness Issues

  • Reddit users likely younger, more tech-savvy
  • Voluntary sharing may selectively exclude failed games
  • Lack of demographic information prevents generalizability assessment

3. Insufficient Mechanism Explanation

  • Lacks deep exploration of why constraints reduce bias
    • Is it cognitive resource allocation change?
    • Or natural result of reduced available vocabulary space?
  • Does not discuss individual differences (all players treated as homogeneous)

4. Missing Methodological Details

  • Does not report how missing data or outliers were handled
  • Does not address multiple comparison problem (conducted numerous hypothesis tests)
  • GloVe model specifics (dimensions, training corpus) not specified

5. Experimental Design Limitations

  • Only examines pairs of consecutive guesses, does not consider longer history effects
  • Does not control for starting word effects (different starting words may trigger different biases)
  • Does not analyze game difficulty (certain target words may be inherently harder)

6. Statistical Issues

  • Large samples make almost any difference statistically significant (p<0.001)
  • Effect sizes more important, but some are small (e.g., -0.0854)
  • No multiple comparison correction (Bonferroni or FDR)
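
To make the two corrections named above concrete, the sketch below applies Bonferroni and Benjamini-Hochberg (FDR) procedures to a list of p-values. The p-values are hypothetical and chosen only to show how the two procedures can disagree; they are not taken from the paper, and the function names are illustrative.

```python
def bonferroni(pvals: list[float], alpha: float = 0.001) -> list[bool]:
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals: list[float], alpha: float = 0.001) -> list[bool]:
    """Reject the k smallest p-values, where k is the largest rank
    whose p-value is at most (rank / m) * alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject


# Hypothetical p-values from four separate tests.
pvals = [0.0001, 0.0004, 0.0007, 0.318]
strict = bonferroni(pvals)
fdr = benjamini_hochberg(pvals)
```

Bonferroni controls the chance of any false positive and is the more conservative of the two, while Benjamini-Hochberg controls the expected share of false positives among the rejected hypotheses.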

Impact

1. Academic Contribution

  • Cognitive Science: Provides new ecologically valid evidence for priming effects
  • Computational Linguistics: Demonstrates NLP technique applications in cognitive research
  • Game Studies: Establishes paradigm of games as cognitive laboratories

2. Methodological Value

  • Provides reproducible analysis workflow
  • Open-source tools (Doddle) facilitate follow-up research
  • Data publicly available (Reddit data)

3. Practical Value

  • Game Design: Understanding player behavior can inform game difficulty tuning
  • Educational Applications: Wordle useful for vocabulary teaching; understanding cognitive biases aids intervention design
  • AI Assistance: Can develop intelligent hint systems considering human biases

4. Impact of Limitations

  • Data bias may limit generalizability
  • Weak causal inference reduces application value
  • Requires experimental research for validation

Applicable Scenarios

1. Direct Applications

  • Analyze other vocabulary games (Spelling Bee, Scrabble)
  • Study cognitive biases in constrained creative tasks
  • Design game AI considering human biases

2. Extended Applications

  • Educational Technology: Vocabulary learning software design
  • Human-Computer Interaction: Understand user behavior in constrained input scenarios
  • Cognitive Assessment: Wordle as cognitive function testing tool

3. Inapplicable Scenarios

  • Completely free creative writing (too little constraint)
  • Non-English languages (requires re-validation)
  • Non-vocabulary tasks (e.g., number games)

Reproducibility Assessment

High:

  • Data publicly available (Reddit)
  • Uses open-source tools (Doddle)
  • Methods clearly described
  • Standard statistical methods

Potential Obstacles:

  • GloVe model version not specified
  • Data cleaning details insufficient
  • Computational resource requirements (83,000 data points)

Key References

  1. Bertsimas & Paskov (2024): Dynamic programming Wordle optimal solution
  2. Schacter & Buckner (1998): Neuroscience basis of priming effects
  3. Nelson et al. (1987): Rhyme effects on word association
  4. Steyvers & Tenenbaum (2005): Large-scale semantic network structure
  5. Pennington et al. (2014): GloVe word embedding method
  6. Underwood et al. (1994): Vocabulary retrieval in crossword experts
  7. Levelt (1989): Lexical retrieval in speech production

Overall Assessment

This is an excellent research paper with strong methodological innovation, empirical rigor, and significant interdisciplinary value. Its core value lies in:

  1. Pioneering Use of Wordle as a "quasi-natural laboratory" for cognitive research
  2. Systematic Quantification of cognitive biases across three dimensions
  3. Discovery of constraint level's moderating effect on bias—an important pattern

Main limitations are causal inference constraints and data representativeness issues, which are inherent limitations of observational research and do not diminish its value as exploratory research.

The paper provides a solid foundation for subsequent research, particularly demonstrating the value of game-based cognitive science and ecologically valid NLP research. Recommended follow-up: controlled experiments to further validate causal mechanisms and extension to more diverse populations and languages.

Recommended for: Researchers and students in cognitive science, computational linguistics, game studies, and human-computer interaction.