2025-11-15T04:52:11.684179

Dyck Words, Pattern Avoidance, and Automatic Sequences

Mol, Rampersad, Shallit

We study various aspects of Dyck words appearing in binary sequences, where $0$ is treated as a left parenthesis and $1$ as a right parenthesis. We show that binary words that are $7/3$-power-free have bounded nesting level, but this no longer holds for larger repetition exponents. We give an explicit characterization of the factors of the Thue-Morse word that are Dyck, and show how to count them. We also prove tight upper and lower bounds on $f(n)$, the number of Dyck factors of Thue-Morse of length $2n$.

academic

Dyck Words, Pattern Avoidance, and Automatic Sequences

Basic Information

Paper ID: 2301.06145
Title: Dyck Words, Pattern Avoidance, and Automatic Sequences
Authors: Lucas Mol (Thompson Rivers University), Narad Rampersad (University of Winnipeg), Jeffrey Shallit (University of Waterloo)
Classification: cs.DM (Discrete Mathematics), cs.FL (Formal Languages), math.CO (Combinatorics)
Published Journal: Communications in Mathematics 33 (2025), no. 2, Paper no. 5
Paper Link: https://arxiv.org/abs/2301.06145

Abstract

This paper investigates various properties of Dyck words in binary sequences, where 0 is interpreted as a left parenthesis and 1 as a right parenthesis. The research demonstrates that 7/3-power-free binary words possess bounded nesting depth, though this property no longer holds for larger repetition exponents. The paper provides an explicit characterization of Dyck factors in the Thue-Morse word and demonstrates how to compute their quantities. Furthermore, tight upper and lower bounds are established for the number f(n) of Dyck factors of length 2n in the Thue-Morse word.

Research Background and Motivation

Problem Definition

The core problem under investigation is understanding the structure and properties of Dyck word factors in infinite binary sequences. Dyck words are fundamental concepts in formal language theory, representing balanced parenthesis strings with important applications in computer science and mathematics.

Research Significance

Theoretical Importance: The Dyck language is a canonical example of context-free languages. Studying its distribution in automatic sequences helps illuminate the deep connections between formal language theory and automata theory.
Combinatorial Value: Pattern avoidance and power avoidance are central research directions in combinatorics on words. This research combines these concepts with Dyck words.
Computational Applications: Automatic sequences have broad applications in algorithms and computational complexity theory. Understanding the properties of their Dyck factors has practical significance.

Limitations of Existing Research

Lack of systematic characterization of Dyck factors in specific automatic sequences
Insufficient quantitative analysis of the relationship between power avoidance and nesting depth
Absence of effective algorithms for counting Dyck factors in automatic sequences

Core Contributions

Relationship Between Power Avoidance and Nesting Depth: Proves that 7/3-power-free Dyck words have nesting depth at most 3, while there exist 7/3⁺-power-free Dyck words with arbitrarily large nesting depth.
Characterization of Dyck Factors in Thue-Morse Words: Provides a complete characterization of all Dyck factors in the Thue-Morse sequence: factors of the form h(x), where x is a factor of a certain ternary sequence s.
General Theory for Automatic Sequences: Establishes a decidability framework for Dyck factors in run-length and synchronized automatic sequences.
Precise Counting Results: Proves tight upper and lower bounds for the number d(n) of Dyck factors of length 2n in the Thue-Morse sequence: d(n) ≤ n and d(n) ≥ n/2.

Methodology Details

Task Definition

Given a binary word w = w1..n, if interpreting 0 as a left parenthesis and 1 as a right parenthesis yields a balanced parenthesis string, then w is called a Dyck word. Formally, w is a Dyck word if and only if:

B(w) = |w|₀ - |w|₁ = 0 (balance condition)
For all prefixes w', B(w') ≥ 0 (non-negativity condition)

The nesting depth N(w) is defined as the maximum value of B(w') over all prefixes.

Core Methods

1. Power Avoidance Analysis Method

Uses induction and constructive proofs:

Theorem 2.1: By analyzing the structure of 7/3-power-free Dyck words, proves that their nesting depth ≤ 3.
Theorem 2.9: Constructs special morphisms f and g such that f(gᵗ(2)) generates 7/3⁺-power-free Dyck words with arbitrarily large nesting depth.

2. Automata Theory Method

Utilizes the Walnut theorem prover for computational verification:

morphism f "0->00100110100110010110010011001011001101
           1->00101100110100110110011010010110011011
           2->00101101001101001011001101001011010011"
morphism g "0->022012 1->022112 2->202101"

3. Linear Representation Theory

For run-length and synchronized k-automatic sequences, constructs first-order logic formulas:

Balance function: Bal(i,n,x) ≡ ∃y,z N₀(i,n,y) ∧ N₁(i,n,z) ∧ ((y<z ∧ x=0) | (y≥z ∧ y=x+z))
Dyck determination: Dyck(i,n) ≡ balance condition ∧ non-negativity conditions

Technical Innovations

Morphism Construction Technique: Designs special 6-uniform morphism g and 38-uniform morphism f to achieve precise control of nesting depth.
Synchronized Sequence Theory: Extends run-length and synchronization concepts to Dyck language analysis, establishing a decidability framework.
Linear Representation Minimization: Uses the Schützenberger algorithm to reduce the linear representation rank for Thue-Morse Dyck factor counting from 29 to 7.

Experimental Setup

Computational Tools

Walnut Theorem Prover: For first-order logic verification of automatic sequences
Linear Algebra Systems: For matrix operations and linear representation calculations
Symbolic Computation: For verifying recurrence relations and asymptotic behavior

Verification Methods

Small-Scale Verification: Direct computation for cases where n < 29
Inductive Proof: Uses mathematical induction to prove general results
Computer-Assisted Verification: Leverages Walnut for large-scale computation (e.g., 130GB memory, 20321 seconds CPU time)

Experimental Results

Main Quantitative Results

1. Nesting Depth Bounds

Upper Bound: Nesting depth of 7/3-power-free Dyck words ≤ 3
Lower Bound: There exist 7/3⁺-power-free Dyck words with arbitrarily large nesting depth

2. Thue-Morse Dyck Factor Counting

Precise recurrence relations:

d(2n) = 2d(n)
d(4n+3) = 2d(n) + d(2n+1) + q(n)
d(8n+1) = 2d(2n+1) + d(4n+1) - q(n)
d(8n+5) = 2d(n) + d(2n+1) + 2d(2n+2)

where q(n) is a 2-automatic sequence with 1 ≤ q(n) ≤ 2.

3. Tight Bounds Theorem

Upper Bound: d(n) ≤ n, with equality when n = 3·2ⁱ
Lower Bound: d(n) ≥ n/2, with equality when n = 2ⁱ
Odd Case: When n is odd, d(n) ≥ (n+3)/2

4. Asymptotic Average

∑₀≤ᵢ<₂ₙ d(i) = 19·4ⁿ/48 - 2ⁿ/4 + 5/3, with average value (19/24)n

Specific Numerical Results

Values of d(n) for the first 21 terms:

n	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20
d(n)	1	1	2	3	2	4	6	6	4	8	8	8	12	9	12	13	8	14	16	14	16

Results for Other Sequences

Fibonacci Sequence: Contains only Dyck factors 01 and 0101
Period-Doubling Sequence: Contains only Dyck factors 01, 0101, 010101
Rudin-Shapiro Sequence: Contains Dyck factors with arbitrarily large nesting depth

Formal Language Theory

This research builds upon Chomsky and Schützenberger's theory of context-free languages, particularly the algebraic theory of Dyck languages.

Combinatorics on Words

Power Avoidance Theory: Inherits from Thue's pioneering work on overlap-free words
Automatic Sequences: Based on Cobham's theory of k-automatic sequences and recent concepts of synchronized sequences

Computational Methods

Walnut System: Utilizes the automatic theorem prover developed by Mousavi and Shallit
Linear Representations: Applies Berstel and Reutenauer's theory of noncommutative rational series

Conclusions and Discussion

Main Conclusions

Critical Exponent Phenomenon: 7/3 is the critical exponent for the boundedness of nesting depth in Dyck words, reflecting the profound connection between power avoidance and structural complexity.
Universality of Automatic Sequences: Run-length and synchronization properties provide a unified framework for studying Dyck factors in automatic sequences.
Precise Counting Theory: The counting of Dyck factors in the Thue-Morse sequence reveals the rich structure of k-regular sequences.

Limitations

Computational Complexity: Large-scale Walnut computations require enormous resources (130GB memory).
Special Sequence Dependence: Some results (such as run-length and synchronization properties) depend on the special characteristics of sequences.
Generalization Degree: Some results apply only to specific classes of automatic sequences.

Future Directions

Higher-Dimensional Generalization: Study the distribution of higher-dimensional Dyck languages in automatic sequences.
Other Patterns: Extend to avoidance problems for other context-free patterns.
Algorithm Optimization: Develop more efficient algorithms for counting Dyck factors.

In-Depth Evaluation

Strengths

Theoretical Depth: Organically combines power avoidance, automatic sequences, and formal language theory, demonstrating profound theoretical expertise.
Methodological Innovation: Clever application of morphism construction and linear representation techniques, particularly precise control of nesting depth.
Computational Rigor: Extensive use of computer-assisted verification enhances the reliability of results.
Result Completeness: Provides a complete theoretical picture from existence to counting.

Weaknesses

Computational Resources: Some proofs rely on large-scale computation, which may limit result verifiability.
Generalizability: Some technical methods may be difficult to generalize to broader sequence classes.
Application Orientation: The practical application value of theoretical results requires further exploration.

Impact

Interdisciplinary Bridging: Promotes cross-disciplinary development of combinatorics, formal language theory, and automata theory.
Methodological Contribution: Provides new analytical frameworks for studying structural patterns in automatic sequences.
Computational Tools: Demonstrates the powerful potential of modern theorem proving tools in combinatorial problems.

Applicable Scenarios

Theoretical Research: Suitable for in-depth research in combinatorics on words and formal language theory.
Algorithm Design: Provides theoretical foundations for designing algorithms that process structured sequences.
Educational Applications: Serves as an excellent case study for demonstrating modern mathematical computational methods.

References

This paper cites important literature from formal language theory, combinatorics, and automata theory, including:

Chomsky & Schützenberger's theory of context-free languages
Thue's pioneering work on overlap-free words
Allouche & Shallit's theory of k-regular sequences
Berstel & Reutenauer's noncommutative rational series
Literature on modern computational tools such as Walnut

Overall Assessment: This is an excellent paper demonstrating both theoretical depth and technical innovation. It successfully combines concepts and methods from multiple mathematical branches to provide important contributions to understanding structural patterns in automatic sequences. While there are certain limitations in computational complexity and generalizability, its theoretical value and methodological significance are substantial.