Closure Properties of General Grammars -- Formally Verified
Dvorak, Blanchette
We formalized general (i.e., type-0) grammars using the Lean 3 proof assistant. We defined basic notions of rewrite rules and of words derived by a grammar, and used grammars to show closure of the class of type-0 languages under four operations: union, reversal, concatenation, and the Kleene star. The literature mostly focuses on Turing machine arguments, which are possibly more difficult to formalize. For the Kleene star, we could not follow the literature and came up with our own grammar-based construction.
academic
Closure Properties of General Grammars -- Formally Verified
This paper formalizes general grammars (i.e., type-0 grammars) using the Lean 3 proof assistant. The authors define fundamental concepts of rewriting rules and word derivation from grammars, and prove that the class of type-0 languages is closed under four operations: union, reversal, concatenation, and Kleene star. The literature primarily focuses on Turing machine arguments, which may be more difficult to formalize. For the Kleene star operation, the authors could not follow the literature and proposed their own grammar-based construction.
Importance of Formal Language Theory: Formal language concepts are central to computer science and can be recognized through multiple formalisms, including Turing machines and formal grammars
Equivalence of Type-0 Grammars and Turing Machines: Both Turing machines and general grammars characterize the same class of recursively enumerable or type-0 languages
Limitations of Existing Formalization Work: While substantial work exists on formalizing Turing machines in proof assistants, formalization of general grammars remains relatively scarce
Advantages of Grammars: General grammars are easier to define than Turing machines, and certain proofs about general grammars are considerably simpler than analogous Turing machine proofs
Importance of Closure Properties: Closure properties of type-0 languages are fundamental results in formal language theory
Need for Formal Verification: Machine-checked rigorous proofs are needed to ensure the correctness of these foundational results
First Complete Formalization of General Grammars: Comprehensive definition of type-0 grammar concepts and operations in Lean 3
Formal Proofs of Four Closure Properties:
Union closure
Reversal closure
Concatenation closure
Kleene star closure
Novel Kleene Star Construction: Due to the absence of grammar-based constructions in the literature, the authors developed their own syntax-based construction method
Reusable Abstract Framework: Development of the lifted_grammar structure to reduce code duplication and provide generic proof patterns
Open-Source Lean Codebase of Approximately 12,500 Lines: Complete formalization implementation available for community use
def grammar_transforms (g : grammar T) (w1 w2 : list (symbol T g.nt)) : Prop :=
∃ r : grule T g.nt,
r ∈ g.rules ∧
∃ u v : list (symbol T g.nt),
w1 = u ++ r.input_L ++ [symbol.nonterminal r.input_N] ++ r.input_R ++ v ∧
w2 = u ++ r.output_string ++ v
The paper cites 26 important references, covering:
Classical textbooks: Aho & Ullman's parsing theory, Hopcroft et al.'s automata theory
Formalization work: Implementations of computational models in various proof assistants
Tool documentation: Relevant materials for Lean 3 and mathlib
Summary: This is a high-quality theoretical computer science paper that makes important technical contributions and provides valuable experience in formalization methodology. The authors' work establishes a solid foundation for constructing a complete formalized Chomsky hierarchy and holds significant value for both the formal language theory community and the proof assistant community.