2025-11-25T12:22:17.840833

From Numbers to Ur-Strings

Visser
In this paper we examine two ways of coding sequences in arithmetical theories. We investigate under what conditions they work. To be more precise, we study the creation of objects of a data-type that we call ur-strings, roughly sequences where the components are ordered but where we do not have an explicitly given projection function. First, we have a brief look at the beta-function which was already carefully studied by Emil Jeřábek. We study in detail our two target constructions. These constructions both employ theories of strings. The first is based on Smullyan coding and the second on the representation of binary strings in the special linear monoid of the non-negative part of discretely ordered commutative rings as introduced by Markov. We use the Markov coding to obtain an alternative proof that ${\sf PA}^{-}$ is sequential.
academic

From Numbers to Ur-Strings

Basic Information

  • Paper ID: 2411.15873
  • Title: From Numbers to Ur-Strings
  • Author: Albert Visser
  • Classification: math.LO (Mathematical Logic)
  • Publication Date: October 17, 2025
  • Paper Link: https://arxiv.org/abs/2411.15873

Abstract

This paper investigates two methods for encoding sequences in arithmetic theories and explores their conditions of operation. Specifically, it studies the construction of a data type object called "ur-strings," which resembles sequences with ordered components but without explicit projection functions. The paper first briefly reviews the β function, which was studied in detail by Emil Jeřábek, then thoroughly examines two target constructions: the first based on Smullyan encoding, and the second based on the representation of binary strings in the special linear monoid of the non-negative part of a discrete ordered commutative ring introduced by Markov. Using Markov encoding, an alternative proof is obtained that PA^- is sequentialized.

Research Background and Motivation

Core Problems

The core problems addressed in this paper concern the construction of sequence encodings in weak arithmetic theories. Specifically:

  1. Necessity of Sequence Encoding: Sequence encoding is the first step in arithmetization; once sequence encoding is obtained, phenomena of undecidability and incompleteness follow.
  2. Importance of Total Sequences: Although only partial domain sequences are needed for arithmetization, total sequences allow the construction of partial satisfaction predicates within a given theory and enable model extension to obtain complete satisfaction predicates.
  3. Challenges in Weak Theories: Constructing sequence encodings in very weak theories to more precisely understand the mathematical principles involved in sequence construction.

Research Motivation

  1. Maximizing Scope: Seeking constructions that work in the broadest possible class of theories
  2. Simplicity: Aiming for constructions and results to be as simple as possible, minimizing the use of shortened Solovay-style definable cuts
  3. Avoiding Exponential Growth: Treating the totality of the exponential function as "taboo," adhering to slow growth

Core Contributions

  1. Introduced the Concept of Ur-Strings: A weakened notion of sequences where elements are ordered but do not require length and projection functions
  2. Developed Two Encoding Strategies:
    • A method based on Smullyan encoding (working in theory PA^-_smu)
    • A method based on Markov encoding (working in theory PA^-)
  3. Established String Theory as an Intermediary: Using string theory as an intermediate stage in the construction from numbers to ur-strings
  4. Provided New Proof of PA^- Sequentialization: Obtained an alternative proof that PA^- is sequentialized using Markov encoding
  5. Deep Model-Theoretic Analysis: Analyzed the characteristics and properties of Markov strings in different models

Methodology Details

Task Definition

Investigating the construction of ur-strings in weak arithmetic theories, where:

  • Input: Weak arithmetic theories (such as PA^-, PA^-_smu)
  • Output: Direct interpretation of ur-strings such that the theory becomes sequentialized
  • Constraints: Avoiding exponential functions, working in the weakest possible theories

Core Concepts

Ur-Strings vs. Sequences

  • Sequences: Require explicit length functions and projection functions
  • Ur-Strings: Strings where all elements of specified types are embedded in their alphabet, with concatenation operations and ordering of element occurrences, but without requiring length and projection functions

Sequentialized Theories

A theory is sequentialized if and only if it directly interprets theory AS (or its two-sorted version FAC):

  • AS contains a binary predicate ∈ satisfying axioms for the existence of empty set and adjacency operations
  • FAC is the two-sorted version, containing object type o and class/concept type c

Method One: Smullyan Encoding

Basic Idea

Encoding binary strings using length-first lexicographic ordering:

0: ε    5: ba   10: abb   15: aaaa
1: a    6: bb   11: baa   16: aaab
2: b    7: aaa  12: bab   17: aaba
3: aa   8: aab  13: bba   18: aabb
4: ab   9: aba  14: bbb   19: abaa

Key Definitions

  • ℓ(n) := the largest power of 2 less than or equal to n+1
  • m⊛n := m·ℓ(n) + n
  • String concatenation: σ⊛τ = (σ∗τ)^sm

Technical Implementation

  1. Base Theory: PA^-_smu = PA^- + power existence principle + power divisibility principle
  2. String Theory Extension: TCΛ^c_1, with added length function Λ
  3. Ur-Strings Construction: Using pairing ⟨Λ(w₀)...bΛ(wₖ₋₁), bw₀...bwₖ₋₁⟩

Method Two: Markov Encoding

Basic Idea

Using the isomorphism between special linear monoid SL₂(ℕ) and the free monoid on two generators:

  • Matrix A = (1 1; 0 1) corresponds to letter a
  • Matrix B = (1 0; 1 1) corresponds to letter b
  • Key property: A^n = (1 n; 0 1), i.e., counting strings grow linearly

Technical Implementation

  1. Base Theory: PA^- (theory of non-negative discrete ordered commutative rings)
  2. Matrix Representation: Using 2×2 matrices with determinant 1
  3. Encoding Strategy: Ur-string n₀...nₖ₋₁ encoded as BA^n₀...BA^nₖ₋₁

Key Theorem

Theorem 8.21: Translation θ supports direct interpretation of TCFU1 in PA^-, where:

  • Object domain: all numbers
  • String domain: ⊘ plus all matrices of the form Bα
  • n_θ := BA^n = (1 n; 1 n+1)

Experimental Setup

Theoretical Framework

This paper primarily conducts theoretical analysis, verifying the feasibility of encoding methods in different arithmetic theories:

  1. PA^-_jer: Jeřábek's PA^-, discrete ordered commutative semiring
  2. PA^-: PA^-_jer + subtraction principle
  3. PA^-_euc: PA^- + Euclidean division
  4. PA^-_smu: PA^- + power existence principle + power divisibility principle

Model Analysis

Several key models were studied:

  • M₀ := ℤX≥0
  • M₁ := Int(ℤ)≥0 (non-negative part of integer-valued polynomials)
  • M₂ := (ℚX·X + ℤ)≥0 ("second standard model")

Experimental Results

Main Results

Smullyan Encoding Results

  1. Theorem 7.21: β supports direct interpretation of TCΛ^c_1 in PA^-_smu
  2. Interpretability: TCΛ^c_1 directly interprets TCFU1
  3. Combined Result: PA^-_smu is sequentialized

Markov Encoding Results

  1. Theorem 8.21: θ supports interpretation of TCFU1 in PA^-
  2. Stronger Result: Supports TCFU2 (including stack principle) in PA^-_euc
  3. New Proof: Provides an alternative proof that PA^- is sequentialized

Model-Theoretic Findings

Characterization of M₂ Model

Theorem 8.34: Any Markov string in M₂ can be uniquely written as a finite alternating product of forms A^P and B^Q.

Counterexample Construction

Counterexamples were constructed in M₀ and M₁ that do not satisfy the stack principle tcu7:

  • Theorem 8.39: Element A = (9, 3X+2; 3X+4, X²+2X+1) is neither of the form A^P nor βP
  • Theorem 8.42: Similarly, B is an A-string but not of the form A^P

Reverse Mathematics Results

Theorem 8.26: Principle pa17^- is equivalent to every α in the special linear monoid being of the form A^n or βn.

Historical Background

  1. Gödel's β Function: Classical sequence encoding method using the Chinese Remainder Theorem
  2. Jeřábek's Work: Modern treatment of the β function in PA^-_jer
  3. Markov's Contribution: Original observation of the isomorphism between special linear groups and free monoids
  4. Murwanashyaka's Research: Using Markov-style interpretations in weak theories

Technical Comparison

  • β Function: Broadest scope, but requires complex shortening techniques
  • Smullyan Encoding: Direct concatenation, but requires stronger base theory
  • Markov Encoding: Works in PA^-, with simple and intuitive connections

Conclusions and Discussion

Main Conclusions

  1. Complementary Methods: Both encoding methods have advantages; Smullyan encoding is more intuitive but requires stronger theory, while Markov encoding works in weaker theories
  2. Theoretical Optimality: PA^-_smu is the natural foundation for Smullyan's method, and PA^- is the natural foundation for Markov's method
  3. Modular Approach: Provides clear modular construction through string theory as an intermediary

Limitations

  1. Theory Strength: Smullyan encoding requires PA^-_smu, which is not contained in IOpen
  2. Growth Restrictions: Avoiding exponential growth limits the directness of certain constructions
  3. Model Dependence: Certain properties (such as the stack principle) depend on specific arithmetic principles

Future Directions

The paper raises several open problems:

  1. Reverse Mathematics: Can complete reverse mathematics analysis be performed on the encodings?
  2. Model Theory: Development of model theory for PA^-_smu
  3. Alternative Encodings: Precise assumptions for other encoding strategies described in Section 7.1.1
  4. Characterization Problems: Normal form characterization of Markov strings of M₀ in M₂

In-Depth Evaluation

Strengths

  1. Theoretical Depth: Provides thorough analysis of sequence encoding in weak arithmetic theories
  2. Methodological Innovation: The ur-strings concept provides a useful weakening of sequences
  3. Technical Rigor: All constructions have complete mathematical proofs
  4. Modular Design: The method using string theory as intermediary is clear and reusable

Weaknesses

  1. Limited Applications: Primarily theoretical contributions with unclear practical applications
  2. Complexity: Some constructions are quite technical and potentially difficult to understand
  3. Many Open Problems: Leaves many unresolved questions

Impact

  1. Theoretical Contribution: Provides new tools for research in weak arithmetic theories
  2. Methodological Value: Modular approach may be applicable to other constructions
  3. Foundational Research: Offers new perspectives for understanding the nature of arithmetization

Applicable Scenarios

  1. Mathematical Logic Research: Weak arithmetic theories and provability theory
  2. Model Theory: Study of non-standard models
  3. Computational Theory: Arithmetization of formal systems

References

The paper includes 76 references covering multiple fields including mathematical logic, model theory, and algebra, particularly:

  • Jeřábek's work on weak arithmetic theories
  • Markov's classical works on algorithmic theory
  • Related research on string theory and concatenation theory
  • Research on weakly essentially undecidable theories

This paper represents significant progress in the study of weak arithmetic theories. By introducing the concept of ur-strings and two concrete encoding methods, it provides new perspectives for understanding the nature of sequence encoding. Although primarily theoretical work, its rigorous mathematical treatment and deep analysis make it an important contribution to the field.