2025-11-22T15:25:16.453421

Complexity and accessibility of random landscapes

Pahujani, Krug
These notes introduce probabilistic landscape models defined on high-dimensional discrete sequence spaces. The models are motivated primarily by fitness landscapes in evolutionary biology, but links to statistical physics and computer science are mentioned where appropriate. Elementary and advanced results on the structure of landscapes are described with a focus on features that are relevant to evolutionary searches, such as the number of local maxima and the existence of fitness-monotonic paths. The recent discovery of submodularity as a biologically meaningful property of fitness landscapes and its consequences for their accessibility is discussed in detail.
academic

Complexity and Accessibility of Random Landscapes

Basic Information

  • Paper ID: 2502.05896
  • Title: Complexity and Accessibility of Random Landscapes
  • Authors: Sakshi Pahujani, Joachim Krug (University of Cologne)
  • Classification: q-bio.PE (Population and Evolution), cond-mat.dis-nn (Disordered Systems), math.PR (Probability)
  • Publication Date: 2025 (SciPost Physics Lecture Notes Submission)
  • Paper Link: https://arxiv.org/abs/2502.05896

Abstract

This paper introduces probabilistic landscape models defined on high-dimensional discrete sequence spaces. These models are primarily inspired by adaptive landscapes in evolutionary biology, while also engaging with relevant topics in statistical physics and computer science. The article describes both foundational and advanced results on landscape structure, with emphasis on characteristics relevant to evolutionary search, such as the number of local maxima and the existence of adaptive monotonic paths. Special attention is given to recent findings on submodularity as a biologically meaningful property of adaptive landscapes and its implications for landscape accessibility.

Research Background and Motivation

Core Questions

  1. High-dimensional landscape navigation problem: Multiple fields including biological evolution, spin glass systems, and neural network optimization involve navigation on high-dimensional complex landscapes
  2. Structural characteristics of adaptive landscapes: Understanding the distribution and accessibility of local maxima (peaks) in adaptive landscapes
  3. Wright vs. Fisher debate: Resolving the classical controversy in evolutionary biology regarding whether adaptive landscapes are rugged and difficult to navigate (Wright's view) or relatively accessible (Fisher's view)

Research Significance

  • Interdisciplinary application: This research bridges evolutionary biology, statistical physics, and computer science
  • Practical relevance: Helps understand the predictability and repeatability of evolutionary processes
  • Theoretical value: Provides mathematical frameworks and analytical tools for high-dimensional random landscapes

Limitations of Existing Approaches

  • Fully random models (such as the House of Cards model) are oversimplified and cannot reflect correlations in real biological systems
  • Lack of systematic understanding of accessibility in structured landscapes
  • Insufficient recognition of the biological significance of important mathematical properties such as submodularity

Core Contributions

  1. Unified mathematical framework: Establishes a complete theoretical system for analyzing probabilistic landscapes on high-dimensional discrete sequence spaces
  2. Accessibility phase transition theory: Reveals phase transition phenomena in the existence of accessible paths in random landscapes and determines critical thresholds
  3. Connection between submodularity and accessibility: First systematically elucidates the subset-superset accessibility property of submodular adaptive landscapes
  4. Adaptive basin of attraction theory: Provides exponential lower bounds on the size of adaptive basins of attraction in submodular landscapes
  5. Interdisciplinary connections: Establishes mappings between Fisher's geometric model and the antiferromagnetic Hopfield model

Methodology Details

Task Definition

Study adaptive landscapes defined on high-dimensional discrete sequence spaces {0,1,...,a1}L\{0,1,...,a-1\}^L, analyzing their structural characteristics (such as peak numbers) and dynamical properties (such as the existence of accessible paths).

Core Models

1. House of Cards (HoC) Model

  • Definition: Fitness values are independent and identically distributed continuous random variables
  • Peak probability: Pmax=1(a1)L+1P_{\max} = \frac{1}{(a-1)L+1}
  • Expected number of peaks: E(NL)=aL(a1)L+1E(N_L) = \frac{a^L}{(a-1)L+1}
  • Complexity: Λ=limL1LlogE(NL)=lna\Lambda = \lim_{L→∞} \frac{1}{L}\log E(N_L) = \ln a

2. Accessibility Analysis

Direct path accessibility:

  • Probability: Pβ,l=βl1(l1)!P_{β,l} = \frac{β^{l-1}}{(l-1)!}
  • Expected number of paths: E(Xα,ω)=lβl1E(X_{α,ω}) = lβ^{l-1}
  • Critical threshold: βc(l)=1lnllβ_c(l) = 1 - \frac{\ln l}{l}

Indirect path accessibility:

  • Extended adaptive landscape method for handling self-intersecting paths
  • Expected quasi-accessible path count: E[X~α,ω]k,l=0a1[(eβA)k,l]pk,lLE[\tilde{X}_{α,ω}] ∼ \prod_{k,l=0}^{a-1}[(e^βA)_{k,l}]^{p_{k,l}L}
  • Binary case condition: sinh(βc)δcosh(βc)1δ=1\sinh(β_c)^δ \cosh(β_c)^{1-δ} = 1

3. Structured Landscapes

NK model: g(σ)=i=1bgi(σi,1,σi,2,...,σi,k)g(σ) = \sum_{i=1}^b g_i(σ_{i,1}, σ_{i,2}, ..., σ_{i,k})

Rough Mount Fuji model: g(σ)=cd(σ,σ)+ξσg(σ) = -cd(σ,σ^*) + ξ_σ

Composite genotype-phenotype-fitness mapping: g(σ)=Φ[z(σ)],z(σ)=i=1Lμ=0a1ai,μδσi,μg(σ) = Φ[z(σ)], \quad z(σ) = \sum_{i=1}^L \sum_{μ=0}^{a-1} a_{i,μ}δ_{σ_i,μ}

Technical Innovations

1. Submodularity Theory

  • Universal epistasis condition: g(στ)g(σ)g(στ)g(σ)g(σ ∪ τ) - g(σ) ≤ g(σ' ∪ τ) - g(σ'), where σσσ' ⊆ σ
  • Equivalent to submodularity: g(AB)+g(AB)g(A)+g(B)g(A ∪ B) + g(A ∩ B) ≤ g(A) + g(B)
  • Biological construction: Concave phenotype-fitness mapping produces submodular landscapes

2. Subset-Superset Accessibility Property

  • Theorem: Any peak can be reached from all its subsets and supersets via direct paths
  • Proof strategy: Utilizes universal negative epistasis condition and local optimality of peaks

3. Adaptive Basin of Attraction

  • Lower bound formula: Sσ2σ+2Lσ2S_σ ≥ 2^{|σ|} + 2^{L-|σ|} - 2
  • Exponential growth: Basin of attraction size grows exponentially with genotype space

Experimental Setup

Theoretical Analysis Framework

The paper primarily employs theoretical analysis methods, including:

  • Probabilistic analysis (Markov inequality, central limit theorem)
  • Combinatorial optimization theory (submodular function theory)
  • Percolation theory (accessibility phase transitions)
  • Graph-theoretic methods (Hamming graphs, fitness graphs)

Mathematical Tools

  • Hamming distance: d(σ,τ)=i=1L(1δσi,τi)d(σ,τ) = \sum_{i=1}^L (1-δ_{σ_i,τ_i})
  • Fitness graph: Directed acyclic graph constructed by directing edges toward increasing fitness
  • Complexity definition: Λ=limL1LlogE(NL)\Lambda = \lim_{L→∞} \frac{1}{L}\log E(N_L)

Experimental Results

Main Theoretical Results

1. Exact Solutions for HoC Model

  • Peak statistics: Proves that peak numbers satisfy the central limit theorem with sub-Poisson statistics
  • Variance formula: Var(NL)=aL(a1)(L1)2{(a1)L+1}2\text{Var}(N_L) = \frac{a^L(a-1)(L-1)}{2\{(a-1)L+1\}^2}
  • Resolution of Wright-Fisher debate: In the high-dimensional limit, the probability that any single genotype is a peak approaches zero (supporting Fisher), but the total number of peaks approaches infinity (supporting Wright)

2. Accessibility Phase Transition Phenomena

  • Critical behavior: Existence of well-defined phase transition threshold βc(l)=1lnllβ_c(l) = 1 - \frac{\ln l}{l}
  • Phase transition characteristics:
    • β<βc(l)β < β_c(l): limlP[Xα,ω1]=0\lim_{l→∞} P[X_{α,ω} ≥ 1] = 0
    • β>βc(l)β > β_c(l): limlP[Xα,ω1]=1\lim_{l→∞} P[X_{α,ω} ≥ 1] = 1

3. Special Properties of Submodular Landscapes

  • Universal accessibility: Any peak can be reached from all its subsets and supersets
  • Large basins of attraction: Basin of attraction size has exponential lower bounds, far exceeding linear lower bounds in general cases

Case Studies

Submodularity of Fisher's Geometric Model

For Fisher's geometric model with one-dimensional phenotype:

  • Genotype-phenotype mapping: z(σ)=i=1Laiσiz(σ) = \sum_{i=1}^L a_i σ_i (ai>0a_i > 0)
  • Phenotype-fitness mapping: Φ(z)Φ(z) is a concave function
  • Result: Produces submodular adaptive landscape with accessibility properties

Connection with Hopfield Model

By choosing Φ=z2Φ = -z^2, establishes mapping with antiferromagnetic Hopfield model: H=i,jJijηiηj+ihiηiH = \sum_{i,j} J_{ij}η_iη_j + \sum_i h_iη_i where Jij=14aiajJ_{ij} = \frac{1}{4}a_ia_j, hi=12(jaj)aih_i = -\frac{1}{2}(\sum_j a_j)a_i

Historical Development

  • Wright (1932): Introduced the concept of adaptive landscapes, emphasizing their ruggedness
  • Fisher (1958): Geometric model predicting smoothness of high-dimensional landscapes
  • Kauffman (1987): NK model, a landscape model with tunable ruggedness

Modern Research

  • Empirical studies: Recent two decades of experimental research on adaptive landscapes in real biological systems
  • Mathematical theory: Applications of percolation theory, random geometry, and combinatorial optimization to adaptive landscapes
  • Computational methods: High-throughput experimental techniques enabling large-scale adaptive landscape studies

Interdisciplinary Connections

  • Statistical physics: Equivalent to Random Energy Model in spin glass theory
  • Computer science: Related to submodular function maximization in combinatorial optimization
  • Machine learning: Potential connections with research on neural network loss landscapes

Conclusions and Discussion

Main Conclusions

  1. Resolution of Wright-Fisher debate: Both viewpoints are correct at different levels
  2. Universality of accessibility phase transitions: Universal phase transition phenomena exist in random landscapes
  3. Important role of submodularity: Submodularity provides strong accessibility guarantees for adaptive landscapes
  4. Large basin of attraction phenomenon: Submodular landscapes possess adaptive basins of attraction of exponential size

Limitations

  1. Model simplification: Binary sequence assumption limits application in multi-allelic systems
  2. Continuous fitness assumption: Non-degenerate fitness function assumption may not hold in practice
  3. Theory-practice gap: Correspondence between theoretical predictions and actual biological systems requires further verification

Future Directions

  1. Machine learning applications: Apply submodularity concepts to deep learning loss landscape analysis
  2. Multi-dimensional phenotypes: Extend to more general multi-dimensional Fisher geometric models
  3. Empirical validation: Verify theoretical predictions through high-throughput experiments
  4. Dynamic environments: Study adaptive landscape evolution in changing environments

In-Depth Evaluation

Strengths

  1. Theoretical depth: Provides a rigorous mathematical framework for adaptive landscape research
  2. Interdisciplinary perspective: Successfully connects concepts from biology, physics, and mathematics
  3. Practical value: Provides important insights for understanding actual evolutionary processes
  4. Mathematical rigor: All major results have rigorous mathematical proofs

Weaknesses

  1. Limited empirical support: Primarily theoretical work lacking extensive empirical data support
  2. Model limitations: Certain assumptions may not hold in actual biological systems
  3. Computational complexity: Computational verification of certain theoretical results remains difficult for large-scale systems

Impact

  1. Theoretical contribution: Provides important mathematical tools for adaptive landscape theory
  2. Methodological innovation: Technical innovations such as extended adaptive landscape methods have broad application prospects
  3. Interdisciplinary influence: Likely to impact statistical physics, computer science, and other fields

Applicable Scenarios

  1. Evolutionary biology: Understanding path dependence in natural selection processes
  2. Protein engineering: Guiding directed evolution experiment design
  3. Optimization algorithms: Inspiring new global optimization algorithm design
  4. Machine learning: Understanding landscape structure in neural network training

References

This paper cites 68 important references spanning from classical foundational work by Wright and Fisher to recent empirical research, reflecting the complete development history of the field. Key references include:

  • Wright, S. (1932): Original concept of adaptive landscapes
  • Fisher, R.A. (1958): Geometric model proposal
  • Kauffman & Levin (1987): House of Cards model
  • Crona et al. (2023): Geometric classification of universal epistasis
  • Krug & Oros (2024): Systematic study of submodularity and accessibility

This paper provides important theoretical foundations for adaptive landscape research, particularly through the introduction of submodularity concepts, offering new perspectives for understanding evolution in complex adaptive systems. Its interdisciplinary approach and rigorous mathematical analysis make it an important contribution to the field.