Infectious diseases, imposing density-dependent mortality on MHC/HLA variation, can account for balancing selection and MHC/HLA polymorphism

Green
The human MHC transplantation loci (HLA-A, -B, -C, -DPB1, -DQB1, -DRB1) are the most polymorphic in the human genome. It is generally accepted this polymorphism reflects a role in presenting pathogen-derived peptide to the adaptive immune system. Proposed mechanisms for the polymorphism such as negative frequency-dependent selection (NFDS) and heterozygote advantage (HA) focus on HLA alleles, not haplotypes. Here, we propose a model for the polymorphism in which infectious diseases impose independent density-dependent regulation on HLA haplotypes. More specifically, a complex pathogen environment drives extensive host polymorphism through a guild of HLA haplotypes that are specialised and show incomplete peptide recognition. Separation of haplotype guilds is maintained by limiting similarity. The outcome is a wide and stable range of haplotype densities at steady-state in which effective Fisher fitnesses are zero. Densities, and therefore frequencies, emerge theoretically as alternative measures of fitness. A catalogue of ranked frequencies is therefore one of ranked fitnesses. The model is supported by data from a range of sources including a Caucasian HLA dataset compiled by the US National Marrow Donor Program (NMDP). These provide evidence of positive selection on the top 350-2000 5-locus HLA haplotypes taken from an overall NMDP sample set of 10E5. High-fitness haplotypes drive the selection of 137 high-frequency alleles spread across the 5 HLA loci under consideration. These alleles demonstrate positive epistasis and pleiotropy in the formation of haplotypes. Allelic pleiotropy creates a network of highly inter-related HLA haplotypes that account for 97% of the census sample. We suggest this network has properties of a quasi-species and is itself under selection. We also suggest this is the origin of balancing selection in the HLA system.

Basic Information

  • Paper ID: 2501.00767
  • Title: Infectious diseases, imposing density-dependent mortality on MHC/HLA variation, can account for balancing selection and MHC/HLA polymorphism
  • Author: D. P. L. Green
  • Classification: q-bio.PE (Populations and Evolution), q-bio.MN (Molecular Networks)
  • Publication Date: 31 December 2024
  • Paper Link: https://arxiv.org/abs/2501.00767

Abstract

The human major histocompatibility complex (MHC) transplantation loci (HLA-A, -B, -C, -DPB1, -DQB1, -DRB1) are the most polymorphic in the human genome. This polymorphism is generally accepted to reflect their role in presenting pathogen-derived peptides to the adaptive immune system. Proposed mechanisms for the polymorphism, such as negative frequency-dependent selection (NFDS) and heterozygote advantage (HA), focus on HLA alleles rather than haplotypes. This study proposes a model in which infectious diseases impose independent density-dependent regulation on HLA haplotypes. Specifically, a complex pathogen environment drives extensive host polymorphism through a guild of specialized HLA haplotypes that show incomplete peptide recognition, and separation of haplotype guilds is maintained by limiting similarity. The outcome is a wide and stable range of haplotype densities at steady state, at which effective Fisher fitnesses are zero. Densities, and therefore frequencies, emerge as alternative measures of fitness, so a catalogue of ranked frequencies is also one of ranked fitnesses. Data from several sources, including the National Marrow Donor Program (NMDP) Caucasian HLA dataset, support the model and provide evidence of positive selection on the top 350-2000 5-locus HLA haplotypes. High-fitness haplotypes drive the selection of 137 high-frequency alleles spread across the five loci considered. These alleles show positive epistasis and pleiotropy in haplotype formation, creating a network of highly inter-related HLA haplotypes that accounts for 97% of the census sample.

Research Background and Motivation

Problem Definition

The extreme polymorphism of the HLA system represents a classic problem in evolutionary biology. Existing theories primarily include:

  1. Negative Frequency-Dependent Selection (NFDS): Rare alleles possess selective advantage
  2. Heterozygote Advantage (HA): Heterozygous individuals exhibit higher fitness
  3. Environmental Variation: Spatiotemporal environmental changes maintain polymorphism

Limitations of Existing Approaches

  1. Focus on alleles rather than haplotypes: Overlooks selective pressures at the haplotype level
  2. Lack of population biological foundation: Fails to consider density-dependent effects
  3. Theory-data mismatch: Difficulty explaining observed frequency distribution patterns
  4. Neglect of epidemiological characteristics of pathogen transmission: Fails to account for density-dependent disease spread

Research Motivation

The author argues that a new theoretical framework is needed to explain:

  • Heavy-tailed frequency distributions of HLA alleles and haplotypes
  • Positive linkage disequilibrium in high-frequency haplotypes
  • Excess homozygosity of common 5-locus haplotypes
  • Long-term maintenance of polymorphism across species

Core Contributions

  1. Proposes density-dependent regulation model: Identifies infectious diseases as the source of density-dependent mortality for HLA haplotypes
  2. Establishes haplotype selection theory: Demonstrates that selection primarily operates at the haplotype rather than allele level
  3. Discovers HLA network structure: Identifies a highly interconnected network composed of 137 core alleles
  4. Provides empirical support: Validates theoretical predictions using large-scale NMDP dataset
  5. Redefines balancing selection: Interprets balancing selection as a consequence of density-dependent regulation

Detailed Methodology

Theoretical Framework

Density-Dependent Regulation Model

The model builds on the Verhulst logistic equation and the Anderson-May disease transmission equations:

dN/dt = rN - αN²

Where:

  • N: haplotype density
  • r: intrinsic growth rate (Fisher fitness)
  • α: density-dependent mortality coefficient

Steady-State Conditions

At steady state (dN/dt = 0):

r = αN*
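The approach to this steady state can be illustrated numerically. The following is a minimal sketch, not the author's code: it integrates dN/dt = rN - αN² with forward Euler steps and compares the result with the analytic value N* = r/α. The function name, step size, and parameter values are assumptions chosen only for illustration.

```python
def simulate_logistic(r, alpha, n0, dt=0.01, steps=5000):
    """Integrate dN/dt = r*N - alpha*N**2 with forward Euler steps."""
    n = n0
    for _ in range(steps):
        n += dt * (r * n - alpha * n * n)
    return n


# Hypothetical parameter values, chosen only for illustration.
r, alpha = 0.5, 0.001
n_final = simulate_logistic(r, alpha, n0=10.0)
n_star = r / alpha  # analytic steady state, where r - alpha*N = 0

print(n_final, n_star)
```

At N* the per-capita growth rate r - αN* vanishes, which is the sense in which the effective fitness is zero at steady state.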

For multiple haplotypes coexisting:

α₁N₁* = α₂N₂* = ... = αᵢNᵢ* = r̃
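If every haplotype settles at the same r̃, the condition above gives Nᵢ* = r̃/αᵢ, so relative frequencies depend only on the mortality coefficients. The sketch below works this through for three haplotypes. The α values, the choice r̃ = 1, and the function name are hypothetical.

```python
def steady_state_frequencies(alphas, r_tilde=1.0):
    """Return steady-state densities N_i* = r_tilde / alpha_i and their frequencies."""
    densities = [r_tilde / a for a in alphas]
    total = sum(densities)
    frequencies = [n / total for n in densities]
    return densities, frequencies


# Hypothetical density-dependent mortality coefficients for three haplotypes.
alphas = [0.001, 0.002, 0.01]
densities, freqs = steady_state_frequencies(alphas)

for a, n, p in zip(alphas, densities, freqs):
    print(f"alpha={a}: N*={n:.1f}, frequency={p:.4f}")
```

In this toy case the haplotype with the smallest α has the highest density and frequency, which illustrates how ranked frequencies can be read as ranked fitnesses.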

Neher-Shraiman Model Extension

Decomposes haplotype fitness as:

F = E + A

Where E represents the epistatic component and A represents the additive component.
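A generic way to picture such a split, for two loci, is to fit main effects per allele and treat the residual as epistasis. The sketch below does this with row and column means. It is an illustrative main-effects decomposition under that assumption, not necessarily the formulation used by Neher and Shraiman or by the author, and the fitness values are invented.

```python
import numpy as np


def split_additive_epistatic(fitness):
    """Split a two-locus fitness table F into an additive part A and a residual E."""
    f = np.asarray(fitness, dtype=float)
    grand_mean = f.mean()
    row_effect = f.mean(axis=1, keepdims=True) - grand_mean  # locus-1 allele effects
    col_effect = f.mean(axis=0, keepdims=True) - grand_mean  # locus-2 allele effects
    additive = grand_mean + row_effect + col_effect
    epistatic = f - additive
    return epistatic, additive


# Hypothetical fitness table: rows are alleles at locus 1, columns at locus 2.
F = [[1.0, 0.2],
     [0.2, 0.6]]
E, A = split_additive_epistatic(F)
print(A)
print(E)
```

Positive entries of E mark allele pairs that do better together than their separate effects would predict.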

Data Analysis Methods

Log-Log Rank-Frequency Analysis

Using Belevitch's rank-frequency method from statistical linguistics:

log(pᵢ/p₀) = -A log(i/i₀)
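Read as a straight line on log-log axes, this relation has slope -A, which can be estimated by ordinary least squares. The sketch below assumes pᵢ is the frequency of the item at rank i and fits the slope with `numpy.polyfit`. The synthetic data and the function name are assumptions for illustration, not NMDP values.

```python
import numpy as np


def fit_rank_frequency_exponent(freqs):
    """Fit log10(frequency) against log10(rank); return (A, intercept) with slope = -A."""
    freqs = np.sort(np.asarray(freqs, dtype=float))[::-1]  # rank 1 = most frequent
    ranks = np.arange(1, len(freqs) + 1)
    slope, intercept = np.polyfit(np.log10(ranks), np.log10(freqs), 1)
    return -slope, intercept


# Synthetic frequencies following an exact power law (illustration only).
synthetic = 0.05 * np.arange(1, 501, dtype=float) ** -0.8
A_hat, intercept = fit_rank_frequency_exponent(synthetic)
print(A_hat, 10 ** intercept)
```

On real data the fit would normally be restricted to the rank range where the log-log plot is approximately linear.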

Network Analysis

  • Identification of pleiotropy patterns in high-frequency alleles
  • Analysis of connectivity between haplotypes
  • Quantification of epistatic effects

Experimental Setup

Dataset

  • NMDP Caucasian dataset: Contains 5-locus HLA haplotype data from ~10⁵ individuals
  • Covered loci: HLA-A, -B, -C, -DRB1, -DQB1
  • Sample size: 85,000 haplotypes with frequencies spanning 6 orders of magnitude

Analysis Metrics

  • Rank-frequency distribution patterns
  • Linkage disequilibrium (D')
  • Shannon entropy
  • Epistatic effect magnitude
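Two of these metrics have standard textbook definitions that are easy to state in code. The sketch below computes Lewontin's normalized D' for one pair of alleles and the Shannon entropy of a frequency distribution. The input frequencies are hypothetical, and the summary does not say how the author handled multi-allelic loci, so the per-pair form here is an assumption.

```python
import math


def d_prime(p_ab, p_a, p_b):
    """Lewontin's D' for one allele pair, given haplotype and allele frequencies."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d / d_max if d_max > 0 else 0.0


def shannon_entropy(freqs):
    """Shannon entropy in bits of a frequency distribution (zero entries skipped)."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)


# Hypothetical frequencies: haplotype A~B at 0.08, allele A at 0.10, allele B at 0.12.
print(d_prime(0.08, 0.10, 0.12))
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))
```

A positive D' means the two alleles occur together more often than independent assortment would predict.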

Comparative Analysis

  • Observed vs. expected frequencies (based on allele frequency products)
  • Linkage disequilibrium patterns in high-frequency vs. low-frequency haplotypes
  • Distribution of core alleles vs. rare alleles
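The first comparison can be sketched as follows: allele frequencies are obtained as marginals of the haplotype table, and the expected haplotype frequency under independence is their product. The two-locus table, the allele labels, and the function name are hypothetical; the NMDP analysis uses five loci.

```python
from collections import defaultdict
from math import prod


def expected_from_alleles(hap_freqs):
    """Expected haplotype frequencies as products of marginal allele frequencies."""
    n_loci = len(next(iter(hap_freqs)))
    allele_freqs = [defaultdict(float) for _ in range(n_loci)]
    for hap, freq in hap_freqs.items():
        for locus, allele in enumerate(hap):
            allele_freqs[locus][allele] += freq
    return {
        hap: prod(allele_freqs[locus][allele] for locus, allele in enumerate(hap))
        for hap in hap_freqs
    }


# Toy two-locus haplotype frequencies (hypothetical labels and values).
observed = {
    ("A1", "B1"): 0.4,
    ("A1", "B2"): 0.1,
    ("A2", "B1"): 0.1,
    ("A2", "B2"): 0.4,
}
expected = expected_from_alleles(observed)
ratios = {hap: observed[hap] / expected[hap] for hap in observed}
print(ratios)
```

Ratios above 1 indicate haplotypes that are more common than their constituent allele frequencies alone would predict.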

Experimental Results

Major Findings

1. Bimodal Haplotype Distribution

  • Selected population: 350-2000 high-frequency haplotypes (80% of sample)
  • Neutral/negatively selected population: Low-frequency haplotypes (20% of sample)
  • Transition point: Approximately rank 1730

2. Core Allele Network

Identified 137 core alleles:

  • HLA-A: 30 alleles (cumulative frequency 99.7%)
  • HLA-B: 40 alleles (cumulative frequency 98.6%)
  • HLA-C: 20 alleles (cumulative frequency 99.6%)
  • HLA-DQB1: 15 alleles (cumulative frequency 99.9%)
  • HLA-DRB1: 31 alleles (cumulative frequency 99.3%)

3. Power-Law Distribution Characteristics

In the rank-frequency analysis, high-frequency haplotypes follow a power law:

y = 0.0506x^(-0.822)
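As a rough consistency check, the fitted curve can be summed over ranks. The sketch below assumes that x is haplotype rank and y is haplotype frequency, and that the fit holds from rank 1 up to the reported transition near rank 1730. Both are assumptions about how the fit was made, not statements from the paper.

```python
def power_law_frequency(rank, c=0.0506, exponent=0.822):
    """Frequency predicted by y = c * x**(-exponent) at a given rank."""
    return c * rank ** (-exponent)


# Sum the fitted frequencies over ranks 1..1730 (assumed range of validity).
cumulative = sum(power_law_frequency(x) for x in range(1, 1731))
print(f"rank 1 frequency: {power_law_frequency(1):.4f}")
print(f"cumulative frequency up to rank 1730: {cumulative:.3f}")
```

The cumulative value can be compared with the share of the sample attributed to the selected haplotype population.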

4. Evidence of Positive Epistasis

  • Observed haplotype frequencies span 5 orders of magnitude
  • Expected frequencies (based on allele frequency products) span only 1 order of magnitude
  • Epistatic component accounts for 9-12% of high-frequency allele frequencies

Supertypic Analysis

Mapping core alleles to HLA supertypes:

  • Ancient core alleles (potentially derived from Neanderthals/Denisovans) dominate high-frequency alleles
  • Competitive exclusion patterns within supertypes
  • Limiting similarity patterns between supertypes

Long-Range Linkage Analysis

HLA-B~DRB1 pairing analysis reveals:

  • Approximately 250 of 1240 possible pairings (20%) are under selective pressure
  • Two major pairings: B*08:01g/DRB1*03:01g (8%) and B*07:02/DRB1*15:01 (6.9%)

Theoretical Significance and Discussion

Reassessment of Balancing Selection Theory

Limitations of Traditional Theories

  1. Heterozygote advantage: Requires all alleles to possess approximately equal fitness, inconsistent with observations
  2. Negative frequency-dependent selection: Predicts allele replacement rates too rapid to explain trans-species polymorphism
  3. Neglect of epidemiology: Fails to consider density-dependent characteristics of disease transmission

Advantages of the New Model

  1. Density-dependent regulation: Automatically produces zero effective fitness, enabling stable coexistence
  2. Haplotype selection: Better explains observed linkage disequilibrium patterns
  3. Network effects: Explains allele pleiotropy and haplotype interconnectedness

Evolutionary Strategy Analogies

"Enigma Machine" Analogy

The HLA system resembles the World War II Enigma cipher machine:

  • Multi-rotor design: Multi-locus haplotypes increase difficulty of pathogen "decryption"
  • Distributed settings: Polymorphism limits the impact of pathogen breakthroughs
  • Broad low-affinity recognition: Contrasts with antibody high-affinity strategy

Red Queen Dynamics

  • Pathogens: Rapid reproduction, high mutation rates
  • Host defense: Relatively stable presentation system + rapid T cell expansion response
  • Equilibrium point: Achieved through haplotype network-mediated defense diversification

Quasispecies Characteristics

The HLA network exhibits quasispecies characteristics:

  • High interconnectedness: Connected through allele pleiotropy
  • Network selection: Entire network rather than individual haplotypes under selective pressure
  • Stability: Maintained through density-dependent regulation

Limitations and Future Directions

Current Limitations

  1. Epistatic mechanisms unclear: Specific molecular mechanisms of positive epistasis require further clarification
  2. Temporal scale issues: Time scales for network stability require validation with historical data
  3. Population specificity: Primarily based on Caucasian population data; extension to other populations needed
  4. Lack of pathogen data: Systematic pathogen-HLA interaction matrices unavailable

Future Research Directions

  1. Functional validation: Investigate epistatic mechanisms using AlphaFold and other structural prediction tools
  2. Cross-population comparison: Analyze functional overlap of HLA networks across populations
  3. Historical stability: Validate network frequency stability using ancient DNA data
  4. Infection matrices: Construct pathogen peptide-HLA haplotype recognition matrices

In-Depth Evaluation

Strengths

  1. Theoretical innovation: Introduces density-dependent regulation as an explanation for HLA polymorphism
  2. Substantial data support: Systematic analysis based on large-scale NMDP dataset
  3. Interdisciplinary integration: Successfully integrates population biology, epidemiology, and immunogenetics
  4. Strong explanatory power: Unified explanation of multiple long-standing observations

Weaknesses

  1. Mechanistic details: Molecular basis of epistasis still requires experimental validation
  2. Model simplification: Complexity of pathogenic environment may be oversimplified
  3. Predictive capacity: Model's ability to predict future evolutionary dynamics remains to be verified

Impact Assessment

This research may have significant implications for:

  1. Evolutionary immunology: Redefines theoretical framework for MHC evolution
  2. Personalized medicine: Provides new perspectives for HLA-based disease susceptibility prediction
  3. Vaccine design: Guides vaccine development strategies considering population HLA diversity

Applicable Scenarios

  • Evolutionary analysis of MHC/HLA polymorphism
  • Population immunogenetic studies
  • Infectious disease epidemiological modeling
  • Personalized immunotherapy design

Conclusion

This study proposes a theoretical framework that interprets HLA polymorphism as the result of density-dependent selection imposed by pathogens. From large-scale data analysis, the author finds that the HLA system forms a highly interconnected network built from 137 core alleles, with stable polymorphism maintained through positive epistasis and pleiotropy. The finding offers a new perspective on MHC evolution and a possible theoretical basis for related medical applications.