FP-AbDiff: Improving Score-based Antibody Design by Capturing Nonequilibrium Dynamics through the Underlying Fokker-Planck Equation

Chen, Xiong, Li et al.
Computational antibody design holds immense promise for therapeutic discovery, yet existing generative models are fundamentally limited by two core challenges: (i) a lack of dynamical consistency, which yields physically implausible structures, and (ii) poor generalization due to data scarcity and structural bias. We introduce FP-AbDiff, the first antibody generator to enforce Fokker-Planck Equation (FPE) physics along the entire generative trajectory. Our method minimizes a novel FPE residual loss over the mixed manifold of CDR geometries (R^3 x SO(3)), compelling locally-learned denoising scores to assemble into a globally coherent probability flow. This physics-informed regularizer is synergistically integrated with deep biological priors within a state-of-the-art SE(3)-equivariant diffusion framework. Rigorous evaluation on the RAbD benchmark confirms that FP-AbDiff establishes a new state-of-the-art. In de novo CDR-H3 design, it achieves a mean Root Mean Square Deviation of 0.99 Å when superposing on the variable region, a 25% improvement over the previous state-of-the-art model, AbX, and the highest reported Contact Amino Acid Recovery of 39.91%. This superiority is underscored in the more challenging six-CDR co-design task, where our model delivers consistently superior geometric precision, cutting the average full-chain Root Mean Square Deviation by ~15%, and crucially, achieves the highest full-chain Amino Acid Recovery on the functionally dominant CDR-H3 loop (45.67%). By aligning generative dynamics with physical laws, FP-AbDiff enhances robustness and generalizability, establishing a principled approach for physically faithful and functionally viable antibody design.

Basic Information

  • Paper ID: 2511.03113
  • Title: FP-AbDiff: Improving Score-based Antibody Design by Capturing Nonequilibrium Dynamics through the Underlying Fokker-Planck Equation
  • Authors: Jiameng Chen, Yida Xiong, Kun Li, Hongzhi Zhang, Xiantao Cai, Wenbin Hu, Jia Wu
  • Classification: cs.LG cs.AI q-bio.QM
  • Publication Date: November 5, 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2511.03113

Abstract

Computational antibody design holds tremendous potential for therapeutic discovery, yet existing generative models are fundamentally limited by two core challenges: (i) a lack of dynamical consistency, which leads to physically implausible structures; and (ii) poor generalization due to data scarcity and structural bias. This paper introduces FP-AbDiff, the first antibody generator that enforces Fokker-Planck equation (FPE) physics along the entire generative trajectory. The method minimizes a novel FPE residual loss on the mixed manifold of CDR geometries (R³×SO(3)), forcing locally learned denoising scores to assemble into a globally coherent probability flow. This physics-informed regularizer is integrated with deep biological priors within a state-of-the-art SE(3)-equivariant diffusion framework.

Research Background and Motivation

Problem Definition

Antibody design faces two critical challenges:

  1. Lack of Dynamical Consistency: Existing diffusion models such as DiffAb, AbDiffuser, and AbX optimize structures at independent noise levels without constraining the paths connecting them. Their denoising score matching (DSM) objectives capture local gradients but ignore global transitions, frequently producing chemically unrealistic loop rearrangements, unstable side-chain packing, and energetically strained conformations.
  2. Insufficient Generalization Capability: Diffusion generators perform poorly outside the narrow scope of current datasets, limiting their practical application value. The primary benchmark SAbDab contains fewer than 5,000 non-redundant complexes and is severely biased toward a small number of human IgG scaffolds bound to viral epitopes.

Research Motivation

CDR specificity and affinity arise from subtle, continuous conformational motions rather than isolated structural snapshots. Existing methods lack explicit mechanisms to enforce temporal consistency and frequently revert to familiar patterns when faced with out-of-distribution (OOD) tasks.

Core Contributions

  1. First FPE Regularization Framework: FP-AbDiff introduces the first diffusion framework for CDRs that enforces score-Fokker-Planck consistency on R³×SO(3), ensuring globally consistent probability flows and eliminating non-physical loop transitions.
  2. Unification of Physical Laws and Biological Priors: Integrates Fokker-Planck physics with evolutionary, geometric, and energetic priors into a single objective, achieving dynamically consistent and generalizable antibody generation.
  3. State-of-the-Art Performance: Sets a new state of the art on antibody design and optimization tasks, reaching an RMSD_Fv of 0.99 Å in CDR-H3 design (a 25% improvement over AbX) and a contact amino acid recovery (CAAR) of 39.91%.

Methodology Details

Task Definition

Antibody design is formulated as conditional CDR generation given structural context C (antigen and framework). CDRs are defined by their ground truth state at t=0, S₀=(A₀,X₀,R₀), comprising:

  • Amino acid sequence A₀
  • Heavy atom coordinates X₀ ∈ R^(D_X)
  • Residue orientations R₀ ∈ SO(3)^(N_CDR)

Model Architecture

Stochastic Dynamics Modeling

Translational Dynamics (Euclidean Space): Backbone coordinates X_t ∈ R³ evolve under a variance-preserving (VP) SDE:

$$dX_t = -\tfrac{1}{2}\beta_X(t)\,X_t\,dt + \sqrt{\beta_X(t)}\,dW_{X,t}$$
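
A minimal sketch of how the VP SDE above can be simulated with the Euler-Maruyama scheme, assuming a linear noise schedule. The schedule, its endpoints (`beta_min`, `beta_max`), and all function names are illustrative assumptions; the summary does not state which schedule the paper uses.

```python
import math
import random


def beta_linear(t, beta_min=0.1, beta_max=20.0):
    """Hypothetical linear schedule beta_X(t) on t in [0, 1] (an assumption)."""
    return beta_min + t * (beta_max - beta_min)


def vp_marginal(t, beta_min=0.1, beta_max=20.0):
    """Closed-form alpha(t) and sigma^2(t) of the VP SDE for the linear schedule."""
    integral = beta_min * t + 0.5 * (beta_max - beta_min) * t * t
    alpha = math.exp(-0.5 * integral)
    return alpha, 1.0 - alpha * alpha


def vp_sde_forward(x0, n_steps=1000, t_end=1.0, rng=None):
    """Euler-Maruyama steps of dX = -0.5*beta(t)*X dt + sqrt(beta(t)) dW."""
    if rng is None:
        rng = random.Random(0)
    dt = t_end / n_steps
    x = list(x0)
    for step in range(n_steps):
        beta = beta_linear(step * dt)
        noise_scale = math.sqrt(beta * dt)
        x = [xi - 0.5 * beta * xi * dt + noise_scale * rng.gauss(0.0, 1.0)
             for xi in x]
    return x
```

Under this schedule the marginal at time t is Gaussian with mean alpha(t)·X₀ and variance sigma²(t), so the simulated coordinates should approach a standard normal as t approaches 1.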

Rotational Dynamics (SO(3) Manifold): Each residue's orientation R_{i,t} ∈ SO(3) evolves under a variance-exploding (VE) SDE:

$$dR_{i,t} = \sqrt{\beta_R(t)}\,\sum_{a=1}^{3} (R_{i,t} E_a) \circ dW_t^{a}$$
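
One common way to discretize a Stratonovich SDE of this form is a geodesic random walk: draw a small random axis-angle vector and right-multiply the current rotation by its matrix exponential. The sketch below assumes that scheme, a constant `beta_r`, and the standard skew-symmetric basis for E_a; none of these choices is confirmed by the summary, and the normalization of E_a changes the effective time scale.

```python
import math
import random


def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def exp_so3(w):
    """Rodrigues formula: rotation matrix exp(hat(w)) for an axis-angle vector w."""
    theta = math.sqrt(sum(c * c for c in w))
    k = [[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]]
    k2 = matmul3(k, k)
    if theta < 1e-8:
        a, b = 1.0, 0.5  # small-angle limits of sin(t)/t and (1 - cos t)/t^2
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / (theta * theta)
    return [[(1.0 if i == j else 0.0) + a * k[i][j] + b * k2[i][j]
             for j in range(3)] for i in range(3)]


def so3_ve_forward(r0, beta_r=lambda t: 1.0, n_steps=200, t_end=1.0, rng=None):
    """Geodesic random walk: R <- R exp(sqrt(beta_R(t) dt) * sum_a eps_a E_a)."""
    if rng is None:
        rng = random.Random(0)
    dt = t_end / n_steps
    r = [row[:] for row in r0]
    for step in range(n_steps):
        scale = math.sqrt(beta_r(step * dt) * dt)
        w = [scale * rng.gauss(0.0, 1.0) for _ in range(3)]
        r = matmul3(r, exp_so3(w))
    return r
```

Because every update is an exact rotation, the iterate stays on SO(3) up to floating-point error, which is the main reason to prefer this scheme over adding noise to the matrix entries directly.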

Fokker-Planck Equation Derivation

For a general SDE dx_t = f(x_t,t)dt + g(t)dW_t, the FPE describes the evolution of probability density p(x,t):

∂p/∂t = -∇·(fp) + ½g²(t)Δp

Euclidean Space Dynamics: The evolution operator G_X is defined as:

$$G_X[s_X, X, t] := \tfrac{1}{2}\beta_X(t)\left[s_X + (\nabla_X s_X)\,X + H_X(s_X)\right]$$
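
The following derivation sketches how an operator of this shape arises from the FPE for the VP drift f = -½β_X(t)X and diffusion g² = β_X(t). It assumes that H_X(s_X) stands for the gradient of (∇·s_X + ‖s_X‖²), which is the form this term takes in the standard score Fokker-Planck equation; the summary itself does not spell out H_X.

```latex
\begin{align*}
\partial_t p &= -\nabla\cdot(f p) + \tfrac{1}{2} g^2 \Delta p
  && \text{(FPE)} \\
\partial_t \log p &= -\nabla\cdot f - f\cdot s
  + \tfrac{1}{2} g^2\left(\nabla\cdot s + \|s\|^2\right),
  \quad s := \nabla \log p
  && \text{(divide by } p\text{)} \\
\partial_t s &= \nabla\!\left[\tfrac{1}{2} g^2\left(\nabla\cdot s + \|s\|^2\right)
  - f\cdot s - \nabla\cdot f\right]
  && \text{(take the gradient)} \\
\partial_t s_X &= \tfrac{1}{2}\beta_X(t)\left[s_X + (\nabla_X s_X)^{\top} X
  + \nabla_X\!\left(\nabla_X\cdot s_X + \|s_X\|^2\right)\right]
  && \left(f = -\tfrac{1}{2}\beta_X X,\ g^2 = \beta_X\right)
\end{align*}
```

The second line uses Δp/p = ∇·s + ‖s‖², and the last line uses ∇(X·s) = s + (∇s)ᵀX together with the fact that ∇·f is constant in X. For an exact score the Jacobian ∇_X s_X is a Hessian and therefore symmetric, so the transpose can be dropped.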

SO(3) Manifold Dynamics: The evolution operator G_R is defined as:

$$G_R[s_R, R, t] := \tfrac{1}{2}\beta_R(t)\left[\Delta_B s_R - 2 s_R + H_R(s_R)\right]$$

FPE Residual Regularization

FP-AbDiff converts the network-predicted clean CDRs into translational and rotational scores through indirect score inference:

Translational score:

$$s_{\theta,X}(X_t, t \mid X_0^{\theta}) = -\frac{X_t - \alpha_X(t)\,X_0^{\theta}}{\sigma_X^2(t)}$$
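
As a small illustration of the translational score formula above: if X_t is formed as α·X₀ + σ·ε and the network recovers X₀ exactly, the score reduces to -ε/σ. The helper names `perturb` and `translational_score` are hypothetical, and the values of α and σ² are arbitrary.

```python
import math


def perturb(x0, eps, alpha_t, sigma2_t):
    """Forward perturbation X_t = alpha_t * X_0 + sigma_t * eps."""
    sigma_t = math.sqrt(sigma2_t)
    return [alpha_t * a + sigma_t * e for a, e in zip(x0, eps)]


def translational_score(x_t, x0_pred, alpha_t, sigma2_t):
    """Score of N(alpha_t * x0_pred, sigma2_t * I) evaluated at x_t."""
    return [-(xt - alpha_t * x0) / sigma2_t for xt, x0 in zip(x_t, x0_pred)]


x0 = [1.0, -2.0, 0.5]
eps = [0.3, -1.2, 0.7]
alpha_t, sigma2_t = 0.8, 0.36  # VP marginals satisfy alpha^2 + sigma^2 = 1
x_t = perturb(x0, eps, alpha_t, sigma2_t)
score = translational_score(x_t, x0, alpha_t, sigma2_t)
```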

Rotational score:

$$s_{\theta,R}(R_t, t \mid R_0^{\theta}) = \nabla_{SO(3)} \log p_{\mathrm{IGSO(3)}}\!\left((R_0^{\theta})^{\top} R_t;\ \sigma_R^2(t)\right)$$
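
The IGSO(3) density depends on the relative rotation only through its rotation angle ω, so the rotational score can be obtained from the derivative of log f(ω). The sketch below uses the truncated series f(ω) = Σ_l (2l+1) exp(-l(l+1)σ²/2) sin((l+½)ω)/sin(ω/2), taken with respect to the uniform measure on SO(3). That scaling convention for σ² is an assumption (conventions differ between implementations), and the function names are hypothetical.

```python
import math


def igso3_density(omega, sigma, l_max=200):
    """Truncated IGSO(3) series in the rotation angle omega, for 0 < omega <= pi."""
    total = 0.0
    for l in range(l_max + 1):
        total += ((2 * l + 1)
                  * math.exp(-0.5 * l * (l + 1) * sigma ** 2)
                  * math.sin((l + 0.5) * omega) / math.sin(0.5 * omega))
    return total


def igso3_score_angle(omega, sigma, h=1e-5):
    """d/d(omega) of log f(omega), by a central finite difference."""
    upper = math.log(igso3_density(omega + h, sigma))
    lower = math.log(igso3_density(omega - h, sigma))
    return (upper - lower) / (2.0 * h)
```

The full score vector is this scalar multiplied by the unit rotation axis of (R₀ᶿ)ᵀR_t. A negative value means the score points back toward the predicted clean orientation. Very small σ needs a larger `l_max` for the series to converge.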

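The FPE residuals are defined as:

$$\varepsilon_X(X_t, t) := \partial_t s_{\theta,X}(X_t, t \mid X_0^{\theta}) - G_X[s_{\theta,X}, X_t, t]$$
$$\varepsilon_R(R_t, t) := \partial_t s_{\theta,R}(R_t, t \mid R_0^{\theta}) - G_R[s_{\theta,R}, R_t, t]$$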
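
A one-dimensional toy check of the translational residual, assuming a constant β and Gaussian data so that the exact score is known in closed form. The residual is evaluated with central finite differences, and the term H_X is taken to be ∂ₓ(∂ₓs + s²), which is an assumption about its definition. This is a sketch of the idea, not the paper's implementation.

```python
import math

BETA = 2.0   # constant beta_X, an illustrative choice
VAR0 = 0.25  # variance of the toy data distribution N(0, VAR0)


def variance(t):
    """Marginal variance of the VP SDE with constant beta at time t."""
    return 1.0 + (VAR0 - 1.0) * math.exp(-BETA * t)


def exact_score(x, t):
    """Score of the true marginal N(0, variance(t))."""
    return -x / variance(t)


def frozen_score(x, t):
    """Deliberately inconsistent score that ignores how the marginal evolves."""
    return -x / VAR0


def fpe_residual(score, x, t, h=1e-3):
    """d_t s - 0.5*beta*[s + (d_x s) x + d_x(d_x s + s^2)] via central differences."""
    ds_dt = (score(x, t + h) - score(x, t - h)) / (2 * h)
    ds_dx = (score(x + h, t) - score(x - h, t)) / (2 * h)

    def inner(y):
        # d_x s + s^2 evaluated at position y
        d = (score(y + h, t) - score(y - h, t)) / (2 * h)
        return d + score(y, t) ** 2

    h_term = (inner(x + h) - inner(x - h)) / (2 * h)
    g = 0.5 * BETA * (score(x, t) + ds_dx * x + h_term)
    return ds_dt - g
```

The exact score gives a residual near zero, while the frozen score gives a large one. Penalizing the squared residual is therefore a way to push a learned score toward time-consistent behavior.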

Training Objectives

Fidelity Loss:

$$\mathcal{L}_{\text{fid}} = \mathcal{L}^{X}_{\text{DSM}} + \mathcal{L}^{R}_{\text{DSM}} + 0.4\,\mathcal{L}_{\text{CE}}$$

Biophysical Plausibility Priors:

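$$\mathcal{L}_{\text{priors}} = \mathcal{L}_{\text{FAPE}} + 0.5\,\mathcal{L}_{\text{dist}} + 0.1\,\mathcal{L}_{\text{pLDDT}} + 0.03\,\mathcal{L}_{\text{viol}} + 0.25\,\mathcal{L}_{\text{bb}}$$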

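Dynamical Consistency Regularizer:

$$\mathcal{L}_{\text{fpe}}(\theta) = \mathbb{E}_{t, S_t}\left[ w(t)\left( \frac{\|\varepsilon_X\|^2}{D_X} + \frac{\|\varepsilon_R\|^2}{D_R} \right) \right]$$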

Complete Loss Function:

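$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{fid}} + \mathbb{1}_{t<\tau}\,\mathcal{L}_{\text{priors}} + 0.05\,\mathcal{L}_{\text{fpe}}$$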
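
A minimal sketch of how the three loss groups combine, using the weights listed above. It reads the factor in front of the priors term as an indicator that switches the priors on only when t < τ, which is an interpretation of the notation. The function name, the dictionary keys, and the value of `tau` in the example are hypothetical; the summary does not give τ.

```python
def total_loss(l_dsm_x, l_dsm_r, l_ce, priors, l_fpe, t, tau):
    """Combine fidelity, gated prior, and FPE terms with the listed weights."""
    l_fid = l_dsm_x + l_dsm_r + 0.4 * l_ce
    l_priors = (priors["fape"]
                + 0.5 * priors["dist"]
                + 0.1 * priors["plddt"]
                + 0.03 * priors["viol"]
                + 0.25 * priors["bb"])
    gate = 1.0 if t < tau else 0.0  # indicator 1[t < tau]
    return l_fid + gate * l_priors + 0.05 * l_fpe


unit_priors = {"fape": 1.0, "dist": 1.0, "plddt": 1.0, "viol": 1.0, "bb": 1.0}
low_noise = total_loss(1.0, 1.0, 1.0, unit_priors, 1.0, t=0.1, tau=0.5)
high_noise = total_loss(1.0, 1.0, 1.0, unit_priors, 1.0, t=0.9, tau=0.5)
```

With every component set to 1.0, the priors contribute 1.88 at low noise and nothing at high noise, which shows how the gate restricts the structural priors to the part of the trajectory where the predicted structure is meaningful.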

Experimental Setup

Datasets

  • Training Set: Non-redundant set derived from SAbDab (September 2024), CDR-H3 sequence identity ≤40%
  • Test Set: 60 antibody-antigen complexes from RAbD benchmark

Evaluation Metrics

  • Sequence Recovery: AAR_Fv, AAR_Full, CAAR (contact amino acid recovery)
  • Structural Accuracy: RMSD_Fv, RMSD_Full, TM-score, lDDT
  • Functional Feasibility: IMP (percentage of samples with ΔΔG < 0), DockQ
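
Three of the metrics in the list above have simple definitions that can be sketched directly. The code assumes the structures have already been superposed (on the variable region for the Fv variants, on the full chain for the Full variants) and that CAAR is AAR restricted to contact residues. Function names are hypothetical, and the benchmark's exact residue selection is not described in this summary.

```python
import math


def aar(pred_seq, true_seq):
    """Amino acid recovery: fraction of positions where the residues match."""
    if len(pred_seq) != len(true_seq) or not true_seq:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == q for p, q in zip(pred_seq, true_seq)) / len(true_seq)


def rmsd(coords_a, coords_b):
    """RMSD between two already-superposed lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    sq = sum((a - b) ** 2
             for pa, pb in zip(coords_a, coords_b)
             for a, b in zip(pa, pb))
    return math.sqrt(sq / len(coords_a))


def imp(ddg_values):
    """Percentage of samples whose predicted ddG is below zero."""
    if not ddg_values:
        raise ValueError("need at least one ddG value")
    return 100.0 * sum(v < 0 for v in ddg_values) / len(ddg_values)
```

For example, `aar("ARDYW", "ARGYW")` counts four matches out of five positions.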

Comparison Methods

  • Diffusion Models: DiffAb, AbX
  • Energy-guided Pipelines: RosettaAb
  • Equivariant GNNs: dyMEAN, MEAN
  • Autoregressive Sequence Models: HERN

Experimental Results

Main Results

CDR-H3 Design Task

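Model     | AAR↑   | TM-score↑ | lDDT↑  | CAAR↑  | RMSD (Å)↓ | DockQ↑
----------|--------|-----------|--------|--------|-----------|-------
AbX       | 84.90% | 0.9906    | 0.9407 | 39.08% | 1.32      | 0.429
FP-AbDiff | 83.65% | 0.9929    | 0.9363 | 39.91% | 0.99      | 0.444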

FP-AbDiff improves RMSD_Fv by 25% over AbX (1.32 Å to 0.99 Å), reaching sub-angstrom accuracy, and obtains the highest CAAR at 39.91%. AbX retains slightly higher AAR (84.90% vs. 83.65%) and lDDT (0.9407 vs. 0.9363).
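
The 25% figure follows directly from the two RMSD values reported for AbX and FP-AbDiff, as this short check shows. The helper name is hypothetical.

```python
def relative_improvement(baseline, new):
    """Fractional reduction of a lower-is-better metric relative to the baseline."""
    return (baseline - new) / baseline


rmsd_gain = relative_improvement(1.32, 0.99)  # AbX vs. FP-AbDiff, in angstroms
caar_delta = 39.91 - 39.08                    # CAAR gain in percentage points
```

The RMSD reduction is 0.33 Å out of 1.32 Å, which is one quarter. The CAAR gain is 0.83 percentage points.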

Six-CDR Cooperative Design

In the more challenging full-paratope design task, FP-AbDiff achieves the lowest RMSD_Full across all six CDRs, reducing the average geometric error by approximately 15% relative to AbX, and achieves the highest AAR_Full on the functionally critical CDR-H3 loop (45.67%).

Ablation Studies

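Model Variant | IMP (%)↑ | AAR (%)↑ | RMSD (Å)↓ | DockQ↑
--------------|----------|----------|-----------|-------
+R³, +SO(3)   | 28.42    | 45.23    | 2.18      | 0.4443
-SO(3)        | 35.30    | 44.15    | 2.46      | 0.4437
-R³           | 29.76    | 43.14    | 2.41      | 0.4372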

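The complete model achieves the highest fidelity, with the best AAR (45.23%), RMSD (2.18 Å), and DockQ (0.4443). Removing the R³ term degrades backbone and interface quality (RMSD 2.41 Å, DockQ 0.4372), while removing the SO(3) term increases IMP from 28.42% to 35.30% but worsens RMSD (2.46 Å) and AAR (44.15%).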

Antibody Optimization Experiments

In iterative denoising optimization, AbX follows a "high-gain but fragile" trajectory, while FP-AbDiff maintains consistently lower RMSD and higher DockQ from t=8 onwards, reflecting a more stable optimization path.

Related Work

Traditional Methods

Early approaches such as RosettaAntibodyDesign rely on statistical energy functions and Monte Carlo sampling, but suffer from high computational cost and low sampling efficiency.

Deep Learning Methods

  • Sequence-Centric Models: Protein language models treat proteins as text inputs but ignore spatial and geometric priors
  • Geometrically Equivariant Models: GNN models like MEAN and dyMEAN, and predictors like AlphaFold2
  • Diffusion Models: DiffAb, AbDiffuser, etc., but lacking temporal consistency

Advantages of This Work

FP-AbDiff is the first framework to impose physical self-consistency in antibody generation, addressing the dynamical consistency problem through Fokker-Planck regularization.

Conclusions and Discussion

Main Conclusions

FP-AbDiff consistently outperforms state-of-the-art baselines across all evaluation tasks in antibody design by enforcing Fokker-Planck physical laws, achieving high-fidelity structures, accurate interfaces, and stable generation trajectories.

Limitations

  1. Numerical Approximations: Implementation of FPE residuals relies on approximation methods such as finite differences and the Hutchinson trick
  2. Computational Overhead: Although the regularizer adds only 8% to training time, it still requires additional forward passes
  3. Experimental Validation: Lacks wet-lab validation of the functionality of designed antibodies
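
The Hutchinson trick named in the first limitation estimates the trace of a matrix from matrix-vector products alone, which is how a divergence such as ∇·s (the trace of the score Jacobian) can be approximated without forming the full Jacobian. The sketch below uses Rademacher probes and an explicit matrix for illustration. In practice the matrix-vector product would come from automatic differentiation, and how the paper applies the estimator is not detailed in this summary.

```python
import random


def hutchinson_trace(matvec, dim, n_probes=1000, rng=None):
    """Estimate tr(A) as the mean of v^T A v over random +/-1 probe vectors v."""
    if rng is None:
        rng = random.Random(0)
    total = 0.0
    for _ in range(n_probes):
        v = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        av = matvec(v)
        total += sum(vi * avi for vi, avi in zip(v, av))
    return total / n_probes


def dense_matvec(a):
    """Wrap an explicit matrix (nested lists) as a matrix-vector product."""
    return lambda v: [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


estimate = hutchinson_trace(dense_matvec([[2.0, 1.0], [1.0, 3.0]]), dim=2,
                            n_probes=2000, rng=random.Random(7))
```

The estimate is unbiased, with variance driven by the off-diagonal entries. For a diagonal matrix every probe returns the exact trace.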

Future Directions

  1. Improve numerical approximation methods for higher accuracy
  2. Extend to other protein design tasks
  3. Incorporate experimental feedback for model optimization
  4. Explore more complex physical constraints

In-Depth Evaluation

Strengths

  1. Theoretical Innovation: First to introduce the Fokker-Planck equation into antibody design, addressing the dynamical consistency problem
  2. Technical Advancement: Cleverly combines physical laws with deep learning, achieving consistency constraints on the hybrid manifold R³×SO(3)
  3. Comprehensive Experiments: Thorough baseline comparisons, ablation studies, and case analyses
  4. Outstanding Performance: Achieves SOTA on multiple metrics; the 25% RMSD improvement is particularly notable

Weaknesses

  1. Increased Complexity: The method is relatively complex with numerous implementation details
  2. Insufficient Theoretical Analysis: Lacks theoretical guarantees on the convergence of FPE regularization
  3. Limited Scope: Primarily focused on antibody design; generalization capability to other protein design tasks remains unknown

Impact

This work provides a new research paradigm for the intersection of computational biology and machine learning, combining physical laws with deep generative models, with significant implications for protein design, drug discovery, and related fields.

Applicable Scenarios

  • Therapeutic antibody design
  • Antibody engineering and optimization
  • Other molecular generation tasks requiring physical consistency
  • Structural biology research

References

The paper cites extensive related work, including:

  • Diffusion model foundations (Song & Ermon 2019; Ho et al. 2020)
  • Antibody design methods (Adolf-Bryfogle et al. 2018; Luo et al. 2022)
  • Geometric deep learning (Yim et al. 2023; De Bortoli et al. 2022)
  • Fokker-Planck equation applications (Lai et al. 2023)

This paper makes important contributions to the field of computational antibody design, significantly enhancing the performance and reliability of generative models through the introduction of physical constraints, providing valuable new insights for future protein design research.