Latent Retrieval Augmented Generation of Cross-Domain Protein Binders
Zhang, Kong, Huang et al.
Designing protein binders that target specific sites, which requires generating realistic and functional interaction patterns, is a fundamental challenge in drug discovery. Current structure-based generative models are limited in generating interfaces with sufficient rationality and interpretability. In this paper, we propose Retrieval-Augmented Diffusion for Aligned interface (RADiAnce), a new framework that leverages known interfaces to guide the design of novel binders. By unifying retrieval and generation in a shared contrastive latent space, our model efficiently identifies relevant interfaces for a given binding site and seamlessly integrates them through a conditional latent diffusion generator, enabling cross-domain interface transfer. Extensive experiments show that RADiAnce significantly outperforms baseline models across multiple metrics, including binding affinity and recovery of geometries and interactions. Additional experimental results validate cross-domain generalization, demonstrating that retrieving interfaces from diverse domains, such as peptides, antibodies, and protein fragments, enhances the generation performance of binders for other domains. Our work establishes a new paradigm for protein binder design that successfully bridges retrieval-based knowledge and generative AI, opening new possibilities for drug discovery.
Designing protein binders that target specific binding sites is a fundamental challenge in drug discovery, because it requires generating realistic and functional interaction patterns. Current structure-based generative models struggle to generate interfaces with sufficient plausibility and interpretability. This paper proposes RADiAnce (Retrieval-Augmented Diffusion for Aligned interface), which guides the design of novel binders by leveraging known interfaces. By unifying retrieval and generation in a shared contrastive latent space, the model efficiently identifies relevant interfaces for a given binding site and integrates them through a conditional latent diffusion generator, enabling cross-domain interface transfer.
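The retrieval step described above can be illustrated with a minimal sketch. It assumes that binding sites and known interfaces have already been encoded into one shared latent space and that relevance is scored by cosine similarity; the scoring function, the interface IDs, and the toy vectors are all illustrative assumptions, not details taken from the paper.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_top_k(site_embedding, interface_bank, k=2):
    """Return the k bank entries most similar to the binding-site embedding.

    interface_bank maps a (hypothetical) interface ID to its latent vector.
    """
    scored = [
        (cosine_similarity(site_embedding, vec), name)
        for name, vec in interface_bank.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(name, score) for score, name in scored[:k]]

# Toy bank mixing binder domains; IDs and vectors are made up for illustration.
bank = {
    "peptide_A": [0.9, 0.1, 0.0],
    "antibody_B": [0.7, 0.6, 0.1],
    "fragment_C": [-0.2, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]  # hypothetical latent of the target binding site
hits = retrieve_top_k(query, bank, k=2)
```

Because every domain lives in the same space, a peptide interface and an antibody interface can both be returned for the same query, which is the property that cross-domain transfer relies on.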
Protein Binder Design Challenge: Designing binders that target specific protein sites requires generating realistic and functional molecular interface interaction patterns
Limitations of Existing Methods: Current structure-based generative models produce interfaces that lack plausibility and interpretability, and they fail to make effective use of known structural information
Neglect of Prior Knowledge: Most methods generate based solely on target binding sites, ignoring the abundant reusable interaction patterns in existing protein complexes
Lack of Cross-Domain Generalization: Inability to effectively leverage common interaction motifs across different types of binders (e.g., peptides, antibodies, protein fragments)
Insufficient Interpretability: The generation process lacks explicit biological guiding principles
Proposes RADiAnce Framework: The first method applying retrieval-augmented generation to protein binder sequence-structure co-design
Constructs Contrastive Latent Space: Designs a unified latent representation supporting both retrieval and generation, enabling cross-domain interface similarity measurement
Enables Cross-Domain Interface Transfer: Validates that retrieving interfaces from different binder types enhances generation performance for other domains
Significant Performance Improvement: Substantially outperforms baseline methods across multiple evaluation metrics, including binding affinity, geometry, and interaction recovery
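One common way to build a contrastive latent space of the kind listed above is an InfoNCE-style objective, in which each binding-site embedding is pulled toward the embedding of its true binder interface and pushed away from the other interfaces in the batch. The sketch below is a generic, one-directional InfoNCE in plain Python; the exact loss, temperature, and encoders used by RADiAnce are not given in this summary, so everything here is an assumption for illustration.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def info_nce(site_embs, binder_embs, temperature=0.1):
    """Mean InfoNCE loss where site_embs[i] and binder_embs[i] form a true pair.

    All other binder embeddings in the batch act as negatives for site i.
    """
    sites = [normalize(v) for v in site_embs]
    binders = [normalize(v) for v in binder_embs]
    total = 0.0
    for i, site in enumerate(sites):
        logits = [dot(site, b) / temperature for b in binders]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(sites)

# Toy batch: aligned pairs should score a much lower loss than mismatched ones.
sites = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(sites, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = info_nce(sites, [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing such a loss makes nearest-neighbor search in the latent space meaningful, since proximity then reflects binding compatibility rather than binder type.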
Unified Latent Space: First to achieve unified retrieval and generation in the same latent space, ensuring retrieved results directly guide the generation process
Cross-Domain Similarity Measurement: Latent representations learned through contrastive learning capture common interaction motifs across different binder types
Conditional Diffusion Integration: Integrates retrieved interface embeddings into the diffusion process through cross-attention and residual MLPs
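The conditioning mechanism in the last item can be sketched as a single block that a denoiser might apply at each diffusion step: the noisy binder latents attend over the retrieved interface embeddings, and the result passes through an MLP with a residual connection. This is a minimal single-head sketch in plain Python with random weights; the layer sizes, parameter names, and the exact placement of the residual connections are assumptions, since the summary does not specify the architecture.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(latents, retrieved, w_q, w_k, w_v):
    """Single-head cross-attention: latents give queries, retrieved embeddings give keys and values."""
    q, k, v = matmul(latents, w_q), matmul(retrieved, w_k), matmul(retrieved, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose keys
    weights = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    return matmul(weights, v), weights

def residual_mlp(x, w1, w2):
    """Two-layer ReLU MLP whose output is added back to its input."""
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w1)]
    out = matmul(hidden, w2)
    return [[a + b for a, b in zip(x_row, o_row)] for x_row, o_row in zip(x, out)]

def condition_latents(latents, retrieved, params):
    """Inject retrieved context into the latents: attention, residual add, then residual MLP."""
    attended, weights = cross_attention(
        latents, retrieved, params["w_q"], params["w_k"], params["w_v"]
    )
    mixed = [[a + b for a, b in zip(l_row, a_row)] for l_row, a_row in zip(latents, attended)]
    return residual_mlp(mixed, params["w1"], params["w2"]), weights

def random_matrix(rows, cols, rng):
    """Random weights stand in for learned parameters in this sketch."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
dim, hidden_dim = 4, 8  # hypothetical sizes
params = {
    "w_q": random_matrix(dim, dim, rng),
    "w_k": random_matrix(dim, dim, rng),
    "w_v": random_matrix(dim, dim, rng),
    "w1": random_matrix(dim, hidden_dim, rng),
    "w2": random_matrix(hidden_dim, dim, rng),
}
latents = random_matrix(3, dim, rng)    # 3 noisy binder-residue latents
retrieved = random_matrix(5, dim, rng)  # 5 retrieved interface embeddings
conditioned, attn = condition_latents(latents, retrieved, params)
```

The attention weights in `attn` show how strongly each binder position draws on each retrieved interface, which is one way a retrieval-conditioned generator can offer some interpretability.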
The paper cites 54 relevant references covering multiple domains including protein design, deep generative models, and retrieval-augmented generation, providing a solid theoretical foundation for the research.