Hierarchical Bayesian Flow Networks for Molecular Graph Generation

Xiong, Chen, Li et al.
Molecular graph generation is essentially a classification problem: the model must predict the categories of atoms and bonds. Currently prevailing paradigms such as continuous diffusion models are trained to predict continuous numerical values, treating training as a regression task. However, the final generation step requires rounding these predictions back into discrete categories, which is intrinsically a classification operation. Because the rounding operation is not incorporated during training, there is a significant discrepancy between the model's training objective and its inference procedure. As a consequence, excessive emphasis on point-wise precision can lead to overfitting and inefficient learning, because considerable effort is devoted to capturing intra-bin variations that are irrelevant to the discrete nature of the task. This flaw reduces molecular diversity and constrains the model's generalization. To address this fundamental limitation, we propose GraphBFN, a novel hierarchical coarse-to-fine framework based on Bayesian Flow Networks that operates on the parameters of distributions. By introducing the cumulative distribution function (CDF), GraphBFN can calculate the probability of selecting the correct category, thereby unifying the training objective with the rounding operation used during sampling. We demonstrate that our method achieves superior performance and faster generation, setting new state-of-the-art results on the QM9 and ZINC250k molecular graph generation benchmarks.

Basic Information

  • Paper ID: 2510.10211
  • Title: Hierarchical Bayesian Flow Networks for Molecular Graph Generation
  • Authors: Yida Xiong, Jiameng Chen, Kun Li, Hongzhi Zhang, Xiantao Cai, Wenbin Hu (School of Computer Science, Wuhan University)
  • Classification: cs.LG (Machine Learning)
  • Publication Date: October 11, 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2510.10211

Abstract

Molecular graph generation is inherently a categorical generation problem: the goal is to predict atom and chemical bond categories. Current mainstream continuous diffusion models treat training as a regression task and predict continuous values. During final generation, however, they need a rounding operation to convert those values into discrete categories. Because rounding is not part of training, there is a significant discrepancy between the model's training objective and its inference process, which leads to overfitting, low learning efficiency, and reduced molecular diversity. To address this fundamental limitation, the authors propose GraphBFN, a hierarchical coarse-to-fine framework based on Bayesian Flow Networks. GraphBFN introduces the cumulative distribution function (CDF) to compute the probability of selecting the correct category, thereby unifying the training objective with the rounding operation used at sampling time.

Research Background and Motivation

Core Problem

There exists a fundamental train-inference inconsistency problem in molecular graph generation:

  1. Training Phase: Continuous diffusion models map discrete atom/bond categories to continuous space, optimizing continuous value predictions through regression loss
  2. Inference Phase: Requires hard rounding to convert continuous predictions back to discrete categories
  3. Inconsistency: Training does not account for rounding rules, causing models to focus excessively on intra-class variations while neglecting the discrete nature
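The mismatch can be made concrete with a toy example, which is not from the paper. Assume K = 4 categories mapped to equal-width bins on [-1, 1], and a true category whose bin center is -0.25. A squared-error (regression) loss treats offsets of 0.01 and 0.24 very differently, yet both round to the correct category. An offset of 0.26 costs only slightly more than 0.24 under MSE, but it rounds to the wrong category.

```python
import numpy as np

K = 4                                       # toy number of categories
centers = (2 * np.arange(K) + 1) / K - 1    # bin centers: -0.75, -0.25, 0.25, 0.75

def round_to_category(y: float) -> int:
    """Inference-time rounding: index of the width-2/K bin that contains y."""
    idx = np.floor((y + 1) * K / 2)
    return int(np.clip(idx, 0, K - 1))      # clip so y = 1.0 maps to the last bin

target = 1
x = centers[target]                         # -0.25; its bin is [-0.5, 0.0)

results = []
for offset in (0.01, 0.24, 0.26):
    pred = x + offset
    mse = (pred - x) ** 2                   # what a regression loss sees
    results.append((offset, mse, round_to_category(pred)))

for offset, mse, k in results:
    print(f"offset={offset:.2f}  mse={mse:.4f}  rounded category={k}")
```

The regression loss penalizes the 0.24 offset about 576 times more than the 0.01 offset, although rounding makes the two indistinguishable. This is the intra-bin effort the paper argues is wasted.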

Problem Significance

  • Molecular graph generation is a key technology in drug discovery, impacting molecular optimization, drug-target binding affinity prediction, and other downstream tasks
  • The inconsistency in existing methods leads to reduced molecular diversity and limited generalization capability
  • Near a category boundary, even a small regression error can flip the rounded output to an entirely wrong category

Limitations of Existing Methods

  1. Discrete Diffusion Models: While suitable for discrete graph structures, they sacrifice the smoothness and dynamic generation characteristics of continuous representations
  2. Continuous Diffusion Models: Training objectives are decoupled from the inference process, making these models prone to overfitting irrelevant intra-class variations
  3. Traditional Bayesian Flow Networks: Assume all categories are equidistant in the probability simplex, leading to slow convergence and high noise

Core Contributions

  1. First application of Bayesian Flow Networks to molecular graph generation, enhancing generation effectiveness through hierarchical molecular representation supervision
  2. Innovative introduction of Cumulative Distribution Functions (CDF), calculating class probabilities rather than fitting specific values, unifying training objectives with sampling rounding operations
  3. Proposes hierarchical coarse-to-fine framework, capturing both local atomic connectivity and global molecular topology through multi-scale graph representations
  4. Achieves faster training and sampling, reaching state-of-the-art performance on QM9 and ZINC250k benchmarks with significantly reduced sampling steps

Methodology Details

Task Definition

Given a molecular graph $G = (X, A)$, where:

  • $X \in \{0, \ldots, K_X - 1\}^D$: atom types for the $D$ atoms, each drawn from $K_X$ categories
  • $A \in \{0, \ldots, K_A - 1\}^{D \times D}$: adjacency matrix whose entries are bond types drawn from $K_A$ categories

The objective is to learn to generate new molecular graphs conforming to the real molecular distribution.

Model Architecture

1. Hierarchical Coarse-to-Fine Framework

  • Multi-scale Representation: Uses DiffPool to construct $L$ coarsening layers, generating a pyramid of increasingly coarse molecular graph representations
  • Bottom-up Generation: Begins with unconditional generation at the coarsest layer, then progressively refines toward the complete atomic graph
  • Condition Propagation: At each layer, an upsampling module $\phi_1^{(l)}$ converts the coarser layer's output into the condition $c^{(l)}$ for the finer layer
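The sketch below shows one DiffPool coarsening step as described by Ying et al. (2018). A GNN produces node embeddings Z. A second GNN produces a soft assignment matrix S, which is then used to pool features (S^T Z) and adjacency (S^T A S). Applying the step L times gives the pyramid. The mean-aggregation GNN, random weights, and layer sizes are assumptions for illustration. GraphBFN's actual pooling networks and upsampling modules are not shown.

```python
import numpy as np

def gnn_layer(A, X, W, relu=True):
    """Mean-aggregation message passing with self-loops (a simplified GNN layer)."""
    A_hat = A + np.eye(A.shape[0])
    H = (A_hat / A_hat.sum(axis=1, keepdims=True)) @ X @ W
    return np.maximum(H, 0.0) if relu else H

def softmax(Z, axis=-1):
    Z = Z - Z.max(axis=axis, keepdims=True)   # numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=axis, keepdims=True)

def diffpool_step(A, X, W_embed, W_pool):
    """Coarsen an n-node graph into m clusters (one DiffPool layer)."""
    Z = gnn_layer(A, X, W_embed)                          # node embeddings (n, h)
    S = softmax(gnn_layer(A, X, W_pool, relu=False), 1)   # soft assignments (n, m)
    X_c = S.T @ Z                                         # cluster features (m, h)
    A_c = S.T @ A @ S                                     # cluster adjacency (m, m)
    # DiffPool auxiliary losses: link prediction + assignment entropy
    link = np.linalg.norm(A - S @ S.T, ord="fro")
    entropy = np.mean(-(S * np.log(S + 1e-12)).sum(axis=1))
    return X_c, A_c, link + entropy

rng = np.random.default_rng(0)
n, d, h, m = 6, 3, 8, 2
A = np.zeros((n, n))
for i in range(n):                                  # a 6-membered ring
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
X = np.eye(d)[rng.integers(0, d, size=n)]           # random one-hot atom types
X_c, A_c, pool_loss = diffpool_step(A, X, rng.normal(size=(d, h)), rng.normal(size=(d, m)))
print(X_c.shape, A_c.shape, round(float(pool_loss), 3))
```

Each row of S sums to one, so pooling preserves the total edge weight of A. The returned auxiliary loss plays the role of the "pooling loss" that GraphBFN optimizes jointly with the BFN loss.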

2. Graph Representation Mapping

Maps discrete categories $k \in \{0, \ldots, K-1\}$ to the continuous interval $[-1, 1]$, which is split into $K$ equal-width bins:

k_c = (2k + 1)/K - 1  # center point
k_l = k_c - 1/K       # left boundary  
k_r = k_c + 1/K       # right boundary
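The three formulas above can be written as a small vectorized helper. It shows that the bins have width 2/K, tile [-1, 1] without gaps, and place each category at its bin center. The example size K = 5 is a hypothetical number of bond categories, and `to_continuous` is an illustrative helper name.

```python
import numpy as np

def category_bins(K: int):
    """Centers and boundaries of the K equal-width bins tiling [-1, 1]."""
    k = np.arange(K)
    k_c = (2 * k + 1) / K - 1   # center point
    k_l = k_c - 1 / K           # left boundary
    k_r = k_c + 1 / K           # right boundary
    return k_c, k_l, k_r

def to_continuous(labels, K: int):
    """Map integer categories to their bin centers in [-1, 1]."""
    k_c, _, _ = category_bins(K)
    return k_c[np.asarray(labels)]

k_c, k_l, k_r = category_bins(5)
print(np.round(k_c, 2), to_continuous([0, 2, 4], 5))
```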

3. Bayesian Flow Network Components

Input Distribution: Modeled using Gaussian distribution

p_I(G|θ) = N(G|μ, ρ^{-1}I)

Sending Distribution: Adds Gaussian noise

p_S(Y|G; α) = N(Y|G, α^{-1}I)
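The summary does not spell out how a sample from the sending distribution changes the Gaussian input distribution. In the original Bayesian Flow Networks paper (Graves et al., 2023), this is a conjugate update: precisions add, and the mean becomes a precision-weighted average of the prior mean and the noisy sample. The sketch below assumes that standard update and a hypothetical constant accuracy α = 0.5 per step. The real model uses an accuracy schedule.

```python
import numpy as np

def bayesian_update(mu, rho, y, alpha):
    """Conjugate Gaussian update of the input distribution N(mu, rho^{-1} I)
    after observing a sender sample y ~ N(G, alpha^{-1} I)."""
    rho_new = rho + alpha                        # precisions add
    mu_new = (rho * mu + alpha * y) / rho_new    # precision-weighted mean
    return mu_new, rho_new

rng = np.random.default_rng(0)
G = np.array([-0.8, 0.0, 0.4])        # data already mapped to bin centers
mu, rho = np.zeros_like(G), 1.0        # prior input distribution N(0, I)
alpha = 0.5                            # hypothetical constant accuracy per step
for _ in range(200):
    y = rng.normal(G, alpha ** -0.5)   # sample from the sending distribution
    mu, rho = bayesian_update(mu, rho, y, alpha)
print(np.round(mu, 2), rho)
```

As information accumulates, ρ grows linearly and μ concentrates around the data, which is the mechanism the receiver and output distribution later rely on.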

Output Distribution: Computes discrete probabilities through CDF

p_O^{(d)}(k|θ; t) = F(k_r|μ_x^{(d)}, σ_x^{(d)}) - F(k_l|μ_x^{(d)}, σ_x^{(d)})

Receiving Distribution:

p_R(Y|θ; t, α) = ∏_{d=1}^D ∑_{k=0}^{K-1} p_O^{(d)}(k|θ; t)N(Y^{(d)}|k_c, α^{-1})
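The receiving distribution above is a per-dimension Gaussian mixture whose components sit at the bin centers and are weighted by the output probabilities. The sketch evaluates that density and draws samples from it. Here `p_out` is a hypothetical, hand-written probability table. In GraphBFN it would come from the CDF-based output distribution computed by the network.

```python
import numpy as np

def receiver_density(y, p_out, k_c, alpha):
    """p_R(y) = prod_d sum_k p_O^{(d)}(k) N(y^{(d)} | k_c, 1/alpha).

    y: (D,) observation, p_out: (D, K) rows summing to 1, k_c: (K,) bin centers.
    """
    var = 1.0 / alpha
    diff = y[:, None] - k_c[None, :]                          # (D, K)
    gauss = np.exp(-diff ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return float(np.prod((p_out * gauss).sum(axis=1)))       # mixture per dim, product over dims

def sample_receiver(p_out, k_c, alpha, rng):
    """Draw y ~ p_R: choose a category per dimension, then add N(0, 1/alpha) noise."""
    ks = np.array([rng.choice(len(k_c), p=row) for row in p_out])
    return rng.normal(k_c[ks], alpha ** -0.5)

K = 3
k_c = (2 * np.arange(K) + 1) / K - 1                 # bin centers on [-1, 1]
p_out = np.array([[0.7, 0.2, 0.1],                   # hypothetical output probabilities
                  [0.1, 0.1, 0.8]])
rng = np.random.default_rng(0)
y = sample_receiver(p_out, k_c, alpha=4.0, rng=rng)
print(y, receiver_density(y, p_out, k_c, alpha=4.0))
```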

4. Key Innovation: CDF Mechanism

Uses a truncated (clipped) Gaussian CDF to connect continuous distributions with discrete categories. All probability mass below -1 or above 1 is assigned to the first or last bin, respectively:

F(x|μ_x^{(d)}, σ_x^{(d)}) = {
  0,                    if x ≤ -1
  1,                    if x ≥ 1  
  1/2[1 + erf((x-μ_x^{(d)})/(√2σ_x^{(d)}))], otherwise
}
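The truncated CDF and the output distribution $p_O(k) = F(k_r) - F(k_l)$ translate directly into code. In this sketch, bin edges come from `np.linspace`, so that $k_r$ of one bin equals $k_l$ of the next exactly. The differences therefore telescope to $F(1) - F(-1) = 1$. The values of μ_x and σ_x are hypothetical network outputs.

```python
import math
import numpy as np

def truncated_cdf(x, mu, sigma):
    """Clipped Gaussian CDF F(x | mu, sigma): 0 for x <= -1, 1 for x >= 1."""
    if x <= -1.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return 0.5 * (1.0 + math.erf((x - mu) / (math.sqrt(2.0) * sigma)))

def output_probs(mu_x, sigma_x, K):
    """p_O(k) = F(k_r) - F(k_l) for the K equal-width bins tiling [-1, 1]."""
    edges = np.linspace(-1.0, 1.0, K + 1)   # k_l = edges[:-1], k_r = edges[1:]
    F = np.array([truncated_cdf(e, mu_x, sigma_x) for e in edges])
    return np.diff(F)

p_mid = output_probs(mu_x=0.1, sigma_x=0.2, K=5)    # mass concentrated in bin 2
p_edge = output_probs(mu_x=1.5, sigma_x=0.2, K=5)   # mean outside [-1, 1]
print(np.round(p_mid, 3), np.round(p_edge, 3))
```

Even when the predicted mean falls outside [-1, 1], the clipping yields a valid distribution concentrated on the edge category. The training signal is therefore always a probability over categories, not a raw value that must later be rounded.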

Technical Innovations

  1. Train-Inference Consistency: CDF directly computes discrete probabilities, avoiding mismatches between continuous prediction and discrete rounding
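  2. Non-equidistant Category Mapping: Standard categorical BFNs treat all categories as equidistant on the probability simplex. Here, categories are placed at different distances along an ordered interval, which the authors report gives faster and smoother convergence.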
  3. Multi-scale Supervision: Hierarchical framework provides structural information at different granularities, enhancing generation quality
  4. End-to-end Optimization: Unified loss function simultaneously optimizes BFN generation loss and pooling loss
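To show how the CDF output enters training, here is a sketch of the continuous-time loss for discretised data from the original BFN paper (Graves et al., 2023). The loss is $-\ln\sigma_1 \, \sigma_1^{-2t} \lVert x - \hat{k}(\theta, t)\rVert^2$, where $\hat{k}$ is the expected bin center under the CDF-based output probabilities. Several pieces are assumptions: whether GraphBFN uses exactly this form, the value of SIGMA_1, and the toy network. The pooling loss and its λ₁, λ₂ weighting are not shown.

```python
import math
import numpy as np

SIGMA_1 = 0.02   # hypothetical final-accuracy hyperparameter (sigma_1 in Graves et al.)
T_MIN = 1e-5     # minimum time threshold from the implementation details

def output_probs_batch(mu_x, sigma_x, K):
    """Truncated-CDF bin probabilities for D dimensions at once: returns (D, K)."""
    edges = np.linspace(-1.0, 1.0, K + 1)
    z = (edges[None, :] - mu_x[:, None]) / (math.sqrt(2.0) * sigma_x[:, None])
    F = 0.5 * (1.0 + np.vectorize(math.erf)(z))
    F[:, 0], F[:, -1] = 0.0, 1.0          # clip: F = 0 at x = -1, F = 1 at x = 1
    return np.diff(F, axis=1)

def bfn_loss(x, K, net, rng):
    """One-sample Monte Carlo estimate of the discretised BFN continuous-time loss."""
    t = rng.uniform(T_MIN, 1.0)
    gamma = 1.0 - SIGMA_1 ** (2 * t)
    # Flow distribution over the input mean: N(gamma * x, gamma * (1 - gamma) I)
    mu = rng.normal(gamma * x, math.sqrt(gamma * (1.0 - gamma)))
    mu_x, sigma_x = net(mu, t)
    k_c = (2 * np.arange(K) + 1) / K - 1
    k_hat = output_probs_batch(mu_x, sigma_x, K) @ k_c   # expected bin center per dim
    return -math.log(SIGMA_1) * SIGMA_1 ** (-2 * t) * float(np.sum((x - k_hat) ** 2))

def toy_net(mu, t):
    """Hypothetical stand-in for the graph network: echo the input mean, fixed width."""
    return np.clip(mu, -1.0, 1.0), np.full_like(mu, 0.3)

K = 4
labels = np.array([0, 3, 1, 2])
x = (2 * labels + 1) / K - 1              # categories mapped to bin centers
print(bfn_loss(x, K, toy_net, np.random.default_rng(0)))
```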

Experimental Setup

Datasets

  • QM9: Quantum chemistry dataset containing 134k small molecules
  • ZINC250k: Drug-like molecule dataset containing 250k relatively larger molecules

Evaluation Metrics

  • Validity w/o correction: Proportion of valid molecules without correction
  • Uniqueness: Proportion of unique molecules among generated samples
  • FCD (Fréchet ChemNet Distance): Distance between training and generated sets using ChemNet features
  • NSPDK MMD: Maximum mean discrepancy of neighborhood subgraph pairwise distance kernels considering atom and bond features
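The following sketch illustrates three of the metrics on placeholder data. Uniqueness is computed over already-canonicalized SMILES strings. The Fréchet distance formula behind FCD is applied to generic feature vectors instead of ChemNet activations. MMD is estimated with a stand-in RBF kernel in place of the NSPDK graph kernel. Validity needs a chemistry toolkit such as RDKit and is omitted. All data here is randomly generated and hypothetical.

```python
import numpy as np

def uniqueness(valid_smiles):
    """Fraction of distinct molecules among valid samples (canonical SMILES assumed)."""
    return len(set(valid_smiles)) / len(valid_smiles)

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets:
    ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 (C_a C_b)^{1/2})."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    c_a = np.cov(feats_a, rowvar=False)
    c_b = np.cov(feats_b, rowvar=False)
    # Tr((C_a C_b)^{1/2}) from the eigenvalues of the symmetric C_a^{1/2} C_b C_a^{1/2}
    w, V = np.linalg.eigh(c_a)
    sqrt_a = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
    eig = np.linalg.eigvalsh(sqrt_a @ c_b @ sqrt_a)
    tr_sqrt = np.sqrt(np.clip(eig, 0.0, None)).sum()
    return float(np.sum((mu_a - mu_b) ** 2) + np.trace(c_a) + np.trace(c_b) - 2.0 * tr_sqrt)

def mmd2(X, Y, kernel):
    """Biased estimator of squared MMD between sample sets X and Y."""
    k_xx = np.mean([[kernel(a, b) for b in X] for a in X])
    k_yy = np.mean([[kernel(a, b) for b in Y] for a in Y])
    k_xy = np.mean([[kernel(a, b) for b in Y] for a in X])
    return float(k_xx + k_yy - 2.0 * k_xy)

def rbf(a, b):
    """Stand-in vector kernel; NSPDK is a graph kernel and is not implemented here."""
    return float(np.exp(-np.sum((a - b) ** 2)))

rng = np.random.default_rng(0)
ref = rng.normal(size=(100, 3))              # hypothetical reference features
gen = rng.normal(loc=0.5, size=(100, 3))     # hypothetical generated features
print(uniqueness(["CCO", "CCO", "c1ccccc1"]), frechet_distance(ref, gen), mmd2(ref, gen, rbf))
```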

Baseline Methods

Include multiple state-of-the-art baselines:

  • Flow Models: MoFlow
  • Diffusion Models: EDP-GNN, GDSS, DiGress, GSDM
  • Flow Matching: Dirichlet FM, CatFlow
  • Energy Models: GraphEBM

Implementation Details

  • Sampling steps: GraphBFN uses 100×L steps (100 per hierarchy level, where L is the number of levels), far fewer per level than the baselines' 400-1000 steps
  • Multi-scale loss balance parameters: λ₁, λ₂
  • Minimum time threshold: t_min = 10⁻⁵
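A single-level sampling loop for discretised BFNs, following the procedure in Graves et al. (2023), looks roughly like this. At each step it computes CDF-based category probabilities, samples a category, and draws a noisy receiver sample at the bin center. It then applies the Bayesian update and finally draws discrete categories. Calling it once per hierarchy level with `n_steps=100` corresponds to the 100×L budget above. The toy network, SIGMA_1, and the omission of cross-level conditioning $c^{(l)}$ are all assumptions.

```python
import math
import numpy as np

SIGMA_1 = 0.02   # hypothetical final-accuracy hyperparameter

def output_probs_batch(mu_x, sigma_x, K):
    """Truncated-CDF bin probabilities for D dimensions: returns (D, K)."""
    edges = np.linspace(-1.0, 1.0, K + 1)
    z = (edges[None, :] - mu_x[:, None]) / (math.sqrt(2.0) * sigma_x[:, None])
    F = 0.5 * (1.0 + np.vectorize(math.erf)(z))
    F[:, 0], F[:, -1] = 0.0, 1.0
    return np.diff(F, axis=1)

def sample_level(net, D, K, n_steps, rng):
    """Discretised-BFN sampling for one hierarchy level (after Graves et al., 2023)."""
    k_c = (2 * np.arange(K) + 1) / K - 1
    mu, rho = np.zeros(D), 1.0                         # prior input distribution N(0, I)
    for i in range(1, n_steps + 1):
        t = (i - 1) / n_steps
        probs = output_probs_batch(*net(mu, t), K)     # (D, K)
        ks = np.array([rng.choice(K, p=p) for p in probs])
        alpha = SIGMA_1 ** (-2 * i / n_steps) * (1.0 - SIGMA_1 ** (2 / n_steps))
        y = rng.normal(k_c[ks], alpha ** -0.5)         # receiver sample
        rho_new = rho + alpha                          # Bayesian update of (mu, rho)
        mu = (rho * mu + alpha * y) / rho_new
        rho = rho_new
    probs = output_probs_batch(*net(mu, 1.0), K)
    return np.array([rng.choice(K, p=p) for p in probs])   # final discrete categories

TARGET = np.array([0, 3, 1, 2, 2, 0])                  # hypothetical categories

def toy_net(mu, t):
    """Hypothetical network that always points at TARGET's bin centers."""
    K = 4
    centers = (2 * TARGET + 1) / K - 1
    return centers.astype(float), np.full(len(TARGET), 0.05)

samples = sample_level(toy_net, D=6, K=4, n_steps=100, rng=np.random.default_rng(0))
print(samples)
```

Because the final step samples directly from the CDF-based category probabilities, no separate rounding heuristic is needed. This is the train-inference consistency the method aims for.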

Experimental Results

Main Results

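| Method | QM9 Val. (%) ↑ | QM9 Unique (%) ↑ | QM9 FCD ↓ | QM9 NSPDK ↓ | ZINC250k Val. (%) ↑ | ZINC250k Unique (%) ↑ | ZINC250k FCD ↓ | ZINC250k NSPDK ↓ | Sampling Steps |
|---|---|---|---|---|---|---|---|---|---|
| GDSS | 95.72 | 98.46 | 2.565 | 0.0033 | 97.12 | 99.64 | 14.032 | 0.0192 | 1000 |
| CatFlow | 99.81 | 99.95 | 0.441 | 0.0029 | 99.21 | 100.00 | 13.211 | 0.0207 | - |
| GraphBFN | 99.60 | 99.97 | 0.214 | 0.0008 | 96.00 | 100.00 | 5.743 | 0.0069 | 100×L |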

Key Findings:

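  • On QM9, relative to CatFlow: 51.5% lower FCD (0.441 → 0.214) and 72.4% lower NSPDK MMD (0.0029 → 0.0008)
  • Best FCD and NSPDK MMD on both datasets with far fewer sampling steps, although validity is slightly below CatFlow (99.60 vs 99.81 on QM9, 96.00 vs 99.21 on ZINC250k)
  • Highest uniqueness on QM9 (99.97) and tied at 100.00 on ZINC250k, indicating strong diversity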
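The headline percentages are relative reductions of lower-is-better metrics. Recomputing them from the QM9 columns of the results table (CatFlow vs GraphBFN) reproduces the reported 51.5% and 72.4%:

```python
def relative_reduction(baseline: float, ours: float) -> float:
    """Percentage decrease of a lower-is-better metric relative to a baseline."""
    return 100.0 * (baseline - ours) / baseline

# QM9 values from the results table: CatFlow vs GraphBFN
fcd = relative_reduction(0.441, 0.214)
nspdk = relative_reduction(0.0029, 0.0008)
print(f"FCD: {fcd:.1f}%  NSPDK: {nspdk:.1f}%")
```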

Ablation Studies

GraphBFN vs GraphBFN_w/o (without hierarchical supervision):

  • Hierarchical framework improves all metrics
  • At some cost in sampling speed, it substantially improves generation quality

Sampling Efficiency Analysis

  • Strong performance is already reached within the first 50 sampling steps
  • Compared with baselines that require 400-1000 steps, GraphBFN achieves better results with only 100 steps per hierarchy level
  • Suitable for latency-sensitive inference applications

Related Work

Molecular Graph Generation Models

  • Autoregressive Models: Add nodes and edges sequentially, e.g., the GraphRNN family
  • One-shot Models: Methods based on VAEs, normalizing flows, and GANs, which often suffer from mode collapse
  • Diffusion Models: Recent mainstream direction, divided into discrete and continuous categories

Graph Diffusion Models

  • Discrete Diffusion: Directly defines diffusion processes in discrete state space, such as DiGress
  • Continuous Diffusion: Maps graphs to a continuous space and applies Gaussian diffusion, e.g., GDSS and GSDM
  • Core Challenge: Handling the discrete nature of atom and bond labels

Bayesian Flow Networks

  • Novel generative models learning mappings between distributions
  • Create continuous differentiable training processes for discrete data
  • This work proposes simpler and more effective mechanisms for handling discrete features

Conclusions and Discussion

Main Conclusions

  1. Successfully resolves train-inference inconsistency: Unifies continuous training with discrete sampling through CDF mechanism
  2. Significantly improves generation quality: Achieves state-of-the-art performance on standard benchmarks
  3. Substantially increases sampling efficiency: Uses 100 sampling steps per hierarchy level, roughly 1/4 to 1/10 of the baselines' 400-1000 steps
  4. Enhances molecular diversity: Avoids overfitting to irrelevant intra-class variations

Limitations

  1. Insufficient Interpretability Analysis: Lacks in-depth analysis of how multi-scale information optimizes generation results
  2. Limited Applicability Scope: Primarily validated on relatively small molecular datasets
  3. Computational Complexity: Hierarchical framework introduces certain computational overhead

Future Directions

  1. Extension to larger and more complex graph domains
  2. Exploration of conditional generation task applications
  3. Enhancement of interpretability analysis
  4. Optimization of computational efficiency

In-depth Evaluation

Strengths

  1. Significant Theoretical Contribution: Identifies and addresses fundamental problems in continuous diffusion models
  2. Outstanding Technical Innovation: CDF mechanism cleverly bridges continuous training and discrete inference
  3. Comprehensive Experimental Validation: Thorough comparative experiments and ablation studies
  4. High Practical Value: Significant efficiency improvements suitable for real-world applications

Weaknesses

  1. Limited Theoretical Analysis Depth: Insufficient analysis of convergence properties and theoretical guarantees
  2. Limited Experimental Scale: Primarily validated on medium-scale datasets, lacking large-scale validation
  3. Computational Overhead: Insufficient analysis of additional computational costs from hierarchical framework
  4. Insufficient Hyperparameter Sensitivity Analysis: Lacks detailed sensitivity analysis for key hyperparameters

Impact

  1. Academic Contribution: Provides new solution approaches for discrete generation tasks
  2. Practical Value: Can accelerate drug discovery pipelines
  3. Reproducibility: Clear method description facilitates reproduction
  4. Generalization Potential: Framework extensible to other discrete structure generation tasks

Applicable Scenarios

  1. Drug Discovery: Molecular design and optimization
  2. Materials Science: Novel material structure generation
  3. Cheminformatics: Compound library expansion
  4. Other Discrete Structure Generation: Such as protein and DNA sequences

References

The paper cites important works in the field, including:

  • Graves et al. (2023): Original work on Bayesian Flow Networks
  • Vignac et al. (2023): DiGress discrete diffusion method
  • Jo, Lee, and Hwang (2022): GDSS score-based diffusion model
  • Ying et al. (2018): DiffPool hierarchical graph pooling method

Overall Assessment: This is a high-quality research paper that identifies and addresses a core problem in molecular graph generation. Through its CDF mechanism and hierarchical framework, it substantially improves practical performance on a principled probabilistic footing. Although there is room for deeper theoretical analysis and larger-scale experiments, its contributions are sufficient to advance the field.