2025-11-15T21:31:10.952177

MR.RGM: An R Package for Fitting Bayesian Multivariate Bidirectional Mendelian Randomization Networks

Sarkar, Ni
Motivation: Mendelian randomization (MR) infers causal relationships between exposures and outcomes using genetic variants as instrumental variables. Typically, MR considers only a pair of exposure and outcome at a time, limiting its capability of capturing the entire causal network. We overcome this limitation by developing 'MR.RGM' (Mendelian randomization via reciprocal graphical model), a fast R-package that implements the Bayesian reciprocal graphical model and enables practitioners to construct holistic causal networks with possibly cyclic/reciprocal causation and proper uncertainty quantifications, offering a comprehensive understanding of complex biological systems and their interconnections. We developed 'MR.RGM', an open-source R package that applies bidirectional MR using a network-based strategy, enabling the exploration of causal relationships among multiple variables in complex biological systems. 'MR.RGM' holds the promise of unveiling intricate interactions and advancing our understanding of genetic networks, disease risks, and phenotypic complexities.
academic

MR.RGM: An R Package for Fitting Bayesian Multivariate Bidirectional Mendelian Randomization Networks

Basic Information

  • Paper ID: 2403.03944
  • Title: MR.RGM: An R Package for Fitting Bayesian Multivariate Bidirectional Mendelian Randomization Networks
  • Authors: Bitan Sarkar, Yang Ni (Texas A&M University)
  • Classification: stat.AP (Applied Statistics)
  • Published Journal: Bioinformatics
  • Paper Link: https://arxiv.org/abs/2403.03944
  • Code Repository: https://github.com/bitansa/MR.RGM

Abstract

Mendelian randomization (MR) infers causal relationships between exposures and outcomes by using genetic variants as instrumental variables. Traditional MR methods consider only a single pair of exposure and outcome variables at a time, limiting their ability to capture entire causal networks. This paper develops 'MR.RGM' (Mendelian Randomization via Reciprocal Graphical Models), a fast R package implementing Bayesian reciprocal graphical models that enables researchers to construct holistic causal networks with possible cyclic/reciprocal causal relationships and provide appropriate uncertainty quantification, thereby enabling comprehensive understanding of complex biological systems and their interconnections.

Research Background and Motivation

Problem Definition

Traditional Mendelian randomization (MR) methods primarily focus on causal inference for single exposure-outcome pairs, with the following limitations:

  1. Neglect of Network Complexity: Inability to capture complex causal network structures among multiple variables
  2. Missing Bidirectional Causality: Difficulty handling reciprocal or cyclic causal relationships between variables
  3. Lack of Holistic Perspective: Inability to provide global causal understanding of biological systems

Research Significance

In complex biological systems, intricate interaction networks often exist among genes, proteins, and phenotypes. Understanding these networks is crucial for:

  • Disease risk assessment
  • Therapeutic target identification
  • Biological mechanism elucidation
  • Precision medicine development

Limitations of Existing Methods

Through comprehensive investigation of existing R packages (including mr.pivw, mr.raps, PPMR, OneSampleMR, MVMR, etc.), the authors found that all existing methods lack support for bidirectional MR analysis, which represents a critical deficiency in constructing complete causal networks.

Core Contributions

  1. First R Package Supporting Bidirectional MR: MR.RGM is the only multivariate MR package capable of handling bidirectional causal relationships
  2. Bayesian Network Framework: Implements uncertainty quantification and network structure inference based on reciprocal graphical models
  3. Multiple Data Input Formats: Supports individual-level data and two types of summary-level data formats
  4. Optimized Computational Efficiency: Uses C++ backend and Woodbury matrix identity to enhance computational efficiency
  5. Network Motif Analysis: Provides NetworkMotif function for uncertainty quantification of specific network structures

Methodology Details

Mathematical Model

Base Model

For response variables Yi=(Yi1,,Yip)TY_i = (Y_{i1}, \ldots, Y_{ip})^T and instrumental variables Xi=(Xi1,,Xik)TX_i = (X_{i1}, \ldots, X_{ik})^T, the model is defined as:

Yi=AYi+BXi+Ei,EiN(0,Σ)Y_i = AY_i + BX_i + E_i, \quad E_i \sim N(0, \Sigma)

where:

  • ARp×pA \in \mathbb{R}^{p \times p}: Causal effect matrix among response variables (diagonal elements are zero)
  • BRp×kB \in \mathbb{R}^{p \times k}: Effect matrix of instrumental variables on response variables
  • Σ=diag(σ1,,σp)\Sigma = \text{diag}(\sigma_1, \ldots, \sigma_p): Error covariance matrix

Equivalent Form

The model can be rewritten as: YiNp{(IpA)1BXi,(IpA)1Σ(IpA)T}Y_i \sim N_p\{(I_p - A)^{-1}BX_i, (I_p - A)^{-1}\Sigma(I_p - A)^{-T}\}

Prior Specification

Spike and Slab Prior

For elements of matrix AA: aijγijN(0,τij)+(1γij)N(0,ν1×τij)a_{ij} \sim \gamma_{ij}N(0, \tau_{ij}) + (1-\gamma_{ij})N(0, \nu_1 \times \tau_{ij})γijBer(ρij),ρijBeta(aρ,bρ)\gamma_{ij} \sim \text{Ber}(\rho_{ij}), \quad \rho_{ij} \sim \text{Beta}(a_\rho, b_\rho)

Threshold Prior

a~ijN(0,τij),aij=a~ijI(a~ij>tA)\tilde{a}_{ij} \sim N(0, \tau_{ij}), \quad a_{ij} = \tilde{a}_{ij}I(|\tilde{a}_{ij}| > t_A)

MCMC Inference

Posterior inference is conducted using a hybrid strategy of Metropolis-Hastings algorithm and Gibbs sampling, including:

  1. Marginal probability updates (Gibbs)
  2. Effect coefficient updates (M-H)
  3. Variance parameter updates (Gibbs)
  4. Threshold parameter updates (M-H, Threshold prior only)

Computational Optimization

Woodbury Matrix Identity

To enhance computational efficiency, the Woodbury identity is employed for computing determinants and matrix inverses:

det(IpA)=(1+(IpA)(j,i)1×(aijaij))det(IpA)\det(I_p - A^*) = (1 + (I_p - A)^{-1}_{(j,i)} \times (a_{ij} - a^*_{ij})) \det(I_p - A)

(IpA)1=(IpA)1aijaij1+(aijaij)(IpA)(j,i)1(IpA)(,i)1×(IpA)(j,)1(I_p - A^*)^{-1} = (I_p - A)^{-1} - \frac{a_{ij} - a^*_{ij}}{1 + (a_{ij} - a^*_{ij})(I_p - A)^{-1}_{(j,i)}} (I_p - A)^{-1}_{(\cdot,i)} \times (I_p - A)^{-1}_{(j,\cdot)}

Software Implementation

Core Functions

RGM Function

  • Input Formats:
    • Individual-level data: X (instrumental variable matrix), Y (response variable matrix)
    • Summary data type 1: Syy, Syx, Sxx covariance matrices
    • Summary data type 2: Sxx, Beta, SigmaHat matrices
  • Required Parameters: D (binary indicator matrix), n (sample size)
  • Output: Causal effect estimates, network structure, posterior probabilities, etc.

NetworkMotif Function

  • Functionality: Uncertainty quantification for specific network motifs
  • Input: Target network structure Gamma, posterior samples GammaPst
  • Output: Posterior probability

Identifiability Conditions

To ensure model identifiability, each response variable must have at least one unique instrumental variable, i.e., each row of the D matrix must contain at least one unique 1.

Experimental Setup

Simulation Design

  • Model: Y=AY+BX+EY = AY + BX + E
  • Sample Sizes: 10k, 30k, 50k
  • Network Scales: 5, 10 nodes
  • Sparsity: 25%, 50%
  • Effect Sizes: ±0.1
  • Variance Explained: 1%, 3%, 5%, 10%

Evaluation Metrics

  • TPR (True Positive Rate)
  • FPR (False Positive Rate)
  • FDR (False Discovery Rate)
  • MCC (Matthews Correlation Coefficient)
  • AUC (Area Under ROC Curve)

Comparison Methods

Primarily compared with the OneSampleMR package, which is the most recent advanced MR tool.

Experimental Results

Main Results

Network Structure Recovery Performance

MR.RGM significantly outperforms OneSampleMR under all tested conditions:

Network Scale 5, Sparsity 50%:

  • Spike & Slab Prior: AUC = 0.77-0.99, TPR = 0.50-0.99
  • OneSampleMR: AUC = 0.56-0.79, TPR = 0.08-0.84

Network Scale 10, Sparsity 25%:

  • Spike & Slab Prior: AUC = 0.87-0.995, TPR = 0.69-0.99
  • OneSampleMR: AUC = 0.48-0.52, TPR = 0.07-0.39

Computational Efficiency

  • Good Scalability: Sublinear growth with respect to number of nodes and instrumental variables
  • Practical Runtime: Analysis of 15 genes with 31 SNPs requires only 32.329 seconds on Apple M2 Pro

Robustness Analysis

Sensitivity tests to different error distributions show that MR.RGM is robust to normality assumptions:

  • Normal distribution: TPR=0.86, FPR=0.0133, MAD=0.0169
  • t-distribution (df=3): TPR=0.86, FPR=0.0200, MAD=0.0153
  • Laplace distribution: TPR=0.87, FPR=0.0333, MAD=0.0164

Real Data Application

Application on GTEx V7 dataset (332 samples, 15 genes) successfully constructed gene regulatory networks, demonstrating the practical utility of the method.

Classification of Existing MR Methods

  1. Univariate Methods: mr.pivw, OneSampleMR
  2. Multivariate Methods: MVMR, MRPC, MendelianRandomization
  3. Bayesian Methods: mrbayes, MrDAG
  4. Network Methods: MrDAG (DAG-only support)

Advantages of This Work

MR.RGM is the only tool supporting the following combination of features:

  • Multivariate analysis
  • Bidirectional causal relationships
  • Uncertainty quantification
  • Multiple data format support

Conclusions and Discussion

Main Conclusions

  1. MR.RGM successfully fills the gap in bidirectional MR analysis
  2. The Bayesian framework provides effective uncertainty quantification
  3. The method performs excellently on both simulated and real data
  4. Computational efficiency meets practical application requirements

Limitations

  1. Normality Assumption: Although robustness tests show insensitivity, the method theoretically depends on normality
  2. Identifiability Requirements: Each response variable requires a unique instrumental variable
  3. Large-Scale Networks: Computational efficiency for very large networks requires further optimization

Future Directions

  1. Extension to nonlinear causal relationships
  2. Handling potential confounding factors
  3. Integration of multi-omics data
  4. Development of graphical user interface

In-Depth Evaluation

Strengths

  1. Strong Innovation: First implementation of bidirectional MR analysis, filling an important gap
  2. Rigorous Methodology: Solid theoretical foundation of Bayesian framework with correct MCMC implementation
  3. High Practicality: Supports multiple data formats, meeting diverse application scenarios
  4. Comprehensive Validation: Thorough simulation studies and real data verification
  5. Software Quality: Open-source code, detailed documentation, user-friendly

Weaknesses

  1. Limited Theoretical Analysis: Lacks theoretical guarantees for convergence and identifiability
  2. Limited Comparative Experiments: Primarily compared with OneSampleMR, lacking comparison with other network methods
  3. Single Application Case: Only demonstrates gene expression data application, lacking other biological applications

Impact

  1. Academic Value: Provides important tools for causal inference research
  2. Practical Value: Broad application prospects in genetics and epidemiological research
  3. Reproducibility: Open-source code ensures reproducible results

Applicable Scenarios

  1. Genetic Research: Gene regulatory network construction
  2. Epidemiology: Disease risk factor network analysis
  3. Systems Biology: Multi-omics data integration analysis
  4. Precision Medicine: Individualized therapeutic target identification

References

  1. Ni, Y., Ji, Y., & Müller, P. (2018). Reciprocal graphical models for integrative gene regulatory network analysis.
  2. GTEx Consortium. (2020). The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science, 369(6509), 1318-1330.
  3. Palmer, T., Spiller, W., & Sanderson, E. (2023). OneSampleMR: One Sample Mendelian Randomization and Instrumental Variable Analyses.

Overall Assessment: This is a high-quality methodological paper that successfully addresses the important problem of multivariate bidirectional Mendelian randomization. The software implementation is comprehensive, validation is thorough, and it has significant value for causal inference and genetic research. While there is room for improvement in theoretical analysis and application scope, the overall contribution is substantial and worthy of recommendation.