2025-11-15T21:31:10.952177

MR.RGM: An R Package for Fitting Bayesian Multivariate Bidirectional Mendelian Randomization Networks

Sarkar, Ni

Motivation: Mendelian randomization (MR) infers causal relationships between exposures and outcomes using genetic variants as instrumental variables. Typically, MR considers only a pair of exposure and outcome at a time, limiting its capability of capturing the entire causal network. We overcome this limitation by developing 'MR.RGM' (Mendelian randomization via reciprocal graphical model), a fast R-package that implements the Bayesian reciprocal graphical model and enables practitioners to construct holistic causal networks with possibly cyclic/reciprocal causation and proper uncertainty quantifications, offering a comprehensive understanding of complex biological systems and their interconnections. We developed 'MR.RGM', an open-source R package that applies bidirectional MR using a network-based strategy, enabling the exploration of causal relationships among multiple variables in complex biological systems. 'MR.RGM' holds the promise of unveiling intricate interactions and advancing our understanding of genetic networks, disease risks, and phenotypic complexities.

academic

MR.RGM: An R Package for Fitting Bayesian Multivariate Bidirectional Mendelian Randomization Networks

Basic Information

Paper ID: 2403.03944
Title: MR.RGM: An R Package for Fitting Bayesian Multivariate Bidirectional Mendelian Randomization Networks
Authors: Bitan Sarkar, Yang Ni (Texas A&M University)
Classification: stat.AP (Applied Statistics)
Published Journal: Bioinformatics
Paper Link: https://arxiv.org/abs/2403.03944
Code Repository: https://github.com/bitansa/MR.RGM

Abstract

Mendelian randomization (MR) infers causal relationships between exposures and outcomes by using genetic variants as instrumental variables. Traditional MR methods consider only a single pair of exposure and outcome variables at a time, limiting their ability to capture entire causal networks. This paper develops 'MR.RGM' (Mendelian Randomization via Reciprocal Graphical Models), a fast R package implementing Bayesian reciprocal graphical models that enables researchers to construct holistic causal networks with possible cyclic/reciprocal causal relationships and provide appropriate uncertainty quantification, thereby enabling comprehensive understanding of complex biological systems and their interconnections.

Research Background and Motivation

Problem Definition

Traditional Mendelian randomization (MR) methods primarily focus on causal inference for single exposure-outcome pairs, with the following limitations:

Neglect of Network Complexity: Inability to capture complex causal network structures among multiple variables
Missing Bidirectional Causality: Difficulty handling reciprocal or cyclic causal relationships between variables
Lack of Holistic Perspective: Inability to provide global causal understanding of biological systems

Research Significance

In complex biological systems, intricate interaction networks often exist among genes, proteins, and phenotypes. Understanding these networks is crucial for:

Disease risk assessment
Therapeutic target identification
Biological mechanism elucidation
Precision medicine development

Limitations of Existing Methods

Through comprehensive investigation of existing R packages (including mr.pivw, mr.raps, PPMR, OneSampleMR, MVMR, etc.), the authors found that all existing methods lack support for bidirectional MR analysis, which represents a critical deficiency in constructing complete causal networks.

Core Contributions

First R Package Supporting Bidirectional MR: MR.RGM is the only multivariate MR package capable of handling bidirectional causal relationships
Bayesian Network Framework: Implements uncertainty quantification and network structure inference based on reciprocal graphical models
Multiple Data Input Formats: Supports individual-level data and two types of summary-level data formats
Optimized Computational Efficiency: Uses C++ backend and Woodbury matrix identity to enhance computational efficiency
Network Motif Analysis: Provides NetworkMotif function for uncertainty quantification of specific network structures

Methodology Details

Mathematical Model

Base Model

For response variables $Y_i = (Y_{i1}, \ldots, Y_{ip})^T$ and instrumental variables $X_i = (X_{i1}, \ldots, X_{ik})^T$ , the model is defined as:

$Y_i = AY_i + BX_i + E_i, \quad E_i \sim N(0, \Sigma)$

where:

$A \in \mathbb{R}^{p \times p}$ : Causal effect matrix among response variables (diagonal elements are zero)
$B \in \mathbb{R}^{p \times k}$ : Effect matrix of instrumental variables on response variables
$\Sigma = \text{diag}(\sigma_1, \ldots, \sigma_p)$ : Error covariance matrix

Equivalent Form

The model can be rewritten as: $Y_i \sim N_p\{(I_p - A)^{-1}BX_i, (I_p - A)^{-1}\Sigma(I_p - A)^{-T}\}$

Prior Specification

Spike and Slab Prior

For elements of matrix $A$ : $a_{ij} \sim \gamma_{ij}N(0, \tau_{ij}) + (1-\gamma_{ij})N(0, \nu_1 \times \tau_{ij})$ $\gamma_{ij} \sim \text{Ber}(\rho_{ij}), \quad \rho_{ij} \sim \text{Beta}(a_\rho, b_\rho)$

Threshold Prior

$\tilde{a}_{ij} \sim N(0, \tau_{ij}), \quad a_{ij} = \tilde{a}_{ij}I(|\tilde{a}_{ij}| > t_A)$

MCMC Inference

Posterior inference is conducted using a hybrid strategy of Metropolis-Hastings algorithm and Gibbs sampling, including:

Marginal probability updates (Gibbs)
Effect coefficient updates (M-H)
Variance parameter updates (Gibbs)
Threshold parameter updates (M-H, Threshold prior only)

Computational Optimization

Woodbury Matrix Identity

To enhance computational efficiency, the Woodbury identity is employed for computing determinants and matrix inverses:

$\det(I_p - A^*) = (1 + (I_p - A)^{-1}_{(j,i)} \times (a_{ij} - a^*_{ij})) \det(I_p - A)$

$(I_p - A^*)^{-1} = (I_p - A)^{-1} - \frac{a_{ij} - a^*_{ij}}{1 + (a_{ij} - a^*_{ij})(I_p - A)^{-1}_{(j,i)}} (I_p - A)^{-1}_{(\cdot,i)} \times (I_p - A)^{-1}_{(j,\cdot)}$

Software Implementation

Core Functions

RGM Function

Input Formats:
- Individual-level data: X (instrumental variable matrix), Y (response variable matrix)
- Summary data type 1: Syy, Syx, Sxx covariance matrices
- Summary data type 2: Sxx, Beta, SigmaHat matrices
Required Parameters: D (binary indicator matrix), n (sample size)
Output: Causal effect estimates, network structure, posterior probabilities, etc.

NetworkMotif Function

Functionality: Uncertainty quantification for specific network motifs
Input: Target network structure Gamma, posterior samples GammaPst
Output: Posterior probability

Identifiability Conditions

To ensure model identifiability, each response variable must have at least one unique instrumental variable, i.e., each row of the D matrix must contain at least one unique 1.

Experimental Setup

Simulation Design

Model: $Y = AY + BX + E$
Sample Sizes: 10k, 30k, 50k
Network Scales: 5, 10 nodes
Sparsity: 25%, 50%
Effect Sizes: ±0.1
Variance Explained: 1%, 3%, 5%, 10%

Evaluation Metrics

TPR (True Positive Rate)
FPR (False Positive Rate)
FDR (False Discovery Rate)
MCC (Matthews Correlation Coefficient)
AUC (Area Under ROC Curve)

Comparison Methods

Primarily compared with the OneSampleMR package, which is the most recent advanced MR tool.

Experimental Results

Main Results

Network Structure Recovery Performance

MR.RGM significantly outperforms OneSampleMR under all tested conditions:

Network Scale 5, Sparsity 50%:

Spike & Slab Prior: AUC = 0.77-0.99, TPR = 0.50-0.99
OneSampleMR: AUC = 0.56-0.79, TPR = 0.08-0.84

Network Scale 10, Sparsity 25%:

Spike & Slab Prior: AUC = 0.87-0.995, TPR = 0.69-0.99
OneSampleMR: AUC = 0.48-0.52, TPR = 0.07-0.39

Computational Efficiency

Good Scalability: Sublinear growth with respect to number of nodes and instrumental variables
Practical Runtime: Analysis of 15 genes with 31 SNPs requires only 32.329 seconds on Apple M2 Pro

Robustness Analysis

Sensitivity tests to different error distributions show that MR.RGM is robust to normality assumptions:

Normal distribution: TPR=0.86, FPR=0.0133, MAD=0.0169
t-distribution (df=3): TPR=0.86, FPR=0.0200, MAD=0.0153
Laplace distribution: TPR=0.87, FPR=0.0333, MAD=0.0164

Real Data Application

Application on GTEx V7 dataset (332 samples, 15 genes) successfully constructed gene regulatory networks, demonstrating the practical utility of the method.

Classification of Existing MR Methods

Univariate Methods: mr.pivw, OneSampleMR
Multivariate Methods: MVMR, MRPC, MendelianRandomization
Bayesian Methods: mrbayes, MrDAG
Network Methods: MrDAG (DAG-only support)

Advantages of This Work

MR.RGM is the only tool supporting the following combination of features:

Multivariate analysis
Bidirectional causal relationships
Uncertainty quantification
Multiple data format support

Conclusions and Discussion

Main Conclusions

MR.RGM successfully fills the gap in bidirectional MR analysis
The Bayesian framework provides effective uncertainty quantification
The method performs excellently on both simulated and real data
Computational efficiency meets practical application requirements

Limitations

Normality Assumption: Although robustness tests show insensitivity, the method theoretically depends on normality
Identifiability Requirements: Each response variable requires a unique instrumental variable
Large-Scale Networks: Computational efficiency for very large networks requires further optimization

Future Directions

Extension to nonlinear causal relationships
Handling potential confounding factors
Integration of multi-omics data
Development of graphical user interface

In-Depth Evaluation

Strengths

Strong Innovation: First implementation of bidirectional MR analysis, filling an important gap
Rigorous Methodology: Solid theoretical foundation of Bayesian framework with correct MCMC implementation
High Practicality: Supports multiple data formats, meeting diverse application scenarios
Comprehensive Validation: Thorough simulation studies and real data verification
Software Quality: Open-source code, detailed documentation, user-friendly

Weaknesses

Limited Theoretical Analysis: Lacks theoretical guarantees for convergence and identifiability
Limited Comparative Experiments: Primarily compared with OneSampleMR, lacking comparison with other network methods
Single Application Case: Only demonstrates gene expression data application, lacking other biological applications

Impact

Academic Value: Provides important tools for causal inference research
Practical Value: Broad application prospects in genetics and epidemiological research
Reproducibility: Open-source code ensures reproducible results

Applicable Scenarios

Genetic Research: Gene regulatory network construction
Epidemiology: Disease risk factor network analysis
Systems Biology: Multi-omics data integration analysis
Precision Medicine: Individualized therapeutic target identification

References

Ni, Y., Ji, Y., & Müller, P. (2018). Reciprocal graphical models for integrative gene regulatory network analysis.
GTEx Consortium. (2020). The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science, 369(6509), 1318-1330.
Palmer, T., Spiller, W., & Sanderson, E. (2023). OneSampleMR: One Sample Mendelian Randomization and Instrumental Variable Analyses.

Overall Assessment: This is a high-quality methodological paper that successfully addresses the important problem of multivariate bidirectional Mendelian randomization. The software implementation is comprehensive, validation is thorough, and it has significant value for causal inference and genetic research. While there is room for improvement in theoretical analysis and application scope, the overall contribution is substantial and worthy of recommendation.