Motivation: Mendelian randomization (MR) infers causal relationships between exposures and outcomes using genetic variants as instrumental variables. Typically, MR considers only a pair of exposure and outcome at a time, limiting its capability of capturing the entire causal network. We overcome this limitation by developing 'MR.RGM' (Mendelian randomization via reciprocal graphical model), a fast R-package that implements the Bayesian reciprocal graphical model and enables practitioners to construct holistic causal networks with possibly cyclic/reciprocal causation and proper uncertainty quantifications, offering a comprehensive understanding of complex biological systems and their interconnections. We developed 'MR.RGM', an open-source R package that applies bidirectional MR using a network-based strategy, enabling the exploration of causal relationships among multiple variables in complex biological systems. 'MR.RGM' holds the promise of unveiling intricate interactions and advancing our understanding of genetic networks, disease risks, and phenotypic complexities.
MR.RGM: An R Package for Fitting Bayesian Multivariate Bidirectional Mendelian Randomization Networks
- Paper ID: 2403.03944
- Title: MR.RGM: An R Package for Fitting Bayesian Multivariate Bidirectional Mendelian Randomization Networks
- Authors: Bitan Sarkar, Yang Ni (Texas A&M University)
- Classification: stat.AP (Applied Statistics)
- Published Journal: Bioinformatics
- Paper Link: https://arxiv.org/abs/2403.03944
- Code Repository: https://github.com/bitansa/MR.RGM
Mendelian randomization (MR) infers causal relationships between exposures and outcomes by using genetic variants as instrumental variables. Traditional MR methods consider only a single exposure-outcome pair at a time, which limits their ability to capture entire causal networks. This paper develops 'MR.RGM' (Mendelian Randomization via Reciprocal Graphical Models), a fast R package implementing Bayesian reciprocal graphical models. It enables researchers to construct holistic causal networks with possibly cyclic/reciprocal causal relationships and appropriate uncertainty quantification, supporting a comprehensive understanding of complex biological systems and their interconnections.
Traditional Mendelian randomization (MR) methods primarily focus on causal inference for single exposure-outcome pairs, with the following limitations:
- Neglect of Network Complexity: Inability to capture complex causal network structures among multiple variables
- Missing Bidirectional Causality: Difficulty handling reciprocal or cyclic causal relationships between variables
- Lack of Holistic Perspective: Inability to provide global causal understanding of biological systems
In complex biological systems, intricate interaction networks often exist among genes, proteins, and phenotypes. Understanding these networks is crucial for:
- Disease risk assessment
- Therapeutic target identification
- Biological mechanism elucidation
- Precision medicine development
After surveying existing R packages (including mr.pivw, mr.raps, PPMR, OneSampleMR, MVMR, etc.), the authors found that none of them supports bidirectional MR analysis, a critical gap when constructing complete causal networks.
- First R Package Supporting Bidirectional MR: MR.RGM is the only multivariate MR package capable of handling bidirectional causal relationships
- Bayesian Network Framework: Implements uncertainty quantification and network structure inference based on reciprocal graphical models
- Multiple Data Input Formats: Supports individual-level data and two types of summary-level data formats
- Optimized Computational Efficiency: Uses C++ backend and Woodbury matrix identity to enhance computational efficiency
- Network Motif Analysis: Provides NetworkMotif function for uncertainty quantification of specific network structures
For response variables $Y_i = (Y_{i1}, \dots, Y_{ip})^T$ and instrumental variables $X_i = (X_{i1}, \dots, X_{ik})^T$, the model is defined as:
$$Y_i = A Y_i + B X_i + E_i, \quad E_i \sim N(0, \Sigma)$$
where:
- $A \in \mathbb{R}^{p \times p}$: Causal effect matrix among response variables (diagonal elements are zero)
- $B \in \mathbb{R}^{p \times k}$: Effect matrix of instrumental variables on response variables
- $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_p)$: Error covariance matrix
The model can be rewritten as:
$$Y_i \sim N_p\{(I_p - A)^{-1} B X_i, \; (I_p - A)^{-1} \Sigma (I_p - A)^{-T}\}$$
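The equivalence between the structural form and the reduced form can be checked numerically. The following is a minimal sketch in Python with NumPy, not the package's implementation: the function name `simulate_rgm`, the 3-node network, and all numeric values are illustrative assumptions, and the entries of `sigma` are treated as variances on the diagonal of Σ.

```python
import numpy as np

def simulate_rgm(A, B, sigma, X, rng):
    """Draw Y from Y_i = A Y_i + B X_i + E_i by solving for the reduced form."""
    p = A.shape[0]
    n = X.shape[0]
    E = rng.normal(size=(n, p)) * np.sqrt(sigma)   # E_i ~ N(0, diag(sigma))
    M_inv = np.linalg.inv(np.eye(p) - A)           # (I_p - A)^{-1}
    Y = (X @ B.T + E) @ M_inv.T                    # row i is (I_p - A)^{-1} (B X_i + E_i)
    return Y, E

rng = np.random.default_rng(0)

# Hypothetical network with a reciprocal edge between the first two nodes.
A = np.array([[0.0, 0.3, 0.0],
              [0.2, 0.0, 0.0],
              [0.0, 0.4, 0.0]])
B = np.eye(3)                        # one instrument per response (assumption)
sigma = np.array([1.0, 0.5, 2.0])    # error variances (assumption)
X = rng.normal(size=(5, 3))

Y, E = simulate_rgm(A, B, sigma, X, rng)

# Every simulated row should satisfy the structural equation exactly.
residual = Y - (Y @ A.T + X @ B.T + E)

# Covariance of Y_i given X_i implied by the reduced form.
M_inv = np.linalg.inv(np.eye(3) - A)
reduced_cov = M_inv @ np.diag(sigma) @ M_inv.T
```

Because the network contains a cycle, Y cannot be generated node by node in a topological order; solving the linear system through $(I_p - A)^{-1}$ is what makes reciprocal effects tractable, provided $I_p - A$ is invertible.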
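For elements of matrix A, two priors are available. Spike and slab prior:
$$a_{ij} \sim \gamma_{ij} N(0, \tau_{ij}) + (1 - \gamma_{ij}) N(0, \nu_1 \times \tau_{ij}), \quad \gamma_{ij} \sim \mathrm{Ber}(\rho_{ij}), \quad \rho_{ij} \sim \mathrm{Beta}(a_\rho, b_\rho)$$
Threshold prior:
$$\tilde{a}_{ij} \sim N(0, \tau_{ij}), \quad a_{ij} = \tilde{a}_{ij} I(|\tilde{a}_{ij}| > t_A)$$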
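To make the two priors concrete, the sketch below draws a matrix A from each of them in Python with NumPy. It is an illustration only: the function names, the hyperparameter values (a small ν1 so that the second mixture component acts as the spike, a_ρ = b_ρ = 0.5, t_A = 0.5), and the choice to zero the diagonal after sampling are assumptions, not settings taken from the paper.

```python
import numpy as np

def draw_spike_slab(tau, nu1, rho, rng):
    """Draw a_ij from gamma_ij N(0, tau_ij) + (1 - gamma_ij) N(0, nu1 * tau_ij)."""
    gamma = rng.random(tau.shape) < rho           # gamma_ij ~ Ber(rho_ij)
    var = np.where(gamma, tau, nu1 * tau)         # slab variance tau, spike variance nu1 * tau
    a = rng.normal(size=tau.shape) * np.sqrt(var)
    np.fill_diagonal(a, 0.0)                      # no self-effects
    np.fill_diagonal(gamma, False)
    return a, gamma

def draw_threshold(tau, t_A, rng):
    """Draw a~_ij ~ N(0, tau_ij) and keep it only when |a~_ij| > t_A."""
    a_tilde = rng.normal(size=tau.shape) * np.sqrt(tau)
    a = a_tilde * (np.abs(a_tilde) > t_A)         # hard thresholding gives exact zeros
    np.fill_diagonal(a, 0.0)
    return a

rng = np.random.default_rng(2)
p = 4
tau = np.ones((p, p))                             # illustrative variances
rho = rng.beta(0.5, 0.5, size=(p, p))             # rho_ij ~ Beta(a_rho, b_rho), values assumed

a_ss, gamma = draw_spike_slab(tau, nu1=1e-4, rho=rho, rng=rng)
a_th = draw_threshold(tau, t_A=0.5, rng=rng)
```

The practical difference is visible in the draws: under the spike and slab prior, excluded edges are near zero but not exactly zero, while the threshold prior sets them to exactly zero.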
Posterior inference is conducted using a hybrid strategy of Metropolis-Hastings algorithm and Gibbs sampling, including:
- Marginal probability updates (Gibbs)
- Effect coefficient updates (M-H)
- Variance parameter updates (Gibbs)
- Threshold parameter updates (M-H, Threshold prior only)
To enhance computational efficiency, the Woodbury identity is employed for computing determinants and matrix inverses:
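$$\det(I_p - A^*) = \left(1 + (I_p - A)^{-1}_{(j,i)} \times (a_{ij} - a^*_{ij})\right) \det(I_p - A)$$
$$(I_p - A^*)^{-1} = (I_p - A)^{-1} - \frac{a_{ij} - a^*_{ij}}{1 + (a_{ij} - a^*_{ij}) (I_p - A)^{-1}_{(j,i)}} \, (I_p - A)^{-1}_{(\cdot,i)} \times (I_p - A)^{-1}_{(j,\cdot)}$$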
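These two updates are the rank-one special case of the Woodbury identity (the matrix determinant lemma and the Sherman-Morrison formula), applied when A* differs from A only in the single entry (i, j). The sketch below, in Python with NumPy, compares the update against a direct recomputation; the function name `rank_one_update` and the random test matrix are illustrative assumptions, and the package itself implements this step in C++.

```python
import numpy as np

def rank_one_update(M_inv, det_M, i, j, a_old, a_new):
    """Update inverse and determinant of M = I_p - A after changing only a_ij.

    Uses (I_p - A*) = (I_p - A) + (a_old - a_new) * e_i e_j^T.
    """
    delta = a_old - a_new
    denom = 1.0 + delta * M_inv[j, i]
    det_new = denom * det_M                                   # matrix determinant lemma
    inv_new = M_inv - (delta / denom) * np.outer(M_inv[:, i], M_inv[j, :])  # Sherman-Morrison
    return inv_new, det_new

rng = np.random.default_rng(1)
p = 6
A = 0.1 * rng.normal(size=(p, p))     # small effects keep I_p - A well conditioned
np.fill_diagonal(A, 0.0)
M = np.eye(p) - A
M_inv, det_M = np.linalg.inv(M), np.linalg.det(M)

i, j, a_new = 2, 4, 0.35              # propose a new value for a_ij
inv_fast, det_fast = rank_one_update(M_inv, det_M, i, j, A[i, j], a_new)

# Direct recomputation for comparison.
A_star = A.copy()
A_star[i, j] = a_new
M_star = np.eye(p) - A_star
inv_direct, det_direct = np.linalg.inv(M_star), np.linalg.det(M_star)
```

Each update costs on the order of p² operations instead of the p³ needed for a fresh inverse, which matters because a Metropolis-Hastings sweep proposes a new value for one entry of A at a time.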
- Input Formats:
  - Individual-level data: X (instrumental variable matrix), Y (response variable matrix)
  - Summary data type 1: Syy, Syx, Sxx covariance matrices
  - Summary data type 2: Sxx, Beta, SigmaHat matrices
- Required Parameters: D (binary indicator matrix), n (sample size)
- Output: Causal effect estimates, network structure, posterior probabilities, etc.
- NetworkMotif Functionality: Uncertainty quantification for specific network motifs
- Input: Target network structure Gamma, posterior samples GammaPst
- Output: Posterior probability
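One plausible way to turn posterior samples into a motif probability is to count the fraction of sampled graphs that contain every edge of the target structure. The Python sketch below illustrates that idea. It is not the package's NetworkMotif code: the function name, the (p, p, n_samples) array layout, and the "contains all target edges" criterion (as opposed to an exact match of the whole graph) are assumptions.

```python
import numpy as np

def motif_posterior_probability(gamma_target, gamma_samples):
    """Fraction of posterior samples that contain every edge of the target motif.

    gamma_target : (p, p) 0/1 array describing the motif of interest
    gamma_samples: (p, p, n_samples) 0/1 array of sampled adjacency matrices
    """
    edges = np.asarray(gamma_target).astype(bool)
    # Select the target edges in every sample: shape (n_edges, n_samples).
    present = np.asarray(gamma_samples).astype(bool)[edges, :]
    contains = present.all(axis=0)            # True where all target edges are present
    return float(contains.mean())

# Toy example: a reciprocal pair between nodes 0 and 1, four posterior samples.
target = np.zeros((3, 3), dtype=int)
target[0, 1] = 1
target[1, 0] = 1

samples = np.zeros((3, 3, 4), dtype=int)
samples[0, 1, :] = [1, 1, 1, 0]
samples[1, 0, :] = [1, 0, 1, 1]
samples[2, 1, :] = [1, 1, 0, 0]               # extra edge, irrelevant to the motif

prob = motif_posterior_probability(target, samples)
```

In this toy input, two of the four samples contain both directions of the reciprocal pair, so the estimated motif probability is 0.5.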
To ensure model identifiability, each response variable must have at least one instrumental variable that affects only that response, i.e., for each row of the D matrix there must be at least one column whose only 1 lies in that row.
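This condition is easy to verify before fitting. The sketch below is a hypothetical helper in Python with NumPy, not part of the package. It assumes D is a p × k 0/1 matrix in which entry (i, j) equals 1 when instrument j affects response i, and it reads the requirement as: every response needs at least one instrument that affects no other response.

```python
import numpy as np

def has_unique_instruments(D):
    """True if every response (row) has an instrument (column) that affects it alone."""
    D = np.asarray(D).astype(bool)
    exclusive_cols = D.sum(axis=0) == 1          # instruments affecting exactly one response
    # For each row, check for a 1 in at least one exclusive column.
    return bool(D[:, exclusive_cols].any(axis=1).all())

# Response 0 has instrument 0 to itself and response 1 has instrument 2 to itself.
D_ok = np.array([[1, 1, 0],
                 [0, 1, 1]])

# Response 1 is only affected by instrument 1, which it shares with response 0.
D_bad = np.array([[1, 1],
                  [0, 1]])

ok = has_unique_instruments(D_ok)
bad = has_unique_instruments(D_bad)
```

Shared instruments are allowed, as in the middle column of `D_ok`; only the existence of one exclusive instrument per response is checked.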
- Model: Y=AY+BX+E
- Sample Sizes: 10k, 30k, 50k
- Network Scales: 5, 10 nodes
- Sparsity: 25%, 50%
- Effect Sizes: ±0.1
- Variance Explained: 1%, 3%, 5%, 10%
- TPR (True Positive Rate)
- FPR (False Positive Rate)
- FDR (False Discovery Rate)
- MCC (Matthews Correlation Coefficient)
- AUC (Area Under ROC Curve)
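For reference, the threshold-based metrics above can be computed from a true and an estimated adjacency matrix as in the following Python sketch. The function name, the exclusion of diagonal entries, and the conventions for empty denominators are assumptions made for illustration. AUC is omitted because it requires continuous edge scores (for example, posterior inclusion probabilities) rather than a single estimated graph.

```python
import math
import numpy as np

def edge_recovery_metrics(true_adj, est_adj):
    """TPR, FPR, FDR and MCC over the off-diagonal entries of two 0/1 adjacency matrices."""
    p = true_adj.shape[0]
    off_diag = ~np.eye(p, dtype=bool)            # self-loops are not part of the model
    t = np.asarray(true_adj).astype(bool)[off_diag]
    e = np.asarray(est_adj).astype(bool)[off_diag]

    tp = int(np.sum(t & e))
    fp = int(np.sum(~t & e))
    fn = int(np.sum(t & ~e))
    tn = int(np.sum(~t & ~e))

    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    fdr = fp / (tp + fp) if (tp + fp) else 0.0   # no discoveries: define FDR as 0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"TPR": tpr, "FPR": fpr, "FDR": fdr, "MCC": mcc}

# Toy 3-node example: three true edges, three estimated edges, two in common.
true_adj = np.zeros((3, 3), dtype=int)
true_adj[0, 1] = true_adj[1, 0] = true_adj[2, 1] = 1

est_adj = np.zeros((3, 3), dtype=int)
est_adj[0, 1] = est_adj[1, 0] = est_adj[0, 2] = 1

metrics = edge_recovery_metrics(true_adj, est_adj)
```

In the toy example there are 2 true positives, 1 false positive, 1 false negative, and 2 true negatives among the 6 off-diagonal entries, giving TPR = 2/3 and FPR = FDR = MCC = 1/3.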
MR.RGM is compared primarily with the OneSampleMR package, a recent MR tool.
MR.RGM significantly outperforms OneSampleMR under all tested conditions:
Network Scale 5, Sparsity 50%:
- Spike & Slab Prior: AUC = 0.77-0.99, TPR = 0.50-0.99
- OneSampleMR: AUC = 0.56-0.79, TPR = 0.08-0.84
Network Scale 10, Sparsity 25%:
- Spike & Slab Prior: AUC = 0.87-0.995, TPR = 0.69-0.99
- OneSampleMR: AUC = 0.48-0.52, TPR = 0.07-0.39
- Good Scalability: Runtime grows sublinearly with the number of nodes and instrumental variables
- Practical Runtime: Analysis of 15 genes with 31 SNPs requires only 32.329 seconds on Apple M2 Pro
Sensitivity tests with different error distributions show that MR.RGM is robust to violations of the normality assumption:
- Normal distribution: TPR=0.86, FPR=0.0133, MAD=0.0169
- t-distribution (df=3): TPR=0.86, FPR=0.0200, MAD=0.0153
- Laplace distribution: TPR=0.87, FPR=0.0333, MAD=0.0164
An application to the GTEx V7 dataset (332 samples, 15 genes) produced a gene regulatory network, demonstrating the practical utility of the method.
- Univariate Methods: mr.pivw, OneSampleMR
- Multivariate Methods: MVMR, MRPC, MendelianRandomization
- Bayesian Methods: mrbayes, MrDAG
- Network Methods: MrDAG (DAG-only support)
MR.RGM is the only tool supporting the following combination of features:
- Multivariate analysis
- Bidirectional causal relationships
- Uncertainty quantification
- Multiple data format support
- MR.RGM successfully fills the gap in bidirectional MR analysis
- The Bayesian framework provides effective uncertainty quantification
- The method performs excellently on both simulated and real data
- Computational efficiency meets practical application requirements
- Normality Assumption: Although robustness tests show insensitivity, the method theoretically depends on normality
- Identifiability Requirements: Each response variable requires a unique instrumental variable
- Large-Scale Networks: Computational efficiency for very large networks requires further optimization
- Extension to nonlinear causal relationships
- Handling potential confounding factors
- Integration of multi-omics data
- Development of graphical user interface
- Strong Innovation: First implementation of bidirectional MR analysis, filling an important gap
- Rigorous Methodology: Solid theoretical foundation of Bayesian framework with correct MCMC implementation
- High Practicality: Supports multiple data formats, meeting diverse application scenarios
- Comprehensive Validation: Thorough simulation studies and real data verification
- Software Quality: Open-source code, detailed documentation, user-friendly
- Limited Theoretical Analysis: Lacks theoretical guarantees for convergence and identifiability
- Limited Comparative Experiments: Primarily compared with OneSampleMR, lacking comparison with other network methods
- Single Application Case: Only demonstrates gene expression data application, lacking other biological applications
- Academic Value: Provides important tools for causal inference research
- Practical Value: Broad application prospects in genetics and epidemiological research
- Reproducibility: Open-source code ensures reproducible results
- Genetic Research: Gene regulatory network construction
- Epidemiology: Disease risk factor network analysis
- Systems Biology: Multi-omics data integration analysis
- Precision Medicine: Individualized therapeutic target identification
- Ni, Y., Ji, Y., & Müller, P. (2018). Reciprocal graphical models for integrative gene regulatory network analysis.
- GTEx Consortium. (2020). The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science, 369(6509), 1318-1330.
- Palmer, T., Spiller, W., & Sanderson, E. (2023). OneSampleMR: One Sample Mendelian Randomization and Instrumental Variable Analyses.
Overall Assessment: This is a high-quality methodological paper that successfully addresses the important problem of multivariate bidirectional Mendelian randomization. The software implementation is comprehensive, validation is thorough, and it has significant value for causal inference and genetic research. While there is room for improvement in theoretical analysis and application scope, the overall contribution is substantial and worthy of recommendation.