MedFuse: Multiplicative Embedding Fusion For Irregular Clinical Time Series

Hsieh, Chien, Huang et al.
Clinical time series derived from electronic health records (EHRs) are inherently irregular, with asynchronous sampling, missing values, and heterogeneous feature dynamics. While numerical laboratory measurements are highly informative, existing embedding strategies usually combine feature identity and value embeddings through additive operations, which constrains their ability to capture value-dependent feature interactions. We propose MedFuse, a framework for irregular clinical time series centered on the MuFuse (Multiplicative Embedding Fusion) module. MuFuse fuses value and feature embeddings through multiplicative modulation, preserving feature-specific information while modeling higher-order dependencies across features. Experiments on three real-world datasets covering both intensive and chronic care show that MedFuse consistently outperforms state-of-the-art baselines on key predictive tasks. Analysis of the learned representations further demonstrates that multiplicative fusion enhances expressiveness and supports cross-dataset pretraining. These results establish MedFuse as a generalizable approach for modeling irregular clinical time series.

Basic Information

  • Paper ID: 2511.09247
  • Title: MedFuse: Multiplicative Embedding Fusion For Irregular Clinical Time Series
  • Authors: Yi-Hsien Hsieh, Ta-Jung Chien, Chun-Kai Huang, Shao-Hua Sun, Che Lin (National Taiwan University)
  • Classification: cs.AI
  • Submission Date: November 12, 2025 (arXiv submission)
  • Paper Status: Under submission
  • Paper Link: https://arxiv.org/abs/2511.09247

Abstract

Clinical time series in electronic health records (EHR) exhibit inherent irregularity, including asynchronous sampling, missing values, and heterogeneous feature dynamics. Existing embedding strategies typically combine feature identity and numerical embeddings through additive operations, which limits the ability to capture value-dependent feature interactions. This paper proposes the MedFuse framework, centered on the MuFuse (Multiplicative Embedding Fusion) module. MuFuse fuses numerical and feature embeddings through multiplicative modulation, modeling higher-order dependencies while preserving feature-specific information. Experiments on three real-world datasets demonstrate that MedFuse consistently outperforms state-of-the-art baselines on critical prediction tasks. Analysis of learned representations further confirms that multiplicative fusion enhances expressiveness and supports cross-dataset pretraining.

Research Background and Motivation

1. Core Problems

Clinical time series modeling faces three major challenges:

  • Irregular Sampling: Vital signs may be monitored frequently, while laboratory tests are performed only when clinically necessary; patients may miss scheduled visits
  • High Missing Rate: Missing rates across the three datasets range from 73.77% to 88.14%
  • Difficult Numerical Representation: Laboratory values carry complex information over continuous ranges, so a lookup-style embedding would in principle need infinitely many entries

2. Problem Importance

  • Clinical time series are central to medical prediction and monitoring tasks
  • Effective modeling is critical for key medical tasks such as ICU mortality prediction and chronic disease risk assessment
  • Irregularity and missing values make traditional methods difficult to apply directly

3. Limitations of Existing Methods

Existing EVAT (Each Value As Token) methods primarily employ additive fusion:

  • Treat numerical embeddings as additive offsets to feature embeddings
  • Limited Expressiveness: Difficult to capture value-dependent nonlinear interactions
  • Loss of Clinical Semantics: Cannot distinguish qualitative differences between small and large deviations in laboratory measurements (e.g., mild creatinine elevation vs. sharp increase)

4. Research Motivation

  • Multiplicative fusion has been shown in other domains to provide stronger semantic integration than additive or concatenation approaches
  • The special nature of clinical data (e.g., medical equifinality: different abnormal deviations may correspond to the same clinical risk) requires more flexible fusion mechanisms
  • Need for a universal framework that requires no imputation and can directly handle irregular observations

Core Contributions

  1. Multiplicative Value-Feature Fusion: Proposes the MuFuse module, performing nonlinear, feature-specific modulation through value-conditioned multiplicative fusion without expanding the embedding vocabulary
  2. Universal Imputation-Free Framework: Constructs MedFuse based on MuFuse, adopting a (feature, value, timestamp) triplet tokenization scheme to directly model irregular measurements
  3. Comprehensive Validation and Transferability:
    • Consistently outperforms strong baselines on ICU and chronic disease datasets
    • Ablation studies confirm multiplicative superiority over additive fusion
    • Transfer experiments show learned feature embeddings can be reused across datasets
  4. Theoretical Insights: Proves that the recent SOTA method SCANE is actually a special case of MuFuse (d'=1), establishing a more general fusion mechanism

Method Details

Task Definition

Given observation set O = {(f, v, t)}:

  • Input: f ∈ {1,...,F} feature identity (e.g., laboratory test type), v ∈ ℝ recorded value, t ∈ ℝ⁺ timestamp
  • Output: Prediction task labels (e.g., ICU mortality, HCC incidence risk)
  • Constraint: Process only records that were actually observed (M_{f,t} = 1); no imputation of missing values is required
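
The task definition above can be illustrated with a minimal sketch of triplet tokenization. The `Observation` type, the `to_triplets` helper, and the toy record are hypothetical names and data chosen for illustration; they are not taken from the paper.

```python
from typing import Dict, List, NamedTuple, Optional


class Observation(NamedTuple):
    feature: int   # f: feature identity, 1-based as in f ∈ {1,...,F}
    value: float   # v: recorded value
    time: float    # t: timestamp (hours in this toy example)


def to_triplets(grid: Dict[float, List[Optional[float]]]) -> List[Observation]:
    """Flatten a sparse time -> per-feature grid into observed triplets only."""
    triplets = []
    for t in sorted(grid):
        for f, v in enumerate(grid[t], start=1):
            if v is not None:  # corresponds to M_{f,t} = 1
                triplets.append(Observation(f, v, t))
    return triplets


# Hypothetical record with 3 features; None marks an unobserved entry.
record = {
    0.0: [98.6, None, None],
    2.0: [None, 1.4, None],
    6.0: [99.1, None, 7.35],
}
tokens = to_triplets(record)
print(len(tokens), "tokens from", 3 * len(record), "grid cells")
```

Missing cells never become tokens, so no imputed placeholder enters the sequence.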

Model Architecture

Overall Architecture (MedFuse)

Observation Triplet (f, v, t)
    ↓
MuFuse Embedding Module
    ├─ Feature Identity Embedding: e_f ∈ ℝᵈ
    ├─ Numerical Embedding: e_v ∈ ℝᵈ'
    └─ Multiplicative Fusion: e_{f,v} = e_f ⊙ e_v
    ↓
Temporal Encoding Addition: e_{f,v,t} = e_{f,v} + p_t
    ↓
Transformer Encoder (N layers)
    ↓
Linear Classification Head + Softmax

Core Module: MuFuse

1. Feature Identity Embedding

e_f ∈ ℝᵈ  (standard lookup table)

2. Numerical Embedding

z_v = φ(v) ∈ ℝᵈ'              # shared nonlinear projector
e_{v|f} = γ_f ⊙ z_v + β_f     # feature-specific affine transformation

where γ_f, β_f ∈ ℝᵈ' are learnable feature-specific parameters

3. Multiplicative Fusion

When d' = d:

MuFuse(e_f, e_v) = e_f ⊙ e_v = e_{f,v}

When d ≠ d' (assuming d = d' × k):

  • Partition e_f into k consecutive blocks: e_f = [e_f^(1); e_f^(2); ...; e_f^(k)]
  • Each entry of e_v passes through a sigmoid to act as a gate: g(v_j) = σ(v_j) ∈ (0,1)
  • Scalar gating is applied to the corresponding blocks: e_{f,v}^(i) = g(v_j) · e_f^(i)
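
The fusion rules above can be sketched in plain Python. This is an illustrative reading, not the authors' implementation: the summary does not spell out how gate index j pairs with block index i, so the sketch assumes each of the k blocks has length d' and that entry j of every block is scaled by g(v_j). The projector `toy_phi` is a hypothetical stand-in for the learned φ.

```python
import math


def value_embedding(v, gamma_f, beta_f, phi):
    """e_{v|f} = gamma_f ⊙ phi(v) + beta_f (feature-specific affine map)."""
    z_v = phi(v)
    return [g * z + b for g, z, b in zip(gamma_f, z_v, beta_f)]


def mufuse(e_f, e_v):
    """Fuse a feature embedding (length d) with a value embedding (length d')."""
    d, d_prime = len(e_f), len(e_v)
    if d == d_prime:
        return [a * b for a, b in zip(e_f, e_v)]  # plain Hadamard product
    if d % d_prime != 0:
        raise ValueError("d must be a multiple of d'")
    gates = [1.0 / (1.0 + math.exp(-x)) for x in e_v]  # g(v_j) = sigmoid(v_j)
    # Assumed pairing: e_f is split into k = d / d' blocks of length d',
    # and entry j of every block is scaled by gate j.
    return [x * gates[idx % d_prime] for idx, x in enumerate(e_f)]


def toy_phi(v):
    """Hypothetical 2-dimensional projector standing in for the learned phi."""
    return [math.tanh(v), math.tanh(0.5 * v)]


e_f = [1.0, -2.0, 0.5, 4.0]                                   # d = 4
e_v = value_embedding(1.2, [1.0, 1.0], [0.0, 0.0], toy_phi)   # d' = 2, so k = 2
fused = mufuse(e_f, e_v)
print(fused)
```

Because the gates lie in (0,1), the gated path rescales each coordinate of e_f without flipping its sign, while the d' = d path allows arbitrary element-wise modulation.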

4. Categorical Feature Processing

e_{f,c} = W_cat · Concat(e_f, e_c) ∈ ℝᵈ
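
A small sketch of this projection follows. The dimension of the category embedding e_c is not stated in this summary, so the shapes below (d = 2 and a 2-dimensional e_c, giving a 2 × 4 matrix) and the weights are assumptions chosen for illustration.

```python
def categorical_fusion(w_cat, e_f, e_c):
    """Project Concat(e_f, e_c) back to d dimensions with matrix w_cat."""
    concat = list(e_f) + list(e_c)
    if any(len(row) != len(concat) for row in w_cat):
        raise ValueError("each row of w_cat must match len(e_f) + len(e_c)")
    return [sum(w * x for w, x in zip(row, concat)) for row in w_cat]


# Hypothetical weights: this particular matrix simply sums e_f and e_c.
w_cat = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
]
e_f = [1.0, 2.0]    # feature identity embedding
e_c = [0.5, -1.0]   # category embedding
e_fc = categorical_fusion(w_cat, e_f, e_c)
print(e_fc)
```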

5. Temporal Embedding (Sinusoidal Positional Encoding)

p_t[2i] = sin(t/ω_i)
p_t[2i+1] = cos(t/ω_i)
e_{f,v,t} = e_{f,v} + p_t
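
The encoding above can be sketched as follows. The summary does not define ω_i; the sketch assumes the standard Transformer choice ω_i = 10000^(2i/d), which may differ from the paper's setting. The function names are illustrative.

```python
import math


def time_encoding(t, d, base=10000.0):
    """Return p_t with p_t[2i] = sin(t / w_i) and p_t[2i+1] = cos(t / w_i)."""
    if d % 2 != 0:
        raise ValueError("d must be even")
    p_t = []
    for i in range(d // 2):
        omega = base ** (2 * i / d)  # assumed wavelength schedule
        p_t.append(math.sin(t / omega))
        p_t.append(math.cos(t / omega))
    return p_t


def add_time(e_fv, t):
    """e_{f,v,t} = e_{f,v} + p_t (additive temporal encoding)."""
    p_t = time_encoding(t, len(e_fv))
    return [a + b for a, b in zip(e_fv, p_t)]


print(add_time([1.0, 1.0, 1.0, 1.0], 0.0))
```

Because t is a real-valued timestamp rather than an integer position, the same function applies to irregularly spaced observations without resampling.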

Technical Innovations

1. Advantages of Multiplicative Fusion

Mathematical Expression:

MuFuse: e_{f,v} = e_f ⊙ e_v = e_f ⊙ (1 + e'_v) = e_f + e_f ⊙ e'_v,  where e'_v = e_v − 1
Additive Fusion: e_{f,v} = e_f + e_v

  • MuFuse introduces the interaction term e_f ⊙ e'_v, making numerical modulation dependent on feature identity
  • In additive fusion, e_v acts as an independent term, unaffected by e_f
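
A quick numerical check makes the difference concrete. The vectors are made-up toy values: the same value embedding is fused with two different feature embeddings, and the shift attributable to the value is compared under both schemes.

```python
def hadamard(a, b):
    return [x * y for x, y in zip(a, b)]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


e_v = [1.5, 0.5, 3.0]                  # one value embedding (toy numbers)
e_v_prime = [x - 1.0 for x in e_v]     # e'_v, defined by e_v = 1 + e'_v
e_f_a = [2.0, -1.0, 0.5]               # feature A
e_f_b = [0.0, 4.0, 1.0]                # feature B

# Shift contributed by the value under multiplicative fusion: e_f ⊙ e'_v
shift_mult_a = hadamard(e_f_a, e_v_prime)
shift_mult_b = hadamard(e_f_b, e_v_prime)

# Shift contributed by the value under additive fusion: always e_v
shift_add_a = [y - x for x, y in zip(e_f_a, add(e_f_a, e_v))]
shift_add_b = [y - x for x, y in zip(e_f_b, add(e_f_b, e_v))]

print("multiplicative shifts:", shift_mult_a, shift_mult_b)
print("additive shifts:      ", shift_add_a, shift_add_b)
```

Under multiplicative fusion the two shifts differ because they are scaled by each feature's embedding; under additive fusion both features receive the identical shift e_v.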

2. Modeling Medical Equifinality (Masking & Collapse)

Clinical scenario: Both hyponatremia and hypernatremia can cause seizures

  • Additive Fusion: Requires assigning the same embedding to different value ranges, losing flexibility
  • MuFuse: Element-wise multiplication lets e_f act as a mask. Wherever e_f is (near) zero, differences in e_v are suppressed, so distinct value embeddings can collapse to the same fused representation
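
The masking effect can be shown with a toy example. All numbers are invented, and the sketch assumes the two value embeddings differ only in the coordinates where the feature embedding is zero; under that assumption the fused vectors coincide, whereas additive fusion keeps them apart.

```python
def hadamard(a, b):
    return [x * y for x, y in zip(a, b)]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


e_f = [1.0, 0.0, 2.0, 0.0]          # zeros mask coordinates 1 and 3
e_v_low = [0.5, -3.0, 1.0, 0.2]     # hypothetical embedding of an abnormally low value
e_v_high = [0.5, 4.0, 1.0, -1.5]    # hypothetical embedding of an abnormally high value

fused_low = hadamard(e_f, e_v_low)
fused_high = hadamard(e_f, e_v_high)
added_low = add(e_f, e_v_low)
added_high = add(e_f, e_v_high)

print("multiplicative collapse:", fused_low == fused_high)
print("additive collapse:      ", added_low == added_high)
```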

3. Relationship with SCANE

SCANE directly multiplies observed values as scalars with feature embeddings, which is actually a special case of MuFuse (d'=1, no value transformation). MuFuse provides stronger expressiveness through flexible dimension selection and nonlinear projection.

4. Why Additive Encoding for Time?

Experiments show additive time encoding outperforms multiplicative (AUPRC: 0.6717 vs 0.6495):

  • Additive: Preserves AC signal amplitude and spectral patterns of sinusoidal encoding, with feature embeddings only as DC offset
  • Multiplicative: Alters AC amplitude and spectral composition, disrupting the regular representation of ordered positional encoding
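
The AC/DC argument can be checked on a single coordinate. In this sketch a sine wave stands in for one dimension of the positional encoding and the constant `e` for the matching coordinate of a fused embedding; both are toy values, not quantities from the paper.

```python
import math


def amplitude(signal):
    """Half the peak-to-peak range, i.e. the size of the oscillating (AC) part."""
    return (max(signal) - min(signal)) / 2.0


times = [0.1 * n for n in range(200)]
carrier = [math.sin(t) for t in times]   # one coordinate of p_t over time
e = 0.3                                  # one coordinate of e_{f,v} (hypothetical)

added = [e + s for s in carrier]         # additive temporal encoding
multiplied = [e * s for s in carrier]    # multiplicative temporal encoding

print("carrier amplitude:   ", round(amplitude(carrier), 3))
print("after addition:      ", round(amplitude(added), 3))
print("after multiplication:", round(amplitude(multiplied), 3))
```

Addition only shifts the wave by the constant `e`, leaving its amplitude intact, while multiplication rescales the amplitude by `e`, so the temporal pattern seen by the encoder varies from token to token.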

Experimental Setup

Datasets

Dataset | Type                     | Samples | Positive Rate | Missing Rate | Observation / Aggregation Window | Numerical Features | Categorical Features
P12     | ICU Mortality            | 11,988  | 14.2%         | 73.77%       | 48h / 2h                         | 40                 | 2
MI3     | ICU Mortality            | 52,871  | 14.0%         | 88.14%       | 48h / 2h                         | 128                | 4
HCC     | Hepatocellular Carcinoma | 34,296  | 4.6%          | 74.64%       | 1y / 90d                         | 30                 | 8

Preprocessing Protocol:

  • ICU tasks: 48-hour observation window, 2-hour aggregation (24 timestamps)
  • HCC task: 1-year observation window, 90-day aggregation (4 timestamps)
  • Numerical variables: median; categorical variables: mode
  • No imputation; tokens generated only from observed values
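
The preprocessing protocol above can be sketched as a binning step. The function name, the string feature names, and the toy records are assumptions for illustration; the sketch uses the ICU setting (48-hour window, 2-hour bins) and emits a token only for non-empty (feature, bin) pairs.

```python
from collections import defaultdict
from statistics import median, mode


def aggregate(observations, bin_hours=2.0, window_hours=48.0, categorical=frozenset()):
    """Bin (feature, value, time) records; one token per non-empty (feature, bin)."""
    buckets = defaultdict(list)
    for f, v, t in observations:
        if 0.0 <= t < window_hours:               # keep only the observation window
            buckets[(f, int(t // bin_hours))].append(v)
    tokens = []
    for (f, b), values in sorted(buckets.items()):
        agg = mode(values) if f in categorical else median(values)
        tokens.append((f, agg, b))
    return tokens


# Hypothetical raw records: (feature name, value, hours since admission)
raw = [
    ("hr", 80.0, 0.5), ("hr", 90.0, 1.5), ("hr", 100.0, 1.9),
    ("lactate", 2.1, 3.0),
    ("vent", 1, 0.2), ("vent", 1, 0.4), ("vent", 0, 1.0),
    ("hr", 70.0, 50.0),                           # outside the 48-hour window
]
tokens = aggregate(raw, categorical=frozenset({"vent"}))
print(tokens)
```

With 2-hour bins over 48 hours there are at most 24 bin indices per feature, and empty bins produce no token at all.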

Evaluation Metrics

  • Primary Metric: AUPRC (Area Under Precision-Recall Curve) - more suitable for class imbalance
  • Auxiliary Metrics: AUROC, Accuracy (ICU) / c-index (HCC)
  • Statistical Significance: 95% confidence intervals, estimated via 1000 bootstrap samples
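
A bootstrap confidence interval for AUPRC can be sketched as below. The summary states only that 1000 bootstrap samples were used; the percentile method, the step-wise average-precision estimator, the redraw rule for resamples without positives, and the toy labels and scores are all assumptions of this sketch.

```python
import random


def average_precision(y_true, y_score):
    """Step-wise area under the precision-recall curve (ties broken by input order)."""
    order = sorted(range(len(y_true)), key=lambda i: y_score[i], reverse=True)
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank   # precision at each recalled positive
    return total / n_pos


def bootstrap_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for average precision."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        labels = [y_true[i] for i in idx]
        if sum(labels) == 0:
            continue               # AUPRC is undefined without positives; redraw
        stats.append(average_precision(labels, [y_score[i] for i in idx]))
    stats.sort()
    lo_idx = int(n_boot * alpha / 2)
    return stats[lo_idx], stats[n_boot - 1 - lo_idx]


# Toy labels and scores, not data from the paper.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.35, 0.05, 0.6, 0.15, 0.25]
low, high = bootstrap_ci(y_true, y_score)
print(round(average_precision(y_true, y_score), 4), (round(low, 4), round(high, 4)))
```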

Comparison Methods

  1. Traditional Ensembles: Random Forest, XGBoost
  2. General Sequence Models: Transformer encoder, TCN
  3. Clinical Time Series Specialized:
    • SAnD: Masked self-attention
    • mTAN: Continuous-time attention
    • STraTS: Self-supervised triplet learning
    • SUMMIT (SCANE): Current SOTA, numerical scaling mechanism

Implementation Details

  • Optimizer: Adam
  • Learning Rate: 3e-5 (MedFuse), 5e-4 (most baselines)
  • Hyperparameter Tuning: Optuna (validation set)
  • Early Stopping: 30-380 epochs (dataset dependent)
  • Model Dimensions: d=144, d' varies (ablation studies)
  • Transformer Layers: 32 layers (MedFuse)

Experimental Results

Main Results

Table 1: Performance Comparison (Best in bold, second-best underlined)

Method        | MI3 AUPRC     | P12 AUPRC     | HCC AUPRC
Random Forest | 0.4367±0.0517 | 0.4805±0.0533 | 0.3934±0.0583
XGBoost       | 0.4553±0.0527 | 0.4980±0.0544 | 0.3887±0.0592
Transformer   | 0.5074±0.0510 | 0.5435±0.0560 | 0.4139±0.0571
SAnD          | 0.5463±0.0462 | 0.4615±0.0598 | 0.3769±0.0337
mTAN          | 0.5536±0.0359 | 0.4991±0.0521 | 0.4545±0.0264
STraTS        | 0.5886±0.0546 | 0.5206±0.0534 | 0.4270±0.0186
SUMMIT        | 0.6328±0.0277 | 0.5504±0.0563 | 0.4553±0.0577
MedFuse       | 0.6574±0.0270 | 0.5612±0.0558 | 0.4595±0.0556

Key Findings:

  • MedFuse achieves the best AUPRC, the primary metric, on all three datasets
  • Relative AUPRC improvements over SUMMIT: MI3 +3.9%, P12 +2.0%, HCC +0.9%
  • On MI3, MedFuse also achieves the best AUROC and accuracy (0.9078 and 0.9153)
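
The improvement figures are relative gains rather than absolute AUPRC differences, which the following short check reproduces from the Table 1 point estimates. The helper name is illustrative.

```python
def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# (MedFuse AUPRC, SUMMIT AUPRC) point estimates from Table 1
auprc = {
    "MI3": (0.6574, 0.6328),
    "P12": (0.5612, 0.5504),
    "HCC": (0.4595, 0.4553),
}
gains = {name: round(relative_gain(m, s), 1) for name, (m, s) in auprc.items()}
print(gains)  # {'MI3': 3.9, 'P12': 2.0, 'HCC': 0.9}
```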

Ablation Studies

Table 2: Feature-Value Fusion Strategy Ablation (P12)

Method                  | AUPRC         | AUROC         | Accuracy
MuFuse (Multiplicative) | 0.5612±0.0558 | 0.8686±0.0190 | 0.8837±0.0558
Additive                | 0.5317±0.0546 | 0.8549±0.0205 | 0.8754±0.0131
Concatenation           | 0.5291±0.0564 | 0.8518±0.0204 | 0.8779±0.0129

Conclusion: Multiplicative fusion improves AUPRC by a relative 5.5% over additive fusion (0.5612 vs. 0.5317), supporting the effectiveness of value-conditioned multiplicative modulation

Impact of Dimension Splitting Factor k

Experimental Setup: Fix d=144, vary k (i.e., d'=d/k)

P12 Results:

  • k=1 (d'=144): AUPRC 0.539
  • k=9 (d'=16): AUPRC 0.561 (optimal)
  • k=144 (d'=1, equivalent to SCANE): AUPRC 0.548

Insights:

  • Moderate dimension splitting provides optimal balance
  • Too coarse (small k): Insufficient parameterization of value effects
  • Too fine (large k): Overfitting of feature-value interactions
  • Validates the flexible alignment design of broadcast Hadamard product
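
Since d' = d/k must be an integer, only divisors of d are valid splitting factors. A short enumeration for d = 144 shows the search space and recovers the three settings reported above; which other values of k the paper actually tested is not stated in this summary.

```python
d = 144

# Map each valid splitting factor k to the value-embedding width d' = d / k.
splits = {k: d // k for k in range(1, d + 1) if d % k == 0}

for k in (1, 9, 144):
    print(f"k={k:>3}  d'={splits[k]}")
print("valid k values:", sorted(splits))
```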

Cross-Dataset Transfer Learning

Experimental Protocol:

  1. Pretrain on source dataset
  2. Transfer only feature identity embeddings of overlapping features (F∩)
  3. P12 and MI3 share 25 features (59.5% of P12, 18.9% of MI3)
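
The protocol above can be sketched as a dictionary copy over the shared feature names. The feature names and vectors are invented, and the sketch assumes embeddings are keyed by feature name. The overlap percentages are recomputed under the assumption that the P12 and MI3 totals are 42 and 132 features (numerical plus categorical counts from the dataset table).

```python
def transfer_embeddings(source_emb, target_emb):
    """Initialize target embeddings, copying source vectors for shared features."""
    shared = source_emb.keys() & target_emb.keys()   # F∩
    merged = dict(target_emb)
    for name in shared:
        merged[name] = list(source_emb[name])        # reuse the pretrained vector
    return merged, shared


# Hypothetical pretrained (source) and freshly initialized (target) embeddings.
source = {"creatinine": [0.1, 0.2], "sodium": [0.3, 0.4], "lactate": [0.5, 0.6]}
target = {"creatinine": [0.0, 0.0], "sodium": [0.0, 0.0], "albumin": [9.0, 9.0]}
merged, shared = transfer_embeddings(source, target)
print(sorted(shared), merged)

# Overlap reported in the text: 25 shared features.
overlap_p12 = round(100 * 25 / 42, 1)    # share of P12 features
overlap_mi3 = round(100 * 25 / 132, 1)   # share of MI3 features
print(overlap_p12, overlap_mi3)
```

Features that exist only in the target keep their own initialization, and source-only features are ignored.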

Table 3: Cross-Dataset Transfer Results

Transfer Direction      | AUPRC  | Improvement
MI3→P12 (Large→Small)   | 0.5454 | +1.7%
P12 Random Training     | 0.5361 | baseline
MI3 Subsample→P12       | 0.5276 | -1.6%
P12→MI3 (Small→Large)   | 0.6422 | -3.3%
MI3 Random Training     | 0.6639 | baseline

Key Findings:

  • Source dataset scale is critical: Large→small dataset shows positive transfer
  • Dataset identity is not the main factor: MI3 subsample→P12 still shows negative transfer
  • Feature embeddings capture reusable, cohort-agnostic semantics

Embedding Visualization

t-SNE Visualization (HCC Dataset):

  • Before Fusion: Clear clustering of tokens of the same feature type
  • After First Transformer Layer: Clustering characteristics preserved, confirming MuFuse robustness

Related Work

1. Sequence Model Foundations

  • Classical RNNs: LSTM, GRU - establish baselines
  • Transformer: Capture long-range dependencies
  • Efficient Variants: Informer (sparse self-attention)

2. Medical Time Series Modeling

  • Imputation Methods: BRITS (joint learning of imputation and prediction)
  • Grid Resampling: SAnD (masked self-attention, requires regular grid)
  • Continuous-Time Attention: mTAN (directly handles irregular observations)

3. EVAT Paradigm

  • STraTS: Self-supervised triplet learning
  • SCANE/SUMMIT: Numerical scaling mechanism (SOTA)
  • This Work: Proves SCANE is a special case of MuFuse, provides more general framework

4. Fusion Operation Research

  • Chrysos et al. (2025): Advantages of Hadamard product in deep learning
  • This Work: First systematic application of multiplicative fusion to clinical EHR numerical modeling

Conclusions and Discussion

Main Conclusions

  1. Multiplicative Fusion Outperforms Additive: MuFuse achieves feature-specific nonlinear interactions through value-conditioned modulation
  2. Universal Imputation-Free Framework: MedFuse is effective in both ICU and chronic disease scenarios
  3. Transferability: Learned feature embeddings support cross-dataset adaptation (requires sufficient source data scale)
  4. Theoretical Unification: MuFuse generalizes SCANE, providing clearer design principles

Limitations

  1. Computational Cost: 32-layer Transformer may limit real-time applications
  2. Transfer Conditions: Cross-dataset transfer requires large-scale source datasets
  3. Feature Overlap: Transfer depends on sufficient feature overlap (18.9%-59.5% in this work)
  4. Interpretability: Clinical semantics of multiplicative interactions require further exploration
  5. Multimodal Extension: Currently handles only numerical and categorical features, not text or images

Future Directions

  1. Large-Scale Multimodal Pretraining: Extend to clinical notes and medical imaging
  2. Causal Inference: Integrate counterfactual analysis to enhance interpretability
  3. Trustworthy Clinical Decision Support: Deploy to real clinical environments
  4. Efficient Architectures: Explore lightweight variants for resource-constrained scenarios
  5. Improved Temporal Encoding: Research positional encodings better suited for irregular sampling

In-Depth Evaluation

Strengths

1. Method Innovation (★★★★★)

  • Solid Core Innovation: Multiplicative fusion has clear theoretical motivation (medical equifinality, interaction terms)
  • Generalizes SOTA: Elegantly proves SCANE is a special case (d'=1), providing unified framework
  • Flexible Design: Broadcast Hadamard product supports arbitrary dimension ratios

2. Experimental Sufficiency (★★★★★)

  • Diverse Datasets: Covers ICU (acute) and HCC (chronic) scenarios
  • Comprehensive Ablations: Fusion strategy, dimension factor, transfer learning across three dimensions
  • Statistical Rigor: Bootstrap confidence intervals, multi-metric evaluation
  • Visualization Analysis: t-SNE validates embedding quality

3. Writing Clarity (★★★★☆)

  • Clear structure, well-motivated exposition
  • Precise mathematical expressions (Equations 4-11)
  • Detailed appendix (hyperparameters, dataset statistics, additional experiments)
  • Minor limitation: Some clinical terminology could benefit from more explanation

4. Practical Value (★★★★☆)

  • No imputation needed, reduces preprocessing complexity
  • Code not yet public (under submission), but method description is detailed
  • Relatively high computational cost (32-layer Transformer)

Weaknesses

1. Method Limitations

  • Temporal Encoding Contradiction: Acknowledges multiplicative fusion unsuitable for time encoding, but lacks deep theoretical explanation
  • Dimension Selection: Optimal k value depends on dataset, lacks automatic selection mechanism
  • Categorical Feature Handling: Simple concatenation + linear transformation, insufficient exploration of multiplicative fusion potential

2. Experimental Deficiencies

  • Limited Transfer Experiments: Only tested between two ICU datasets, HCC not involved
  • Low Feature Overlap: Only 18.9% feature overlap on MI3 side, limits transfer potential assessment
  • Missing Computational Cost Analysis: No reported training time or memory consumption
  • Hyperparameter Sensitivity: Requires significant layer adjustment across datasets (1-32 layers)

3. Insufficient Analysis

  • Feature Interaction Visualization: Lacks specific clinical feature interaction analysis
  • Failure Case Analysis: No discussion of model prediction errors
  • Incomplete SCANE Comparison: While proving special case relationship, lacks direct performance comparison at different d' settings

4. Reproducibility Issues

  • Code Not Public: Affects result verification
  • Private Datasets: HCC dataset cannot be publicly accessed
  • Random Seeds: Not explicitly stated whether fixed

Impact Assessment

Contribution to Field (★★★★☆)

  • Theoretical Contribution: Establishes theoretical foundation for multiplicative fusion in EHR modeling
  • Method Contribution: Provides universal framework, extensible to other irregular time series
  • Empirical Contribution: Establishes new SOTA on standard benchmarks

Practical Value (★★★☆☆)

  • Advantages: No imputation needed, directly handles irregular data
  • Limitations: High computational cost, requires large-scale source datasets for transfer
  • Applicable Scenarios: Suitable for research institutions and large medical centers with sufficient computational resources

Reproducibility (★★★☆☆)

  • Detailed Method Description: Clear formulas and architecture
  • Missing Code: Reduces reproducibility
  • Partial Data Availability: P12 and MI3 public, HCC private

Applicable Scenarios

Best Suited For

  1. High Missing Rate Scenarios (>70%): Advantages of no imputation clearly evident
  2. Irregular Sampling: ICU monitoring, outpatient follow-ups with asynchronous data
  3. Numerical Feature Dominant: Laboratory tests, vital signs and other continuous measurements
  4. Pretraining Needs: Can leverage large-scale source datasets

Less Suitable For

  1. Real-Time Prediction: 32-layer Transformer inference latency relatively high
  2. Small Sample Scenarios: Transfer learning requires large-scale source data
  3. Pure Categorical Features: Multiplicative fusion advantages not evident
  4. Resource-Constrained Environments: Edge devices, mobile health applications

Improvement Suggestions

  1. Adaptive Dimension Selection: Develop methods to automatically determine k (e.g., neural architecture search)
  2. Lightweight Variants: Explore knowledge distillation or pruning to reduce computational cost
  3. Multimodal Extension: Integrate clinical notes and medical imaging
  4. Interpretability Enhancement: Provide clinical semantic explanations for feature interactions
  5. Public Code and Models: Promote community verification and application

Selected References

  1. Huang et al. (2024): SCANE/SUMMIT - the SOTA baseline that this work improves on
  2. Chrysos et al. (2025): Survey on Hadamard product in deep learning
  3. Tipirneni & Reddy (2022): STraTS - representative work of EVAT paradigm
  4. Shukla & Marlin (2021): mTAN - continuous-time attention mechanism
  5. Vaswani et al. (2017): Transformer - backbone architecture in this work
  6. Johnson et al. (2016): MIMIC-III database - key evaluation dataset

Summary

MedFuse is a paper with substantial contributions to clinical time series modeling. Its core innovation, multiplicative embedding fusion (MuFuse), not only generalizes existing SOTA methods in theory but also achieves consistent performance improvements on three real-world datasets. The experimental design is comprehensive, validating the method through main performance comparisons, ablation studies, dimension analysis, and transfer learning.

Particularly commendable is the paper's treatment of medical equifinality: the masking effect of multiplicative fusion naturally models the phenomenon where different abnormal deviations correspond to the same clinical risk. This demonstrates the authors' deep understanding of the clinical domain.

However, the paper has some limitations: relatively high computational cost, limited transfer learning experiments, and lack of code release. Nevertheless, MedFuse provides a powerful and universal framework for irregular clinical time series modeling, with significant implications for advancing medical AI. Future work on multimodal extension, interpretability, and practical clinical deployment is anticipated.

Recommendation Score: 8.5/10