MedFuse: Multiplicative Embedding Fusion For Irregular Clinical Time Series

Hsieh, Chien, Huang et al.

Clinical time series derived from electronic health records (EHRs) are inherently irregular, with asynchronous sampling, missing values, and heterogeneous feature dynamics. While numerical laboratory measurements are highly informative, existing embedding strategies usually combine feature identity and value embeddings through additive operations, which constrains their ability to capture value-dependent feature interactions. We propose MedFuse, a framework for irregular clinical time series centered on the MuFuse (Multiplicative Embedding Fusion) module. MuFuse fuses value and feature embeddings through multiplicative modulation, preserving feature-specific information while modeling higher-order dependencies across features. Experiments on three real-world datasets covering both intensive and chronic care show that MedFuse consistently outperforms state-of-the-art baselines on key predictive tasks. Analysis of the learned representations further demonstrates that multiplicative fusion enhances expressiveness and supports cross-dataset pretraining. These results establish MedFuse as a generalizable approach for modeling irregular clinical time series.

Basic Information

Paper ID: 2511.09247
Title: MedFuse: Multiplicative Embedding Fusion For Irregular Clinical Time Series
Authors: Yi-Hsien Hsieh, Ta-Jung Chien, Chun-Kai Huang, Shao-Hua Sun, Che Lin (National Taiwan University)
Classification: cs.AI
Submission Date: November 12, 2025 (arXiv submission)
Paper Status: Under submission
Paper Link: https://arxiv.org/abs/2511.09247

Abstract

Clinical time series in electronic health records (EHR) exhibit inherent irregularity, including asynchronous sampling, missing values, and heterogeneous feature dynamics. Existing embedding strategies typically combine feature identity and numerical embeddings through additive operations, which limits the ability to capture value-dependent feature interactions. This paper proposes the MedFuse framework, centered on the MuFuse (Multiplicative Embedding Fusion) module. MuFuse fuses numerical and feature embeddings through multiplicative modulation, modeling higher-order dependencies while preserving feature-specific information. Experiments on three real-world datasets demonstrate that MedFuse consistently outperforms state-of-the-art baselines on critical prediction tasks. Analysis of learned representations further confirms that multiplicative fusion enhances expressiveness and supports cross-dataset pretraining.

Research Background and Motivation

1. Core Problems

Clinical time series modeling faces three major challenges:

Irregular Sampling: Vital signs may be monitored frequently, while laboratory tests are performed only when clinically necessary; patients may miss scheduled visits
High Missing Rate: Missing rates across the three datasets range from 73.77% to 88.14%
Difficult Numerical Representation: Laboratory values vary over continuous ranges, so representing every possible value with its own embedding would in principle require infinitely many representations

2. Problem Importance

Clinical time series are central to medical prediction and monitoring tasks
Effective modeling is critical for key medical tasks such as ICU mortality prediction and chronic disease risk assessment
Irregularity and missing values make traditional methods difficult to apply directly

3. Limitations of Existing Methods

Existing EVAT (Each Value As Token) methods primarily employ additive fusion:

Treat numerical embeddings as additive offsets to feature embeddings
Limited Expressiveness: Difficult to capture value-dependent nonlinear interactions
Loss of Clinical Semantics: Cannot distinguish qualitative differences between small and large deviations in laboratory measurements (e.g., mild creatinine elevation vs. sharp increase)

4. Research Motivation

Multiplicative fusion has been shown in other domains to provide stronger semantic integration than additive or concatenation approaches
The special nature of clinical data (e.g., medical equifinality: different abnormal deviations may correspond to the same clinical risk) requires more flexible fusion mechanisms
Need for a universal framework that requires no imputation and can directly handle irregular observations

Core Contributions

Multiplicative Value-Feature Fusion: Proposes the MuFuse module, performing nonlinear, feature-specific modulation through value-conditioned multiplicative fusion without expanding the embedding vocabulary
Universal Imputation-Free Framework: Constructs MedFuse based on MuFuse, adopting a (feature, value, timestamp) triplet tokenization scheme to directly model irregular measurements
Comprehensive Validation and Transferability:
- Consistently outperforms strong baselines on ICU and chronic disease datasets
- Ablation studies confirm multiplicative superiority over additive fusion
- Transfer experiments show learned feature embeddings can be reused across datasets
Theoretical Insights: Proves that the recent SOTA method SCANE is actually a special case of MuFuse (d'=1), establishing a more general fusion mechanism

Method Details

Task Definition

Given observation set O = {(f, v, t)}:

Input: f ∈ {1,...,F} feature identity (e.g., laboratory test type), v ∈ ℝ recorded value, t ∈ ℝ⁺ timestamp
Output: Prediction task labels (e.g., ICU mortality, HCC incidence risk)
Constraint: Only records that were actually observed (M_{f,t} = 1) are processed; missing values are never imputed
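As a rough illustration of the task definition above, the sketch below turns a small feature-by-time grid with gaps into (f, v, t) triplets. The function name `to_triplets`, the grid layout, the 0-based feature indices, and the example values are all assumptions made for illustration; the paper's actual tokenizer is not described at this level of detail.

```python
import math


def to_triplets(grid, times):
    """Convert a feature-by-time grid into (feature, value, time) triplets.

    grid[f][j] holds the value of feature f at times[j], or None / NaN when
    nothing was observed. Unobserved cells are skipped, never imputed.
    """
    triplets = []
    for f, row in enumerate(grid):
        for t, v in zip(times, row):
            if v is None or (isinstance(v, float) and math.isnan(v)):
                continue  # mask is 0 here: no token is generated
            triplets.append((f, float(v), float(t)))
    triplets.sort(key=lambda item: item[2])  # order tokens by timestamp
    return triplets


# Hypothetical example: feature 0 sampled often, feature 1 sampled once.
grid = [
    [80.0, None, 92.0],
    [None, 1.4, None],
]
times = [0.0, 2.0, 4.0]
tokens = to_triplets(grid, times)
```

Only three tokens come out of the six grid cells, so the sequence length tracks the number of real observations rather than the grid size.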

Model Architecture

Overall Architecture (MedFuse)

Observation Triplet (f,v,t)
    ↓
MuFuse Embedding Module
    ├─ Feature Identity Embedding: e_f ∈ ℝᵈ
    ├─ Numerical Embedding: e_v ∈ ℝᵈ'
    └─ Multiplicative Fusion: e_{f,v} = e_f ⊙ e_v
    ↓
Temporal Encoding Addition: e_{f,v,t} = e_{f,v} + p_t
    ↓
Transformer Encoder (N layers)
    ↓
Linear Classification Head + Softmax

Core Module: MuFuse

1. Feature Identity Embedding

e_f ∈ ℝᵈ  (standard lookup table)

2. Numerical Embedding

z_v = φ(v) ∈ ℝᵈ'                # shared nonlinear projector
e_{v|f} = γ_f ⊙ z_v + β_f       # feature-specific affine transformation

where γ_f, β_f ∈ ℝᵈ' are learnable feature-specific parameters
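A minimal sketch of the numerical embedding step, assuming the shared projector φ is a small one-hidden-layer tanh network. That architecture, the random weights, and the helper names `make_projector` and `value_embedding` are illustrative assumptions; the summary only states that φ is a shared nonlinear projector followed by a feature-specific affine transformation.

```python
import math
import random


def make_projector(d_prime, hidden=8, seed=0):
    """Build a hypothetical shared projector phi: scalar value -> R^{d'}."""
    rng = random.Random(seed)
    w1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
    b1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
    w2 = [[rng.gauss(0.0, 1.0) / math.sqrt(hidden) for _ in range(hidden)]
          for _ in range(d_prime)]

    def phi(v):
        h = [math.tanh(w * v + b) for w, b in zip(w1, b1)]
        return [sum(wij * hj for wij, hj in zip(row, h)) for row in w2]

    return phi


def value_embedding(v, phi, gamma_f, beta_f):
    """Feature-specific affine transformation: gamma_f * phi(v) + beta_f."""
    z = phi(v)
    return [g * zj + b for g, zj, b in zip(gamma_f, z, beta_f)]


phi = make_projector(d_prime=4)
# The same raw value embedded for two hypothetical features: the projector is
# shared, and only the scale (gamma) and shift (beta) differ per feature.
e_v_feature_a = value_embedding(1.2, phi, [1.0] * 4, [0.0] * 4)
e_v_feature_b = value_embedding(1.2, phi, [2.0] * 4, [0.5] * 4)
```

Sharing φ keeps the parameter count independent of the number of features, while γ_f and β_f add only 2d' parameters per feature.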

3. Multiplicative Fusion

When d' = d:

MuFuse(e_f, e_v) = e_f ⊙ e_v = e_{f,v}

When d ≠ d' (assuming d = d' × k):

Partition e_f into k consecutive blocks: e_f = [e_f^{(1)}; e_f^{(2)}; ...; e_f^{(k)}]
Each entry v_j of e_v passes through a sigmoid to form a gate: g(v_j) = σ(v_j) ∈ (0,1)
Each scalar gate scales its corresponding block: e_{f,v}^{(i)} = g(v_j) · e_f^{(i)}
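The two cases above can be sketched as follows. The pairing between gates and blocks is not spelled out in this summary, so the sketch assumes that each of the d' gates scales one contiguous run of k = d / d' entries of the feature embedding; tiling the gates across blocks would be another possible reading. The function name `mufuse` is hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mufuse(e_f, e_v):
    """Fuse a feature embedding (length d) with a value embedding (length d')."""
    d, d_prime = len(e_f), len(e_v)
    if d == d_prime:
        # Same dimension: plain element-wise (Hadamard) product.
        return [a * b for a, b in zip(e_f, e_v)]
    if d % d_prime != 0:
        raise ValueError("d must be a multiple of d'")
    k = d // d_prime
    gates = [sigmoid(x) for x in e_v]
    # Assumed layout: gate j scales entries j*k .. (j+1)*k - 1 of e_f.
    return [gates[i // k] * e_f[i] for i in range(d)]


same_dim = mufuse([1.0, 2.0], [3.0, 4.0])
two_gates = mufuse([1.0, 2.0, 3.0, 4.0], [0.0, 0.0])
one_gate = mufuse([1.0, 2.0, 3.0, 4.0], [0.0])  # d' = 1: one scalar for all of e_f
```

With d' = 1 a single scalar rescales the entire feature embedding, which is the regime the summary identifies with SCANE-style scaling.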

4. Categorical Feature Processing

e_{f,c} = W_cat · Concat(e_f, e_c) ∈ ℝᵈ

5. Temporal Embedding (Sinusoidal Positional Encoding)

p_t[2i] = sin(t/ω_i)
p_t[2i+1] = cos(t/ω_i)
e_{f,v,t} = e_{f,v} + p_t
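A small sketch of the additive temporal encoding, evaluated at a continuous timestamp rather than an integer position. The summary does not define ω_i, so the geometric schedule ω_i = max_period^(2i/d) with max_period = 10000 (the convention from the original Transformer) is an assumption here, as are the function names and the requirement that d be even.

```python
import math


def time_encoding(t, d, max_period=10000.0):
    """Sinusoidal encoding of a real-valued timestamp t into R^d (d even)."""
    p = [0.0] * d
    for i in range(d // 2):
        omega = max_period ** (2 * i / d)  # assumed frequency schedule
        p[2 * i] = math.sin(t / omega)
        p[2 * i + 1] = math.cos(t / omega)
    return p


def add_time(e_fv, t):
    """Additive fusion of the time encoding with a fused feature-value embedding."""
    p_t = time_encoding(t, len(e_fv))
    return [a + b for a, b in zip(e_fv, p_t)]


p0 = time_encoding(0.0, 4)
token = add_time([1.0, 1.0, 1.0, 1.0], 0.0)
```

Because t is real-valued, two observations taken 1.5 hours apart get distinct encodings without snapping timestamps to a regular grid.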

Technical Innovations

1. Advantages of Multiplicative Fusion

Mathematical Expression:

MuFuse: e_{f,v} = e_f ⊙ e_v = e_f ⊙ (1 + e'_v) = e_f + e_f ⊙ e'_v
Additive Fusion: e_{f,v} = e_f + e_v

MuFuse introduces the interaction term e_f ⊙ e'_v, making numerical modulation dependent on feature identity
In additive fusion, e_v acts as an independent term, unaffected by e_f
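One way to make this contrast concrete, using the reparameterization e_v = 1 + e'_v from the expression above, is to compare how the fused embedding responds to a change in the value embedding. This is a worked restatement of the two formulas, not a derivation taken from the paper.

```latex
\begin{aligned}
\text{MuFuse:}\quad
  & e_{f,v} = e_f \odot (\mathbf{1} + e'_v) = e_f + e_f \odot e'_v,
  & \frac{\partial e_{f,v}}{\partial e'_v} &= \operatorname{diag}(e_f), \\
\text{Additive:}\quad
  & e_{f,v} = e_f + e_v,
  & \frac{\partial e_{f,v}}{\partial e_v} &= I .
\end{aligned}
```

Under multiplicative fusion the sensitivity to the value is scaled per dimension by the feature embedding, whereas under additive fusion it is the identity for every feature.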

2. Modeling Medical Equifinality (Masking & Collapse)

Clinical scenario: Both hyponatremia and hypernatremia can cause seizures

Additive Fusion: Requires assigning the same embedding to different value ranges, losing flexibility
MuFuse: Because fusion is element-wise, e_f acts as a mask: dimensions where e_f is zero (or near zero) suppress differences in e_v, so different value embeddings can collapse to the same fused representation
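The masking effect can be illustrated with toy vectors. All numbers below are hand-picked for illustration and are not learned embeddings: the two value embeddings differ only in the dimensions where the feature embedding is zero.

```python
def hadamard(a, b):
    return [x * y for x, y in zip(a, b)]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


# Hypothetical feature embedding with two masked (zero) dimensions.
e_f = [1.0, 0.0, 2.0, 0.0]
# Hypothetical value embeddings for a very low and a very high reading;
# they disagree only in dimensions 1 and 3.
e_v_low = [0.5, 3.0, 1.0, 7.0]
e_v_high = [0.5, 4.0, 1.0, 2.0]

fused_low = hadamard(e_f, e_v_low)
fused_high = hadamard(e_f, e_v_high)
added_low = add(e_f, e_v_low)
added_high = add(e_f, e_v_high)
```

The multiplicative results coincide while the additive results stay distinct, so under additive fusion the two value embeddings themselves would have to be made identical to reach the same representation.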

3. Relationship with SCANE

SCANE directly multiplies observed values as scalars with feature embeddings, which is actually a special case of MuFuse (d'=1, no value transformation). MuFuse provides stronger expressiveness through flexible dimension selection and nonlinear projection.

4. Why Additive Encoding for Time?

Experiments show additive time encoding outperforms multiplicative (AUPRC: 0.6717 vs 0.6495):

Additive: Preserves AC signal amplitude and spectral patterns of sinusoidal encoding, with feature embeddings only as DC offset
Multiplicative: Alters AC amplitude and spectral composition, disrupting the regular representation of ordered positional encoding

Experimental Setup

Datasets

Dataset	Type	Samples	Positive Rate	Missing Rate	Observation Window	Numerical Features	Categorical Features
P12	ICU Mortality	11,988	14.2%	73.77%	48h/2h window	40	2
MI3	ICU Mortality	52,871	14.0%	88.14%	48h/2h window	128	4
HCC	Hepatocellular Carcinoma	34,296	4.6%	74.64%	1y/90d window	30	8

Preprocessing Protocol:

ICU tasks: 48-hour observation window, 2-hour aggregation (24 timestamps)
HCC task: 1-year observation window, 90-day aggregation (4 timestamps)
Aggregation within each window: median for numerical variables, mode for categorical variables
No imputation; tokens generated only from observed values
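The preprocessing protocol above could look roughly like this. The record layout (feature name, value, time in hours), the function name `aggregate`, and the example features are assumptions for illustration; only the windowing, the median/mode rule, and the absence of imputation come from the description.

```python
from statistics import median, mode


def aggregate(records, window, bin_width, categorical=frozenset()):
    """Bin (feature, value, time) records and aggregate within each bin.

    Numerical features use the median and categorical features the mode.
    Bins with no observation produce no output row (no imputation).
    """
    bins = {}
    for feature, value, t in records:
        if not 0 <= t < window:
            continue  # outside the observation window
        bins.setdefault((feature, int(t // bin_width)), []).append(value)
    rows = []
    for (feature, b), values in sorted(bins.items(),
                                       key=lambda item: (item[0][1], item[0][0])):
        agg = mode(values) if feature in categorical else median(values)
        rows.append((feature, agg, b))
    return rows


# Hypothetical ICU-style records: 48-hour window, 2-hour bins.
records = [
    ("hr", 80, 0.5), ("hr", 90, 1.5), ("hr", 100, 1.9),
    ("vent", "on", 0.2), ("vent", "on", 1.0), ("vent", "off", 1.2),
    ("hr", 70, 3.0),
    ("hr", 60, 50.0),  # falls outside the 48-hour window
]
rows = aggregate(records, window=48, bin_width=2, categorical={"vent"})
```

Bin indices run from 0 to 23 for the ICU setting, matching the 24 timestamps stated above.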

Evaluation Metrics

Primary Metric: AUPRC (Area Under Precision-Recall Curve) - more suitable for class imbalance
Auxiliary Metrics: AUROC, Accuracy (ICU) / c-index (HCC)
Statistical Significance: 95% confidence intervals, estimated via 1000 bootstrap samples
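A sketch of how such an interval might be computed. The summary does not say which AUPRC estimator or which bootstrap variant was used, so average precision (ties broken by input order) and the percentile bootstrap are assumptions here, as are the function names and the toy labels and scores.

```python
import random


def average_precision(y_true, y_score):
    """Average precision, a common estimator of AUPRC. Labels are 0 or 1."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    order = sorted(range(len(y_true)), key=lambda i: -y_score[i])
    true_positives, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            true_positives += 1
            total += true_positives / rank  # precision at this recall step
    return total / n_pos


def bootstrap_ci(y_true, y_score, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a metric."""
    if sum(y_true) == 0:
        raise ValueError("at least one positive label is required")
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        labels = [y_true[i] for i in idx]
        if sum(labels) == 0:
            continue  # metric undefined without positives: redraw
        stats.append(metric(labels, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


y_true = [1, 0, 1, 0, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.1, 0.6, 0.3, 0.2, 0.4]
point = average_precision(y_true, y_score)
ci_low, ci_high = bootstrap_ci(y_true, y_score, average_precision)
```

With positive rates as low as 4.6%, resamples without any positive case are possible in principle, which is why the sketch redraws them instead of scoring them.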

Comparison Methods

Traditional Ensembles: Random Forest, XGBoost
General Sequence Models: Transformer encoder, TCN
Clinical Time Series Specialized:
- SAnD: Masked self-attention
- mTAN: Continuous-time attention
- STraTS: Self-supervised triplet learning
- SUMMIT (SCANE): Current SOTA, numerical scaling mechanism

Implementation Details

Optimizer: Adam
Learning Rate: 3e-5 (MedFuse), 5e-4 (most baselines)
Hyperparameter Tuning: Optuna (validation set)
Early Stopping: 30-380 epochs (dataset dependent)
Model Dimensions: d=144, d' varies (ablation studies)
Transformer Layers: 32 layers (MedFuse)

Experimental Results

Main Results

Table 1: AUPRC Comparison Across Datasets

Method	MI3 AUPRC	P12 AUPRC	HCC AUPRC
Random Forest	0.4367±0.0517	0.4805±0.0533	0.3934±0.0583
XGBoost	0.4553±0.0527	0.4980±0.0544	0.3887±0.0592
Transformer	0.5074±0.0510	0.5435±0.0560	0.4139±0.0571
SAnD	0.5463±0.0462	0.4615±0.0598	0.3769±0.0337
mTAN	0.5536±0.0359	0.4991±0.0521	0.4545±0.0264
STraTS	0.5886±0.0546	0.5206±0.0534	0.4270±0.0186
SUMMIT	0.6328±0.0277	0.5504±0.0563	0.4553±0.0577
MedFuse	0.6574±0.0270	0.5612±0.0558	0.4595±0.0556

Key Findings:

MedFuse achieves the best AUPRC (the primary metric) on all three datasets
Relative AUPRC improvement over SUMMIT: MI3 +3.9%, P12 +2.0%, HCC +0.9%
On MI3, MedFuse also achieves the best AUROC (0.9078) and accuracy (0.9153)
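The improvement figures over SUMMIT are relative rather than absolute differences, which can be checked directly against the AUPRC values in Table 1. The helper name `relative_gain` is just for illustration.

```python
def relative_gain(new, base):
    """Relative improvement of `new` over `base`, in percent."""
    return 100.0 * (new - base) / base


# AUPRC point estimates copied from Table 1.
summit = {"MI3": 0.6328, "P12": 0.5504, "HCC": 0.4553}
medfuse = {"MI3": 0.6574, "P12": 0.5612, "HCC": 0.4595}

gains = {name: round(relative_gain(medfuse[name], summit[name]), 1)
         for name in summit}
```

In absolute terms the gaps are about 0.025, 0.011, and 0.004 AUPRC, each smaller than the ± values reported next to the corresponding entries in Table 1.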

Ablation Studies

Table 2: Feature-Value Fusion Strategy Ablation (P12)

Method	AUPRC	AUROC	Accuracy
MuFuse (Multiplicative)	0.5612±0.0558	0.8686±0.0190	0.8837±0.0558
Additive	0.5317±0.0546	0.8549±0.0205	0.8754±0.0131
Concatenation	0.5291±0.0564	0.8518±0.0204	0.8779±0.0129

Conclusion: Multiplicative fusion improves AUPRC by a relative 5.5% over additive fusion (0.5612 vs. 0.5317), supporting the effectiveness of value-conditioned multiplicative modulation

Impact of Dimension Splitting Factor k

Experimental Setup: Fix d=144, vary k (i.e., d'=d/k)

P12 Results:

k=1 (d'=144): AUPRC 0.539
k=9 (d'=16): AUPRC 0.561 (optimal)
k=144 (d'=1, equivalent to SCANE): AUPRC 0.548

Insights:

Moderate dimension splitting provides optimal balance
Too coarse (small k): Insufficient parameterization of value effects
Too fine (large k): Overfitting of feature-value interactions
Validates the flexible alignment design of broadcast Hadamard product
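Since d = d' × k with integer dimensions, only divisors of d are valid splitting factors. A short sketch that enumerates the candidate (k, d') pairs for d = 144; the function name `split_options` is hypothetical.

```python
def split_options(d):
    """All (k, d') pairs with d = d' * k."""
    return [(k, d // k) for k in range(1, d + 1) if d % k == 0]


options = split_options(144)
```

The three settings reported above, (1, 144), (9, 16), and (144, 1), are among the 15 candidates, so a full sweep over k for this embedding size is small.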

Cross-Dataset Transfer Learning

Experimental Protocol:

Pretrain on source dataset
Transfer only feature identity embeddings of overlapping features (F∩)
P12 and MI3 share 25 features (59.5% of P12, 18.9% of MI3)
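A minimal sketch of the transfer step, assuming feature embeddings are stored as name-to-vector dictionaries; the function names, feature names, and vectors are hypothetical. The overlap percentages are also recomputed, assuming the totals are the numerical plus categorical feature counts from the dataset table (40 + 2 for P12, 128 + 4 for MI3).

```python
def transfer_shared_embeddings(source, target):
    """Copy pretrained embeddings for features present in both datasets.

    Features found only in the target keep their existing initialization.
    Returns the sorted list of shared feature names.
    """
    shared = sorted(set(source) & set(target))
    for name in shared:
        target[name] = list(source[name])  # copy to avoid aliasing
    return shared


def overlap_percent(n_shared, n_total):
    return round(100.0 * n_shared / n_total, 1)


# Hypothetical pretrained (source) and freshly initialized (target) tables.
source = {"creatinine": [0.1, 0.2], "sodium": [0.3, 0.4], "lactate": [0.5, 0.6]}
target = {"creatinine": [0.0, 0.0], "sodium": [0.0, 0.0], "albumin": [9.0, 9.0]}
shared = transfer_shared_embeddings(source, target)

p12_overlap = overlap_percent(25, 42)
mi3_overlap = overlap_percent(25, 132)
```

Under that assumption the recomputed values agree with the 59.5% and 18.9% figures quoted above.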

Table 3: Cross-Dataset Transfer Results

Transfer Direction	AUPRC	Relative Change vs. Baseline
MI3→P12 (Large→Small)	0.5454	+1.7%
P12 (trained from random initialization)	0.5361	baseline
MI3 Subsample→P12	0.5276	-1.6%
P12→MI3 (Small→Large)	0.6422	-3.3%
MI3 (trained from random initialization)	0.6639	baseline

Key Findings:

Source dataset scale is critical: Large→small dataset shows positive transfer
Dataset identity is not the main factor: MI3 subsample→P12 still shows negative transfer
Feature embeddings capture reusable, cohort-agnostic semantics

Embedding Visualization

t-SNE Visualization (HCC Dataset):

Before Fusion: Clear clustering of tokens of the same feature type
After First Transformer Layer: Clustering characteristics preserved, confirming MuFuse robustness

Related Work

1. Sequence Model Foundations

Classical RNNs: LSTM, GRU - establish baselines
Transformer: Capture long-range dependencies
Efficient Variants: Informer (sparse self-attention)

2. Medical Time Series Modeling

Imputation Methods: BRITS (joint learning of imputation and prediction)
Grid Resampling: SAnD (masked self-attention, requires regular grid)
Continuous-Time Attention: mTAN (directly handles irregular observations)

3. EVAT Paradigm

STraTS: Self-supervised triplet learning
SCANE/SUMMIT: Numerical scaling mechanism (SOTA)
This Work: Proves SCANE is a special case of MuFuse, provides more general framework

4. Fusion Operation Research

Chrysos et al. (2025): Advantages of Hadamard product in deep learning
This Work: First systematic application of multiplicative fusion to clinical EHR numerical modeling

Conclusions and Discussion

Main Conclusions

Multiplicative Fusion Outperforms Additive: MuFuse achieves feature-specific nonlinear interactions through value-conditioned modulation
Universal Imputation-Free Framework: MedFuse is effective in both ICU and chronic disease scenarios
Transferability: Learned feature embeddings support cross-dataset adaptation (requires sufficient source data scale)
Theoretical Unification: MuFuse generalizes SCANE, providing clearer design principles

Limitations

Computational Cost: 32-layer Transformer may limit real-time applications
Transfer Conditions: Cross-dataset transfer requires large-scale source datasets
Feature Overlap: Transfer depends on sufficient feature overlap (18.9%-59.5% in this work)
Interpretability: Clinical semantics of multiplicative interactions require further exploration
Multimodal Extension: Currently handles only numerical and categorical features, not text or images

Future Directions

Large-Scale Multimodal Pretraining: Extend to clinical notes and medical imaging
Causal Inference: Integrate counterfactual analysis to enhance interpretability
Trustworthy Clinical Decision Support: Deploy to real clinical environments
Efficient Architectures: Explore lightweight variants for resource-constrained scenarios
Improved Temporal Encoding: Research positional encodings better suited for irregular sampling

In-Depth Evaluation

Strengths

1. Method Innovation (★★★★★)

Solid Core Innovation: Multiplicative fusion has clear theoretical motivation (medical equifinality, interaction terms)
Generalizes SOTA: Elegantly proves SCANE is a special case (d'=1), providing unified framework
Flexible Design: Broadcast Hadamard product supports arbitrary dimension ratios

2. Experimental Sufficiency (★★★★★)

Diverse Datasets: Covers ICU (acute) and HCC (chronic) scenarios
Comprehensive Ablations: Fusion strategy, dimension factor, transfer learning across three dimensions
Statistical Rigor: Bootstrap confidence intervals, multi-metric evaluation
Visualization Analysis: t-SNE validates embedding quality

3. Writing Clarity (★★★★☆)

Clear structure, well-motivated exposition
Precise mathematical expressions (Equations 4-11)
Detailed appendix (hyperparameters, dataset statistics, additional experiments)
Minor limitation: Some clinical terminology could benefit from more explanation

4. Practical Value (★★★★☆)

No imputation needed, reduces preprocessing complexity
Code not yet public (under submission), but method description is detailed
Relatively high computational cost (32-layer Transformer)

Weaknesses

1. Method Limitations

Temporal Encoding Contradiction: Acknowledges multiplicative fusion unsuitable for time encoding, but lacks deep theoretical explanation
Dimension Selection: Optimal k value depends on dataset, lacks automatic selection mechanism
Categorical Feature Handling: Simple concatenation + linear transformation, insufficient exploration of multiplicative fusion potential

2. Experimental Deficiencies

Limited Transfer Experiments: Only tested between two ICU datasets, HCC not involved
Low Feature Overlap: Only 18.9% feature overlap on MI3 side, limits transfer potential assessment
Missing Computational Cost Analysis: No reported training time or memory consumption
Hyperparameter Sensitivity: Requires significant layer adjustment across datasets (1-32 layers)

3. Insufficient Analysis

Feature Interaction Visualization: Lacks specific clinical feature interaction analysis
Failure Case Analysis: No discussion of model prediction errors
Incomplete SCANE Comparison: While proving special case relationship, lacks direct performance comparison at different d' settings

4. Reproducibility Issues

Code Not Public: Affects result verification
Private Datasets: HCC dataset cannot be publicly accessed
Random Seeds: Not explicitly stated whether fixed

Impact Assessment

Contribution to Field (★★★★☆)

Theoretical Contribution: Establishes theoretical foundation for multiplicative fusion in EHR modeling
Method Contribution: Provides universal framework, extensible to other irregular time series
Empirical Contribution: Establishes new SOTA on standard benchmarks

Practical Value (★★★☆☆)

Advantages: No imputation needed, directly handles irregular data
Limitations: High computational cost, requires large-scale source datasets for transfer
Applicable Scenarios: Suitable for research institutions and large medical centers with sufficient computational resources

Reproducibility (★★★☆☆)

Detailed Method Description: Clear formulas and architecture
Missing Code: Reduces reproducibility
Partial Data Availability: P12 and MI3 public, HCC private

Applicable Scenarios

Best Suited For

High Missing Rate Scenarios (>70%): Advantages of no imputation clearly evident
Irregular Sampling: ICU monitoring, outpatient follow-ups with asynchronous data
Numerical Feature Dominant: Laboratory tests, vital signs and other continuous measurements
Pretraining Needs: Can leverage large-scale source datasets

Less Suitable For

Real-Time Prediction: 32-layer Transformer inference latency relatively high
Small Sample Scenarios: Transfer learning requires large-scale source data
Pure Categorical Features: Multiplicative fusion advantages not evident
Resource-Constrained Environments: Edge devices, mobile health applications

Improvement Suggestions

Adaptive Dimension Selection: Develop methods to automatically determine k (e.g., neural architecture search)
Lightweight Variants: Explore knowledge distillation or pruning to reduce computational cost
Multimodal Extension: Integrate clinical notes and medical imaging
Interpretability Enhancement: Provide clinical semantic explanations for feature interactions
Public Code and Models: Promote community verification and application

Selected References

Huang et al. (2024): SCANE/SUMMIT - the SOTA baseline that this work improves upon
Chrysos et al. (2025): Survey on Hadamard product in deep learning
Tipirneni & Reddy (2022): STraTS - representative work of EVAT paradigm
Shukla & Marlin (2021): mTAN - continuous-time attention mechanism
Vaswani et al. (2017): Transformer - backbone architecture in this work
Johnson et al. (2016): MIMIC-III database - key evaluation dataset

Summary

MedFuse is a paper with substantial contributions to clinical time series modeling. Its core innovation, multiplicative embedding fusion (MuFuse), not only generalizes the existing SOTA method in theory but also achieves consistent performance improvements on three real-world datasets. The experimental design is comprehensive, validating the method through main performance comparisons, ablation studies, dimension analysis, and transfer learning.

Particularly commendable is the paper's insight into medical equifinality: the masking effect of multiplicative fusion naturally models the phenomenon where different abnormal deviations correspond to the same clinical risk. This demonstrates the authors' deep understanding of the clinical domain.

However, the paper has some limitations: relatively high computational cost, limited transfer learning experiments, and lack of code release. Nevertheless, MedFuse provides a powerful and universal framework for irregular clinical time series modeling, with significant implications for advancing medical AI. Future work on multimodal extension, interpretability, and practical clinical deployment is anticipated.

Recommendation Score: 8.5/10