Learning Optimal Prompt Ensemble for Multi-source Visual Prompt Transfer
Zhang, Cao, Wu et al.
Prompt tuning has emerged as a lightweight strategy for adapting foundation models to downstream tasks, particularly on resource-constrained systems. As pre-trained prompts become valuable assets, combining multiple source prompts offers a promising way to improve generalization on new tasks by leveraging complementary knowledge. However, naive aggregation often overlooks the fact that different source prompts contribute differently to the target task. To address this, we propose HGPrompt, a dynamic framework that learns optimal ensemble weights. These weights are optimized by jointly maximizing an information-theoretic transferability metric and minimizing gradient conflicts through a novel regularization strategy. Specifically, we propose a differentiable prompt transferability metric that captures the discriminability of prompt-induced features on the target task. Meanwhile, HGPrompt matches the gradient variances with respect to different source prompts based on the Hessian and Fisher information, ensuring stable and coherent knowledge transfer while suppressing gradient conflicts among them. Extensive experiments on the large-scale VTAB benchmark demonstrate the state-of-the-art performance of HGPrompt, validating its effectiveness in learning an optimal ensemble for multi-source prompt transfer.
This paper proposes HGPrompt, a framework for multi-source visual prompt transfer. The method learns optimal ensemble weights by jointly maximizing an information-theoretic transferability metric and minimizing a gradient-conflict regularizer. Specifically, it introduces a differentiable prompt transferability metric that captures the discriminability of prompt-induced features on the target task, and it matches gradient variances across source prompts using Hessian and Fisher information, which keeps knowledge transfer stable and consistent while suppressing gradient conflicts. Experiments on the large-scale VTAB benchmark validate the effectiveness of HGPrompt.
With the development of visual foundation models, prompt tuning has become a lightweight strategy for adapting them to downstream tasks. The core challenge is how to effectively aggregate multiple source prompts to improve generalization on new tasks.
Resource Efficiency Requirements: Full model fine-tuning becomes impractical on large-scale pre-trained models, while prompt tuning achieves competitive performance by updating only 0.4% of parameters
Prompt Asset Value: Pre-trained prompts have become valuable knowledge assets, and combinations of multi-source prompts can leverage complementary knowledge
Limitations of Existing Methods: Simple aggregation by concatenation or averaging ignores the varying contributions of different source prompts to the target task, potentially leading to representation collapse
Proposes HGPrompt Framework: The first theoretically reliable framework for dynamically learning optimal prompt weights by evaluating the transferability of aggregated prompt-induced features
Information-Theoretic Transferability Metric: A differentiable prompt transferability metric based on H-score, providing explicit and interpretable contribution quantification
Gradient Alignment Regularization: An innovative gradient variance matching objective that addresses gradient conflicts among multi-source prompts
SOTA Performance: Achieves state-of-the-art performance on the VTAB benchmark with average accuracy of 60.3%
Given $\kappa$ source tasks $\mathcal{S} = \{S_i\}_{i=1}^{\kappa}$ and their corresponding optimized prompts $\{P_i\}_{i=1}^{\kappa}$, the goal is to construct a target prompt $P_T$ for a new task $T$ through an optimal combination of source prompts. Let $M \le \kappa$ be the number of selected source prompts, with weights $\alpha = (\alpha_1, \dots, \alpha_M)$ satisfying $\sum_{i=1}^{M} \alpha_i = 1$ and $\alpha_i \ge 0$.
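As a minimal sketch of this setup, assuming the target prompt is formed as a weighted sum of equally sized source prompt matrices and that the simplex constraints are enforced with a softmax over free logits (both are illustrative choices, not details confirmed by the summary), the combination can be written in NumPy as follows. The helper names `softmax` and `combine_prompts` are hypothetical.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Map unconstrained logits to weights on the probability simplex."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def combine_prompts(source_prompts: np.ndarray, logits: np.ndarray) -> np.ndarray:
    """Hypothetical weighted combination of M source prompts.

    source_prompts: array of shape (M, m, d), one (m x d) prompt per source task.
    logits: array of shape (M,), assumed free parameters behind alpha.
    Returns a target prompt P_T of shape (m, d).
    """
    alpha = softmax(logits)  # alpha_i >= 0 and sum(alpha) == 1 by construction
    # Contract over the source axis: P_T = sum_i alpha_i * P_i
    return np.tensordot(alpha, source_prompts, axes=1)


rng = np.random.default_rng(0)
M, m, d = 3, 4, 8
prompts = rng.normal(size=(M, m, d))
P_T = combine_prompts(prompts, np.zeros(M))  # zero logits give uniform weights
print(P_T.shape)  # (4, 8)
```

The softmax keeps every $\alpha_i$ non-negative and summing to one without a projection step, which is convenient when the weights are learned by gradient descent.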
For a pre-trained Transformer, $m$ learnable prompt tokens $P = [p_1, \dots, p_m] \in \mathbb{R}^{m \times d}$ are introduced. Given the patch embeddings $E(X) \in \mathbb{R}^{n \times d}$ of an input image $X$, the combined input sequence is $[P; E(X)] \in \mathbb{R}^{(m+n) \times d}$.
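The sequence construction amounts to a single concatenation along the token axis. This sketch follows the formula literally; a real ViT would also carry a class token and positional embeddings, which the summary does not specify and which are omitted here. The function name `prepend_prompt` and the sizes below are illustrative assumptions.

```python
import numpy as np


def prepend_prompt(prompt: np.ndarray, patch_embeddings: np.ndarray) -> np.ndarray:
    """Build the input sequence [P; E(X)] (hypothetical helper).

    prompt: (m, d) learnable prompt tokens.
    patch_embeddings: (n, d) patch embeddings E(X) of one image.
    Returns an (m + n, d) sequence with the prompt tokens first.
    """
    if prompt.shape[1] != patch_embeddings.shape[1]:
        raise ValueError("prompt and patch embeddings must share the hidden size d")
    return np.concatenate([prompt, patch_embeddings], axis=0)


# Illustrative sizes: 10 prompt tokens, 14 x 14 = 196 patches, hidden size 768
m, n, d = 10, 196, 768
P = np.zeros((m, d))
E_X = np.ones((n, d))
seq = prepend_prompt(P, E_X)
print(seq.shape)  # (206, 768)
```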
Definition 1: Given input data x, labels y, and feature extractor f(x), the one-sided H-score is defined as:
$$H(f) = \operatorname{tr}\left(\operatorname{cov}(f(X))^{-1} \operatorname{cov}\left(\mathbb{E}_{P_{X \mid Y}}[f(X) \mid Y]\right)\right)$$
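This metric has an intuitive interpretation: a high H-score indicates large inter-class variation of the class-conditional feature means, $\operatorname{cov}(\mathbb{E}[f(X) \mid Y])$, relative to the total feature covariance $\operatorname{cov}(f(X))$, i.e., features that discriminate between classes with little redundancy.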
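To make Definition 1 concrete, here is a minimal NumPy estimate of the one-sided H-score from a labelled sample, replacing the expectations with empirical averages. Using a pseudo-inverse for $\operatorname{cov}(f(X))^{-1}$ is an assumption made here to tolerate singular covariances; the paper's exact estimator is not given in this summary. The toy data compares features that carry label information with pure noise.

```python
import numpy as np


def h_score(features: np.ndarray, labels: np.ndarray) -> float:
    """Empirical one-sided H-score: tr(cov(f(X))^{-1} cov(E[f(X) | Y])).

    features: (N, d) array of features f(x); labels: (N,) array of class labels y.
    """
    f = features - features.mean(axis=0)  # center the features
    cov_f = f.T @ f / len(f)  # total feature covariance cov(f(X))
    # Replace each sample by its class-conditional mean E[f(X) | Y = y]
    class_means = np.zeros_like(f)
    for c in np.unique(labels):
        mask = labels == c
        class_means[mask] = f[mask].mean(axis=0)
    # Class means of centered features are themselves centered, so this is their covariance
    cov_g = class_means.T @ class_means / len(f)
    # Pseudo-inverse tolerates a singular cov(f(X)) (small N or redundant features)
    return float(np.trace(np.linalg.pinv(cov_f) @ cov_g))


rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
informative = 3.0 * labels[:, None] + rng.normal(size=(200, 2))  # class shifts the mean
noise = rng.normal(size=(200, 2))  # independent of the label
print(h_score(informative, labels) > h_score(noise, labels))  # True
```

For $C$ classes the score lies between $0$ and $C-1$: the conditional-mean covariance has rank at most $C-1$ and, by the law of total covariance, never exceeds the total covariance.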
Definition 2: The optimal feature weights are determined by maximizing the H-score of the weighted feature sum:
$$\alpha^* = \arg\max_{\alpha} H\Big(\sum_{j} \alpha_j f_{P_j}\Big) \quad \text{s.t.} \quad \sum_{j} \alpha_j = 1$$
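For a handful of source prompts, Definition 2 can be approximated by brute force: evaluate the H-score of the weighted feature sum at every point of a regular grid on the simplex (with $\alpha_j \ge 0$, as in the problem setup) and keep the best. This grid search is a hypothetical stand-in used only for illustration; it is not the paper's solver, and it scales poorly as $M$ grows. The names `simplex_grid` and `best_ensemble_weights` are hypothetical, and `h_score` re-implements Definition 1 empirically so the block runs on its own.

```python
import itertools

import numpy as np


def h_score(features: np.ndarray, labels: np.ndarray) -> float:
    """Empirical one-sided H-score (Definition 1), using a pseudo-inverse."""
    f = features - features.mean(axis=0)
    cov_f = f.T @ f / len(f)
    class_means = np.zeros_like(f)
    for c in np.unique(labels):
        class_means[labels == c] = f[labels == c].mean(axis=0)
    cov_g = class_means.T @ class_means / len(f)
    return float(np.trace(np.linalg.pinv(cov_f) @ cov_g))


def simplex_grid(M: int, steps: int):
    """Yield every weight vector with entries in {0, 1/steps, ..., 1} summing to 1."""
    for combo in itertools.product(range(steps + 1), repeat=M):
        if sum(combo) == steps:
            yield np.array(combo, dtype=float) / steps


def best_ensemble_weights(feature_sets, labels, steps: int = 10):
    """Grid-search approximation of alpha* = argmax_alpha H(sum_j alpha_j f_{P_j}).

    feature_sets: list of M arrays of shape (N, d), the features induced by each source prompt.
    """
    best_alpha, best_h = None, -np.inf
    for alpha in simplex_grid(len(feature_sets), steps):
        mixed = sum(a * f for a, f in zip(alpha, feature_sets))  # weighted feature sum
        h = h_score(mixed, labels)
        if h > best_h:
            best_alpha, best_h = alpha, h
    return best_alpha, best_h


# Toy example: source 0 induces label-informative features, sources 1 and 2 only noise
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 200)
informative = 3.0 * labels[:, None] + rng.normal(size=(400, 2))
noise_a = rng.normal(size=(400, 2))
noise_b = rng.normal(size=(400, 2))
alpha, h = best_ensemble_weights([informative, noise_a, noise_b], labels)
print(alpha, round(h, 3))
```

Because the ensemble is scored as a whole, the search down-weights sources whose features add variance without adding class information, rather than scoring each prompt in isolation.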
Theorem 1: The H-score is a convex quadratic form in the weights $\alpha$, which guarantees that the optimization problem can be solved reliably.
Ensemble Evaluation vs. Isolated Evaluation: Unlike traditional methods that evaluate each prompt independently, this work evaluates the overall transferability of the aggregated prompts
Gradient Conflict Resolution: By leveraging theoretical insights from Hessian and Fisher information, gradient variance matching is designed to reduce optimization inconsistency
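The summary does not give the exact form of the gradient variance matching objective, so the following is only a hypothetical sketch of the idea: estimate, for each source prompt, the total variance of its per-sample gradients (the trace of the gradient covariance, which equals the trace of the empirical Fisher information minus the squared norm of the mean gradient) and penalize how much these variances differ across sources. Computing per-sample gradients is left to the training framework; here they are passed in as arrays, and all names are illustrative.

```python
import numpy as np


def gradient_variance(per_sample_grads: np.ndarray) -> float:
    """Total variance (trace of the covariance) of per-sample gradients.

    per_sample_grads: (N, p) array, one flattened gradient per training sample.
    """
    centered = per_sample_grads - per_sample_grads.mean(axis=0)
    return float((centered ** 2).sum(axis=1).mean())


def variance_matching_penalty(grads_per_source: list) -> float:
    """Hypothetical regularizer: spread of gradient variances across source prompts.

    Returns the mean squared deviation of each source's gradient variance from
    the average variance; it is zero when all sources induce the same variance.
    """
    variances = np.array([gradient_variance(g) for g in grads_per_source])
    return float(((variances - variances.mean()) ** 2).mean())


rng = np.random.default_rng(0)
g = rng.normal(size=(64, 16))
print(variance_matching_penalty([g, g.copy()]))  # 0.0
print(variance_matching_penalty([g, 3.0 * g]) > 0)  # True
```

Driving this penalty toward zero pushes the source prompts to produce gradients of comparable spread, which is one plausible reading of "matching gradient variances"; the Hessian-based component mentioned in the summary is not modeled here.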
As the number of source prompts increases from 3 to 11, HGPrompt's advantage over PANDA and SPoT widens, supporting its effectiveness on larger prompt collections.
t-SNE visualization shows that features generated by HGPrompt exhibit better class discriminability, with same-class objects forming tight clusters with clear boundaries.
Parameter-Efficient Learning: Houlsby et al. (2019), Hu et al. (2021)
Transferability Assessment: Bao et al. (2019), You et al. (2021)
Multi-Task Learning: Yu et al. (2020), Rame et al. (2022)
Vision Transformers: Dosovitskiy et al. (2020), Jia et al. (2022)
This paper contributes to multi-source visual prompt transfer by replacing naive prompt aggregation with ensemble weights learned from an information-theoretic transferability metric and a gradient-variance regularizer, and it points to new research directions for parameter-efficient transfer learning.