Generalized Task-Driven Medical Image Quality Enhancement with Gradient Promotion

Zhang, Cheng
Thanks to the recent achievements in task-driven image quality enhancement (IQE) models like ESTR, the image enhancement model and the visual recognition model can mutually enhance each other's quantitation while producing high-quality processed images that are perceivable by our human vision systems. However, existing task-driven IQE models tend to overlook an underlying fact -- different levels of vision tasks have varying and sometimes conflicting requirements of image features. To address this problem, this paper proposes a generalized gradient promotion (GradProm) training strategy for task-driven IQE of medical images. Specifically, we partition a task-driven IQE system into two sub-models, i.e., a mainstream model for image enhancement and an auxiliary model for visual recognition. During training, GradProm updates only parameters of the image enhancement model using gradients of the visual recognition model and the image enhancement model, but only when gradients of these two sub-models are aligned in the same direction, which is measured by their cosine similarity. In case gradients of these two sub-models are not in the same direction, GradProm only uses the gradient of the image enhancement model to update its parameters. Theoretically, we have proved that the optimization direction of the image enhancement model will not be biased by the auxiliary visual recognition model under the implementation of GradProm. Empirically, extensive experimental results on four public yet challenging medical image datasets demonstrated the superior performance of GradProm over existing state-of-the-art methods.
Basic Information

  • Paper ID: 2501.01114
  • Title: Generalized Task-Driven Medical Image Quality Enhancement with Gradient Promotion
  • Authors: Dong Zhang, Kwang-Ting Cheng
  • Category: cs.CV (Computer Vision)
  • Publication Date/Venue: arXiv preprint, January 2, 2025
  • Paper Link: https://arxiv.org/abs/2501.01114

Abstract

This paper proposes a generalized gradient promotion (GradProm) training strategy for task-driven medical image quality enhancement (IQE). While existing task-driven IQE models (such as ESTR) achieve mutual promotion between image enhancement and visual recognition models, they overlook an important fact: different levels of visual tasks have different and sometimes conflicting feature requirements. To address this issue, the paper divides the task-driven IQE system into two sub-models: a primary image enhancement model and an auxiliary visual recognition model. GradProm updates the image enhancement model parameters using gradients from both sub-models only when their gradient directions are consistent; otherwise, it uses only the image enhancement model's own gradients. The method is theoretically proven to ensure that the optimization direction of the image enhancement model is not biased by the auxiliary visual recognition model. Experimental results on four public medical image datasets validate its superiority.

Research Background and Motivation

Problem Definition

Medical image analysis plays an increasingly important role in modern medical systems, helping physicians visualize internal anatomical structures and assess disease progression. Image quality is critical for medical image analysis, as higher quality images typically yield more accurate recognition performance.

Limitations of Existing Methods

  1. Issues with Perception-Oriented Approaches: Traditional perception-oriented medical image processing methods primarily pursue high-quality performance aligned with human visual perception. However, enhanced visual perception quality does not necessarily translate to beneficial information for downstream visual recognition models.
  2. Deficiencies in Task-Driven Methods: While existing task-driven IQE methods jointly train image enhancement and visual recognition models, they overlook an important fact—different levels of computer vision tasks have different and sometimes conflicting feature requirements.

Research Motivation

As illustrated in Figure 2, given the same input image, denoising tasks focus on all image regions, semantic segmentation tasks focus on foreground object regions, while diagnostic tasks focus on discriminative local regions of foreground objects. This inconsistency in feature requirements creates potential conflicts between upstream image enhancement models and downstream visual recognition models, affecting performance.

Core Contributions

  1. Proposes a new paradigm for task-driven medical IQE: Explicitly divides the system into primary image enhancement and auxiliary visual recognition sub-models
  2. Designs the GradProm training strategy: A simple yet effective generalized training strategy that dynamically trains both sub-models and achieves continuous performance improvement without requiring additional data or network architecture modifications
  3. Provides theoretical proof: Demonstrates that GradProm converges to local optima without bias from the auxiliary visual recognition model
  4. Comprehensive experimental validation: Conducts extensive experiments on four public medical image datasets, demonstrating that GradProm achieves state-of-the-art performance on IQE tasks

Method Details

Task Definition

Task-driven medical IQE is essentially an image enhancement task: the input is a low-quality image X, and the corresponding high-quality image Y serves as the label. Training aims to make the output obtained by passing X through the image enhancement model IP and the visual recognition model VR as close as possible to Y.

Mathematical Formulation of Traditional Methods

The traditional joint training total loss is:

L_total = L_IP + λL_VR

where L_IP is the image enhancement loss, L_VR is the visual recognition loss, and λ is a balancing hyperparameter.
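
As a minimal sketch of this joint objective, the snippet below combines two scalar losses and, by linearity of the gradient, the two gradients with respect to the shared enhancement parameters. The function names `joint_loss` and `joint_gradient`, the flat-list representation of gradients, and all numeric values are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence


def joint_loss(loss_ip: float, loss_vr: float, lam: float) -> float:
    """Return L_total = L_IP + lam * L_VR for scalar loss values."""
    return loss_ip + lam * loss_vr


def joint_gradient(g_ip: Sequence[float], g_vr: Sequence[float], lam: float) -> List[float]:
    """Gradient of L_total w.r.t. the enhancement parameters: G_IP + lam * G_VR."""
    return [a + lam * b for a, b in zip(g_ip, g_vr)]


# Hypothetical values, chosen only to illustrate the formula.
print(joint_loss(0.5, 0.25, 0.5))                     # 0.625
# With conflicting gradients, the summed direction can oppose G_IP.
print(joint_gradient([1.0, 0.0], [-2.0, 0.0], 1.0))   # [-1.0, 0.0]
```

The second call shows the failure mode that motivates the method: joint training always adds the auxiliary gradient, even when it points against the enhancement gradient.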

Core Concept of GradProm

The core idea of GradProm is to explicitly divide the task-driven medical IQE system into:

  • Primary Model: Image enhancement model IP (parameters θ)
  • Auxiliary Model: Visual recognition model VR (parameters φ)

Gradient Promotion Strategy

GradProm dynamically adjusts the training objective based on the cosine similarity s = cos(G_IP, G_VR) of gradients from the two sub-models:

Case 1: When s ≥ 0 (gradient directions consistent)

G_T = [∇_θ(L_IP(θ) + λL_VR(φ)); ∇_φL_VR(φ)]

Case 2: When s < 0 (gradient directions inconsistent)

G_T = [∇_θ(L_IP(θ)); ∇_φL_VR(φ)]
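
The two cases above can be sketched in plain Python. This is a minimal illustration, assuming the gradients with respect to θ have already been flattened into lists of floats; the names `cosine_similarity` and `gradprom_theta_gradient` are hypothetical, and treating a zero gradient as s = 0 is an assumption made here, not something the summary specifies.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two flattened gradient vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero gradient counts as s = 0 (Case 1)
    return dot / (norm_a * norm_b)


def gradprom_theta_gradient(
    g_ip: Sequence[float], g_vr: Sequence[float], lam: float
) -> List[float]:
    """Gradient applied to the enhancement parameters θ.

    Case 1 (s >= 0): G_IP + lam * G_VR
    Case 2 (s < 0):  G_IP only
    """
    s = cosine_similarity(g_ip, g_vr)
    if s >= 0.0:
        return [a + lam * b for a, b in zip(g_ip, g_vr)]
    return list(g_ip)


# Aligned gradients: the auxiliary gradient is added.
print(gradprom_theta_gradient([1.0, 0.0], [0.5, 0.5], 1.0))   # [1.5, 0.5]
# Conflicting gradients: the auxiliary gradient is dropped.
print(gradprom_theta_gradient([1.0, 0.0], [-1.0, 0.5], 1.0))  # [1.0, 0.0]
```

In both cases the recognition parameters φ are still updated with their own gradient ∇_φL_VR; only the update to θ is gated.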

Theoretical Analysis

Lemma 3.1: GradProm converges to a local minimum under the following update rule:

θ^(t+1)_T := θ^t_T - α_t(G^t_IP + G^t_VR * max(0, cos(G^t_IP, G^t_VR)))

Proof Highlights: The proof shows that the update direction has a non-negative inner product with the primary model's gradient. The step therefore never opposes the primary objective, which prevents the auxiliary model from biasing the optimization direction.
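
The inner-product step mentioned above can be written out explicitly. The derivation below is a sketch based only on the update rule stated in Lemma 3.1; the shorthand c and d is introduced here for convenience, the iteration index t is dropped, and this is not a reproduction of the paper's full proof.

```latex
\begin{aligned}
c &= \cos(G_{IP}, G_{VR})
   = \frac{\langle G_{IP}, G_{VR}\rangle}{\lVert G_{IP}\rVert\,\lVert G_{VR}\rVert},
\qquad
d = G_{IP} + \max(0, c)\, G_{VR}, \\
\langle G_{IP}, d\rangle
  &= \lVert G_{IP}\rVert^{2} + \max(0, c)\,\langle G_{IP}, G_{VR}\rangle \\
  &= \lVert G_{IP}\rVert^{2} + \max(0, c)\, c\,\lVert G_{IP}\rVert\,\lVert G_{VR}\rVert \\
  &\ge \lVert G_{IP}\rVert^{2} \;\ge\; 0 .
\end{aligned}
```

The inequality holds because max(0, c)·c equals max(0, c)², which is never negative. To first order, a step along -d therefore does not increase L_IP, whatever the auxiliary gradient looks like.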

Experimental Setup

Datasets

  1. ISIC 2018: Skin lesion dataset with 2,594 RGB images at 600×450 resolution
  2. COVID-CT: CT dataset with 349 COVID-19 positive and 397 negative CT images
  3. Lizard: 238 PNG images annotated with six categories of cell nuclei
  4. CAMUS: Echocardiography dataset with 2D ultrasound images from 500 patients

Experimental Tasks

  • Image Enhancement Tasks: Denoising, super-resolution
  • Visual Recognition Tasks: Diagnosis (classification), semantic segmentation

Baseline Methods

  • Benchmark-i: Image enhancement using only SR-ResNet
  • Benchmark-ii/iii: Pure ResNet for diagnosis/UNet for segmentation
  • Joint Training: Joint training strategy
  • Frozen-params Training: Training strategy with frozen VR parameters (ESTR method)

Evaluation Metrics

  • Image Quality: PSNR, SSIM
  • Recognition Performance: Accuracy (diagnosis), mIoU (segmentation)
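
For reference, three of these metrics can be sketched with the standard library alone. This is a minimal illustration, assuming flattened images with intensities in [0, 1] and integer class labels; the function names are hypothetical, and SSIM is omitted because its windowed computation does not fit a short sketch.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for two flattened images."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Fraction of samples whose predicted class matches the label."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def mean_iou(labels: Sequence[int], predictions: Sequence[int], num_classes: int) -> float:
    """Mean intersection-over-union over classes present in labels or predictions."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for y, p in zip(labels, predictions) if y == c and p == c)
        union = sum(1 for y, p in zip(labels, predictions) if y == c or p == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy inputs, chosen only for illustration.
print(round(psnr([0.0, 0.5, 1.0, 0.5], [0.1, 0.5, 0.9, 0.5]), 2))  # 23.01
print(accuracy([0, 0, 1, 1], [0, 1, 1, 1]))                        # 0.75
print(round(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], 2), 4))           # 0.5833
```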

Experimental Results

Main Results

Denoising Results on ISIC 2018 Dataset

Performance comparison at different noise levels (Tables 1 and 2):

Noise level σ=0.1

Method          PSNR↑    SSIM↑
Frozen-params   32.152   0.906
GradProm        33.383   0.915

GradProm outperforms the baseline methods across noise levels. At σ=0.1 it improves on the Frozen-params method by 1.231 dB in PSNR and 0.009 in SSIM.

Comparison with State-of-the-Art Methods

Table 5 shows comparisons with SOTA methods on ISIC 2018:

Method                  σ=0.1 PSNR   σ=0.2 PSNR   σ=0.3 PSNR
ESTR (ResNet-101)       33.723       25.925       20.163
ADAP                    34.858       24.926       20.373
GradProm (ResNet-101)   36.173       28.024       23.703
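
The margins implied by these rows can be checked with a few lines of arithmetic. The snippet below only subtracts the PSNR values listed above; the dictionary layout and the helper name `gain_over` are illustrative assumptions.

```python
# PSNR values (dB) copied from the Table 5 rows above; inner keys are noise levels σ.
psnr_table = {
    "ESTR (ResNet-101)": {0.1: 33.723, 0.2: 25.925, 0.3: 20.163},
    "ADAP": {0.1: 34.858, 0.2: 24.926, 0.3: 20.373},
    "GradProm (ResNet-101)": {0.1: 36.173, 0.2: 28.024, 0.3: 23.703},
}


def gain_over(baseline: str, method: str = "GradProm (ResNet-101)") -> dict:
    """PSNR gain of `method` over `baseline` at each noise level, rounded to 3 decimals."""
    return {
        sigma: round(psnr_table[method][sigma] - psnr_table[baseline][sigma], 3)
        for sigma in psnr_table[method]
    }


print(gain_over("ESTR (ResNet-101)"))  # {0.1: 2.45, 0.2: 2.099, 0.3: 3.54}
print(gain_over("ADAP"))               # {0.1: 1.315, 0.2: 3.098, 0.3: 3.33}
```

On these numbers the gain over ESTR is largest at the highest noise level (3.54 dB at σ=0.3).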

Ablation Studies

Comparison of Different Training Strategies

Experimental results show that GradProm outperforms joint training and frozen parameter strategies in both supervised and unsupervised settings.

Multi-task Learning Analysis

Using diagnosis and segmentation simultaneously as auxiliary tasks lowered performance rather than improving it, which supports the hypothesis that different visual tasks have inconsistent feature requirements.

Challenging Scenario Testing

Even in highly challenging scenarios with composite noise (Gaussian noise + Poisson noise + Gaussian blur), GradProm still achieves a PSNR improvement of 0.384 dB.
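
To make the composite degradation concrete, here is a small standard-library sketch on a 1D signal with intensities in [0, 1]. Everything specific in it is an assumption made for illustration: the order of operations (blur, then Poisson noise, then Gaussian noise), the parameters `sigma`, `radius`, `peak`, and `noise_sigma`, and the function names. The summary does not state the paper's actual settings.

```python
import math
import random
from typing import List, Sequence


def gaussian_blur_1d(signal: Sequence[float], sigma: float = 1.0, radius: int = 2) -> List[float]:
    """Blur with a normalized Gaussian kernel, replicating the edge samples."""
    kernel = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out


def poisson_sample(rate: float, rng: random.Random) -> int:
    """Draw one Poisson sample with Knuth's method (adequate for small rates)."""
    limit = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def degrade(signal: Sequence[float], peak: float = 30.0,
            noise_sigma: float = 0.1, seed: int = 0) -> List[float]:
    """Apply blur, Poisson noise, and additive Gaussian noise, then clip to [0, 1]."""
    rng = random.Random(seed)
    blurred = gaussian_blur_1d(signal)
    shot = [poisson_sample(v * peak, rng) / peak for v in blurred]
    noisy = [v + rng.gauss(0.0, noise_sigma) for v in shot]
    return [min(max(v, 0.0), 1.0) for v in noisy]


clean = [0.0, 0.2, 0.8, 1.0, 0.5]
print(degrade(clean))  # a degraded copy of `clean`, reproducible for a fixed seed
```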

Cross-Domain Generalization Experiments

In cross-domain experiments trained on ISIC 2018 and tested on Lizard, GradProm achieves PSNR/SSIM improvements of 13.273/0.325 and 13.825/0.458 over ESTR in unsupervised and supervised settings, respectively.

Qualitative Analysis

  • Visualization Results: Images generated by GradProm better preserve foreground object integrity while removing noise
  • Class Activation Map Analysis: GradProm's CAM focuses more on foreground object regions, validating the effectiveness of auxiliary tasks

Related Work

Medical Image Quality Enhancement

Existing medical IQE tasks can be categorized into two types:

  1. Image Restoration: Improving the quality of degraded or noisy medical images
  2. Image Enhancement: Improving image contrast and sharpening image details

Multi-task Learning and Auxiliary Learning

  • Multi-task Learning: Leveraging useful knowledge from related tasks to improve overall performance of all involved tasks
  • Auxiliary Learning: When multiple tasks have different importance levels, tasks are divided into primary and auxiliary tasks

This paper frames task-driven medical image quality enhancement as an auxiliary learning paradigm, where image processing is the primary task and image recognition is the auxiliary task.

Conclusions and Discussion

Main Conclusions

  1. GradProm effectively resolves the feature requirement conflicts between different models in task-driven IQE
  2. Through dynamic gradient selection mechanisms, it ensures that the primary image enhancement model is not biased by the auxiliary model
  3. Achieves state-of-the-art performance on multiple medical image datasets
  4. The method demonstrates good generalization, applicable to different medical imaging modalities

Limitations

  1. Computational Overhead: While inference has no additional overhead, training requires computing gradient similarity
  2. Applicable Scope: Primarily targets the medical imaging domain; effectiveness in other domains requires further verification
  3. Extreme Scenarios: Performance improvements are limited when image quality is severely degraded

Future Directions

  1. Extended Applications: Extend GradProm to other task-driven training processes, such as multi-objective learning and task-driven data augmentation
  2. Medical Applications: Explore applications in other medical image analysis tasks such as registration and reconstruction
  3. Technical Integration: Investigate combinations of GradProm with transfer learning, domain adaptation, and other techniques

In-Depth Evaluation

Strengths

  1. Deep Problem Insight: Accurately identifies the core problem of existing task-driven methods—conflicts in feature requirements across different tasks
  2. Clever Method Design: Resolves gradient conflicts with a simple test on the cosine similarity of the two gradients
  3. Solid Theoretical Foundation: Provides rigorous mathematical proofs ensuring theoretical correctness
  4. Comprehensive Experiments: Conducts thorough validation across multiple datasets, tasks, and settings
  5. High Practical Value: Requires no network architecture modifications or inference overhead, facilitating practical application

Weaknesses

  1. Gradient Computation Overhead: Requires additional gradient similarity computation, increasing training time
  2. Simplistic Threshold Setting: Using only 0 as the threshold may be too coarse; finer-grained strategies could yield better results
  3. Limited Cross-Domain Validation: Generalization across different medical imaging modalities is validated, but validation beyond medical imaging is insufficient
  4. Limited Baseline Selection: Some comparison methods may not be the most recent SOTA approaches

Impact

  1. Academic Value: Provides new insights and methods for the task-driven learning field
  2. Practical Value: Holds important application value for medical image processing
  3. Reproducibility: Clear method description and relatively simple implementation ensure good reproducibility
  4. Inspirational Significance: The gradient conflict resolution approach may inspire research on other multi-task learning problems

Applicable Scenarios

  1. Medical Image Processing: Quality enhancement tasks across various medical imaging modalities
  2. Multi-task Learning: Scenarios with primary-auxiliary task relationships where task conflicts may exist
  3. Image Enhancement: Applications requiring image quality improvement combined with downstream tasks
  4. Auxiliary Learning: Scenarios requiring auxiliary task utilization to enhance primary task performance

References

The paper cites abundant related work, primarily including:

  1. ESTR [1] - Representative work in task-driven image quality enhancement
  2. ResNet [6] - Classical deep learning architecture
  3. UNet [39] - Classical method for medical image segmentation
  4. Multiple papers on medical image datasets [40-43]

Overall Assessment: This is a high-quality computer vision paper that proposes an innovative solution to a key problem in task-driven medical image quality enhancement. The method is simple and effective, with solid theoretical foundations and comprehensive experimental validation, demonstrating significant academic and practical value.