Generalized Task-Driven Medical Image Quality Enhancement with Gradient Promotion

Zhang, Cheng

Thanks to the recent achievements in task-driven image quality enhancement (IQE) models like ESTR, the image enhancement model and the visual recognition model can mutually enhance each other's quantitation while producing high-quality processed images that are perceivable by our human vision systems. However, existing task-driven IQE models tend to overlook an underlying fact -- different levels of vision tasks have varying and sometimes conflicting requirements of image features. To address this problem, this paper proposes a generalized gradient promotion (GradProm) training strategy for task-driven IQE of medical images. Specifically, we partition a task-driven IQE system into two sub-models, i.e., a mainstream model for image enhancement and an auxiliary model for visual recognition. During training, GradProm updates only parameters of the image enhancement model using gradients of the visual recognition model and the image enhancement model, but only when gradients of these two sub-models are aligned in the same direction, which is measured by their cosine similarity. In case gradients of these two sub-models are not in the same direction, GradProm only uses the gradient of the image enhancement model to update its parameters. Theoretically, we have proved that the optimization direction of the image enhancement model will not be biased by the auxiliary visual recognition model under the implementation of GradProm. Empirically, extensive experimental results on four public yet challenging medical image datasets demonstrated the superior performance of GradProm over existing state-of-the-art methods.

Basic Information

Paper ID: 2501.01114
Title: Generalized Task-Driven Medical Image Quality Enhancement with Gradient Promotion
Authors: Dong Zhang, Kwang-Ting Cheng
Category: cs.CV (Computer Vision)
Publication Date/Venue: arXiv preprint, January 2, 2025
Paper Link: https://arxiv.org/abs/2501.01114

Abstract

This paper proposes a generalized gradient promotion (GradProm) training strategy for task-driven medical image quality enhancement (IQE). While existing task-driven IQE models (such as ESTR) achieve mutual promotion between image enhancement and visual recognition models, they overlook an important fact: different levels of visual tasks have different and sometimes conflicting feature requirements. To address this issue, the paper divides the task-driven IQE system into two sub-models: a primary image enhancement model and an auxiliary visual recognition model. GradProm updates the image enhancement model parameters using gradients from both sub-models only when their gradient directions are consistent; otherwise, it uses only the image enhancement model's own gradients. The method is theoretically proven to ensure that the optimization direction of the image enhancement model is not biased by the auxiliary visual recognition model. Experimental results on four public medical image datasets validate its superiority.

Research Background and Motivation

Problem Definition

Medical image analysis plays an increasingly important role in modern medical systems, helping physicians visualize internal anatomical structures and assess disease progression. Image quality is critical for medical image analysis, as higher quality images typically yield more accurate recognition performance.

Limitations of Existing Methods

Issues with Perception-Oriented Approaches: Traditional perception-oriented medical image processing methods primarily pursue high-quality performance aligned with human visual perception. However, enhanced visual perception quality does not necessarily translate to beneficial information for downstream visual recognition models.
Deficiencies in Task-Driven Methods: While existing task-driven IQE methods jointly train image enhancement and visual recognition models, they overlook an important fact—different levels of computer vision tasks have different and sometimes conflicting feature requirements.

Research Motivation

As illustrated in Figure 2, given the same input image, denoising tasks focus on all image regions, semantic segmentation tasks focus on foreground object regions, while diagnostic tasks focus on discriminative local regions of foreground objects. This inconsistency in feature requirements creates potential conflicts between upstream image enhancement models and downstream visual recognition models, affecting performance.

Core Contributions

Proposes a new paradigm for task-driven medical IQE: Explicitly divides the system into primary image enhancement and auxiliary visual recognition sub-models
Designs the GradProm training strategy: A simple yet effective generalized training strategy that dynamically trains both sub-models and achieves continuous performance improvement without requiring additional data or network architecture modifications
Provides theoretical proof: Demonstrates that GradProm converges to local optima without bias from the auxiliary visual recognition model
Comprehensive experimental validation: Conducts extensive experiments on four public medical image datasets, demonstrating that GradProm achieves state-of-the-art performance on IQE tasks

Method Details

Task Definition

Task-driven medical IQE is essentially an image enhancement task: the input is a low-quality image X, and the corresponding high-quality image Y serves as the label. Training aims to make the result of passing X through the image enhancement model IP and the visual recognition model VR as close as possible to Y.

Mathematical Formulation of Traditional Methods

The traditional joint training total loss is:

L_total = L_IP + λL_VR

where L_IP is the image enhancement loss, L_VR is the visual recognition loss, and λ is a balancing hyperparameter.
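As a minimal sketch of this joint objective, the snippet below combines a mean-squared-error enhancement loss with a cross-entropy recognition loss. The two loss choices, the function names, the toy values, and λ = 0.5 are illustrative assumptions; this summary does not state which losses or which weight the paper uses.

```python
import math

def mse_loss(pred, target):
    """Mean squared error over flat pixel lists (assumed form of L_IP)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def cross_entropy_loss(probs, label):
    """Negative log-likelihood of the true class (assumed form of L_VR)."""
    return -math.log(probs[label])

def joint_loss(l_ip, l_vr, lam):
    """L_total = L_IP + lambda * L_VR."""
    return l_ip + lam * l_vr

# Toy values, purely illustrative.
enhanced = [0.2, 0.4, 0.9, 0.7]      # output of the enhancement model
reference = [0.0, 0.5, 1.0, 0.5]     # high-quality label Y
class_probs = [0.1, 0.7, 0.2]        # recognition model output
l_ip = mse_loss(enhanced, reference)
l_vr = cross_entropy_loss(class_probs, label=1)
l_total = joint_loss(l_ip, l_vr, lam=0.5)
print(l_ip, l_vr, l_total)
```

With a fixed λ, the recognition term always contributes to the update, whether or not it agrees with the enhancement term.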

Core Concept of GradProm

The core idea of GradProm is to explicitly divide the task-driven medical IQE system into:

Primary Model: Image enhancement model IP (parameters θ)
Auxiliary Model: Visual recognition model VR (parameters φ)

Gradient Promotion Strategy

GradProm dynamically adjusts the training objective based on the cosine similarity s = cos(G_IP, G_VR) of gradients from the two sub-models:

Case 1: When s ≥ 0 (gradient directions consistent)

G_T = [∇_θ(L_IP(θ) + λL_VR(φ)); ∇_φL_VR(φ)]

Case 2: When s < 0 (gradient directions inconsistent)

G_T = [∇_θ(L_IP(θ)); ∇_φL_VR(φ)]
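The two cases can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: gradients are flattened into lists of floats, g_vr stands for the gradient of L_VR with respect to the enhancement parameters θ (obtained by backpropagating through the enhanced image), and the names cosine_similarity, gradprom_theta_gradient, and eps are hypothetical.

```python
import math

def cosine_similarity(g1, g2, eps=1e-12):
    """Cosine of the angle between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g1, g2))
    n1 = math.sqrt(sum(a * a for a in g1))
    n2 = math.sqrt(sum(b * b for b in g2))
    return dot / (n1 * n2 + eps)  # eps avoids division by zero (assumption)

def gradprom_theta_gradient(g_ip, g_vr, lam=1.0):
    """Gradient applied to theta under the two-case rule.

    Case 1 (s >= 0): use G_IP + lam * G_VR.
    Case 2 (s < 0):  use G_IP only.
    """
    s = cosine_similarity(g_ip, g_vr)
    if s >= 0:
        return [a + lam * b for a, b in zip(g_ip, g_vr)]
    return list(g_ip)

# Aligned gradients: the recognition gradient is added.
print(gradprom_theta_gradient([1.0, 0.0], [0.5, 0.5]))   # [1.5, 0.5]
# Conflicting gradients: the recognition gradient is dropped.
print(gradprom_theta_gradient([1.0, 0.0], [-1.0, 0.2]))  # [1.0, 0.0]
```

In both cases the recognition parameters φ are still updated with their own gradient ∇_φL_VR; only the update to θ is gated.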

Theoretical Analysis

Lemma 3.1: GradProm converges to a local minimum under the following update rule:

θ^(t+1)_T := θ^t_T - α_t(G^t_IP + G^t_VR * max(0, cos(G^t_IP, G^t_VR)))

Proof Highlights: By proving that the update direction has a non-negative inner product with the primary model's gradient, the correctness of the optimization direction is ensured, preventing bias introduction from the auxiliary model.
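The non-negativity claim can be checked directly from the update rule in Lemma 3.1. The following is a short worked derivation in the lemma's notation, writing c_t = max(0, cos(G^t_IP, G^t_VR)) as shorthand; the symbol c_t is introduced here for convenience and does not come from the paper.

```latex
\begin{aligned}
\left\langle G^t_{IP},\; G^t_{IP} + c_t\, G^t_{VR} \right\rangle
&= \lVert G^t_{IP} \rVert^2 + c_t \left\langle G^t_{IP},\, G^t_{VR} \right\rangle \\
&= \lVert G^t_{IP} \rVert^2
   + c_t\, \lVert G^t_{IP} \rVert\, \lVert G^t_{VR} \rVert
     \cos\!\left(G^t_{IP},\, G^t_{VR}\right) \\
&= \lVert G^t_{IP} \rVert^2
   + c_t^{2}\, \lVert G^t_{IP} \rVert\, \lVert G^t_{VR} \rVert
\;\ge\; \lVert G^t_{IP} \rVert^2 \;\ge\; 0 .
\end{aligned}
```

The last equality uses max(0, cos) · cos = max(0, cos)^2, which holds whether the cosine is negative or not. The update direction therefore never points against the enhancement gradient, which is the sense in which the auxiliary model cannot bias the primary model.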

Experimental Setup

Datasets

ISIC 2018: Skin lesion dataset with 2,594 RGB images at 600×450 resolution
COVID-CT: CT dataset with 349 COVID-19 positive and 397 negative CT images
Lizard: 238 PNG images covering six categories of cell nuclei
CAMUS: Echocardiography dataset with 2D ultrasound images from 500 patients

Experimental Tasks

Image Enhancement Tasks: Denoising, super-resolution
Visual Recognition Tasks: Diagnosis (classification), semantic segmentation

Baseline Methods

Benchmark-i: Image enhancement using only SR-ResNet
Benchmark-ii/iii: Pure ResNet for diagnosis/UNet for segmentation
Joint Training: Joint training strategy
Frozen-params Training: Training strategy with frozen VR parameters (ESTR method)

Evaluation Metrics

Image Quality: PSNR, SSIM
Recognition Performance: Accuracy (diagnosis), mIoU (segmentation)
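For reference, three of these metrics can be written compactly using their standard definitions. The sketch below assumes pixel values in [0, 1] and flat lists as inputs; the function names are hypothetical, and skipping classes absent from both masks in mean_iou is one common convention rather than something stated in the source. SSIM is omitted because its windowed computation is considerably longer.

```python
import math

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

def accuracy(pred_labels, true_labels):
    """Fraction of samples whose predicted label matches the ground truth."""
    correct = sum(p == t for p, t in zip(pred_labels, true_labels))
    return correct / len(true_labels)

def mean_iou(pred_mask, true_mask, num_classes):
    """Mean intersection-over-union across classes present in either mask."""
    ious = []
    for c in range(num_classes):
        inter = sum(p == c and t == c for p, t in zip(pred_mask, true_mask))
        union = sum(p == c or t == c for p, t in zip(pred_mask, true_mask))
        if union > 0:  # skip classes that appear in neither mask (assumption)
            ious.append(inter / union)
    return sum(ious) / len(ious)

print(psnr([0.5, 0.5], [0.4, 0.6]))
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Higher is better for all three, matching the ↑ markers in the result tables.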

Experimental Results

Main Results

Denoising Results on ISIC 2018 Dataset

Performance comparison at different noise levels (Tables 1 and 2):

Method (noise σ=0.1)	PSNR↑	SSIM↑
Frozen-params	32.152	0.906
GradProm	33.383	0.915

GradProm outperforms the baseline methods at all tested noise levels. At σ=0.1 it improves on the Frozen-params method by 1.231 dB in PSNR and 0.009 in SSIM.

Comparison with State-of-the-Art Methods

Table 5 shows comparisons with SOTA methods on ISIC 2018:

Method	σ=0.1 PSNR	σ=0.2 PSNR	σ=0.3 PSNR
ESTR (ResNet-101)	33.723	25.925	20.163
ADAP	34.858	24.926	20.373
GradProm (ResNet-101)	36.173	28.024	23.703

Ablation Studies

Comparison of Different Training Strategies

Experimental results show that GradProm outperforms joint training and frozen parameter strategies in both supervised and unsupervised settings.

Multi-task Learning Analysis

Using diagnosis and segmentation simultaneously as auxiliary tasks lowered performance rather than improving it, which supports the hypothesis that different visual tasks have inconsistent feature requirements.

Challenging Scenario Testing

Even in a highly challenging scenario with composite degradation (Gaussian noise + Poisson noise + Gaussian blur), GradProm still achieves a 0.384 dB PSNR improvement.
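To make the composite setting concrete, the sketch below applies the three degradations to a small grayscale image stored as a 2D list with values in [0, 1]. Everything specific here is an assumption for illustration: the order (blur, then Poisson noise, then Gaussian noise), the 3x3 kernel, the parameters sigma and peak, and the function names. The source does not describe how the paper generates its degraded inputs.

```python
import math
import random

def gaussian_blur_3x3(img):
    """Blur a 2D list with a fixed 3x3 kernel, replicating edge pixels."""
    kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # weights sum to 16
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc / 16.0
    return out

def sample_poisson(lam, rng):
    """Knuth's algorithm; adequate for the small rates used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def degrade(img, sigma=0.1, peak=30.0, seed=0):
    """Blur, then Poisson (shot) noise, then additive Gaussian noise."""
    rng = random.Random(seed)  # seeded for reproducibility
    noisy = []
    for row in gaussian_blur_3x3(img):
        new_row = []
        for v in row:
            shot = sample_poisson(v * peak, rng) / peak
            value = shot + rng.gauss(0.0, sigma)
            new_row.append(min(max(value, 0.0), 1.0))  # clip to [0, 1]
        noisy.append(new_row)
    return noisy

clean = [[0.5] * 4 for _ in range(4)]
degraded = degrade(clean)
print(degraded[0])
```

A larger peak value weakens the Poisson component, and a larger sigma strengthens the Gaussian component.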

Cross-Domain Generalization Experiments

In cross-domain experiments trained on ISIC 2018 and tested on Lizard, GradProm achieves PSNR/SSIM improvements of 13.273/0.325 and 13.825/0.458 over ESTR in unsupervised and supervised settings, respectively.

Qualitative Analysis

Visualization Results: Images generated by GradProm better preserve foreground object integrity while removing noise
Class Activation Map Analysis: GradProm's CAM focuses more on foreground object regions, validating the effectiveness of auxiliary tasks

Related Work

Medical Image Quality Enhancement

Existing medical IQE tasks can be categorized into two types:

Image Restoration: Improving the quality of degraded or noisy medical images
Image Enhancement: Improving image contrast and sharpening image details

Multi-task Learning and Auxiliary Learning

Multi-task Learning: Leveraging useful knowledge from related tasks to improve overall performance of all involved tasks
Auxiliary Learning: When multiple tasks have different importance levels, tasks are divided into primary and auxiliary tasks

This paper frames task-driven medical image quality enhancement as an auxiliary learning paradigm, where image processing is the primary task and image recognition is the auxiliary task.

Conclusions and Discussion

Main Conclusions

GradProm effectively resolves the feature requirement conflicts between different models in task-driven IQE
Through dynamic gradient selection mechanisms, it ensures that the primary image enhancement model is not biased by the auxiliary model
Achieves state-of-the-art performance on multiple medical image datasets
The method demonstrates good generalization, applicable to different medical imaging modalities

Limitations

Computational Overhead: While inference has no additional overhead, training requires computing gradient similarity
Applicable Scope: Primarily targets the medical imaging domain; effectiveness in other domains requires further verification
Extreme Scenarios: Performance improvements are limited when image quality is severely degraded

Future Directions

Extended Applications: Extend GradProm to other task-driven training processes, such as multi-objective learning and task-driven data augmentation
Medical Applications: Explore applications in other medical image analysis tasks such as registration and reconstruction
Technical Integration: Investigate combinations of GradProm with transfer learning, domain adaptation, and other techniques

In-Depth Evaluation

Strengths

Deep Problem Insight: Accurately identifies the core problem of existing task-driven methods—conflicts in feature requirements across different tasks
Clever Method Design: Resolves gradient conflicts with a simple gradient cosine-similarity test
Solid Theoretical Foundation: Provides rigorous mathematical proofs ensuring theoretical correctness
Comprehensive Experiments: Conducts thorough validation across multiple datasets, tasks, and settings
High Practical Value: Requires no network architecture modifications or inference overhead, facilitating practical application

Weaknesses

Gradient Computation Overhead: Requires additional gradient similarity computation, increasing training time
Simplistic Threshold Setting: Using only 0 as the threshold may be too coarse; finer-grained strategies could yield better results
Limited Cross-Domain Validation: Generalization across different medical imaging modalities is validated, but validation outside the medical imaging domain is insufficient
Limited Baseline Selection: Some comparison methods may not be the most recent SOTA approaches

Impact

Academic Value: Provides new insights and methods for the task-driven learning field
Practical Value: Holds important application value for medical image processing
Reproducibility: Clear method description and relatively simple implementation ensure good reproducibility
Inspirational Significance: The gradient conflict resolution approach may inspire research on other multi-task learning problems

Applicable Scenarios

Medical Image Processing: Quality enhancement tasks across various medical imaging modalities
Multi-task Learning: Scenarios with primary-auxiliary task relationships where task conflicts may exist
Image Enhancement: Applications requiring image quality improvement combined with downstream tasks
Auxiliary Learning: Scenarios requiring auxiliary task utilization to enhance primary task performance

References

The paper cites extensive related work, including:

ESTR [1]: Representative work in task-driven image quality enhancement
ResNet [6]: Classical deep learning architecture
UNet [39]: Classical method for medical image segmentation
Medical image dataset papers [40-43]

Overall Assessment: This is a high-quality computer vision paper that proposes an innovative solution to a key problem in task-driven medical image quality enhancement. The method is simple and effective, with solid theoretical foundations and comprehensive experimental validation, demonstrating significant academic and practical value.