
Deep Edge Filter: Return of the Human-Crafted Layer in Deep Learning

Lee, Lee, Kwak
We introduce the Deep Edge Filter, a novel approach that applies high-pass filtering to deep neural network features to improve model generalizability. Our method is motivated by our hypothesis that neural networks encode task-relevant semantic information in high-frequency components while storing domain-specific biases in low-frequency components of deep features. By subtracting low-pass filtered outputs from original features, our approach isolates generalizable representations while preserving architectural integrity. Experimental results across diverse domains such as Vision, Text, 3D, and Audio demonstrate consistent performance improvements regardless of model architecture and data modality. Analysis reveals that our method induces feature sparsification and effectively isolates high-frequency components, providing empirical validation of our core hypothesis. The code is available at https://github.com/dongkwani/DeepEdgeFilter.

Basic Information

  • Paper ID: 2510.13865
  • Title: Deep Edge Filter: Return of the Human-Crafted Layer in Deep Learning
  • Authors: Dongkwan Lee, Junhoo Lee, Nojun Kwak (Seoul National University)
  • Classification: cs.LG cs.AI
  • Venue: 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
  • Paper Link: https://arxiv.org/abs/2510.13865
  • Code Link: https://github.com/dongkwani/DeepEdgeFilter

Abstract

This paper proposes Deep Edge Filter, a novel method that applies high-pass filtering to deep neural network features to improve model generalization. The method is based on the hypothesis that neural networks encode task-relevant semantic information in high-frequency components of deep features, while storing domain-specific biases in low-frequency components. By subtracting low-pass filtered outputs from original features, the method isolates generalizable representations while maintaining architectural integrity. Experimental results across multiple domains including vision, text, 3D, and audio demonstrate consistent performance improvements regardless of model architecture and data modality. Analysis shows that the method induces feature sparsification and effectively separates high-frequency components, providing empirical validation of the core hypothesis.

Research Background and Motivation

Problem Definition

A core challenge for deep learning models is their vulnerability to perturbations and domain shifts. Reliance on superficial, low-level texture features acquired during training exacerbates this vulnerability, which is particularly evident under adversarial attacks and in domain adaptation.

Research Motivation

The authors observe that edge filters are a classical image-processing technique for capturing relevant information: they provide strong priors that are robust to various types of noise while still extracting semantic information. However, this knowledge appears to have been forgotten in modern deep learning.

Limitations of Existing Approaches

Past attempts to integrate edge detection techniques into deep learning failed for two main reasons:

  1. Applying edge filters to images, while providing robustness to perturbations, results in loss of fine-grained image details
  2. Classical edge detection is limited to the image domain and is difficult to apply universally in modern deep learning that handles diverse data modalities

Contributions of This Work

This paper generalizes the concept of edge filters to deep features, so that the filter is applied at deeper layers rather than at the input layer. This combines the advantages of traditional edge filters and deep learning to build models that are robust to perturbations and domain shifts.

Core Contributions

  1. Proposes Deep Edge Filter: A filter constructed based on human intuition that can be applied to features of deep neural networks in a modality-agnostic manner, promoting extraction of generalizable features
  2. Cross-Architecture and Cross-Modality Validation: Proposes Edge Filters for CNN and ViT architectures and empirically demonstrates the filter's effectiveness on generalization-critical tasks across multiple modalities including images, text, 3D, and audio
  3. Theoretical Analysis and Empirical Validation: Analyzes the experimental results from the perspectives of layer sparsity and frequency decomposition, and provides extensive ablation studies of the Deep Edge Filter

Method Details

Core Hypothesis

The authors propose a key hypothesis: deep networks encode task-relevant semantic features in high-frequency components and domain-specific biases in low-frequency components. If this hypothesis holds, generalizing the edge filter, which essentially functions as a high-pass filter, to deep features should help isolate generalizable features.

Deep Edge Filter Definition

The Edge Filter is defined as the residual obtained by subtracting the low-pass filtering (LPF) result from the original deep feature h:

F_edge(h) = h - LPF(h)

where LPF denotes a low-pass filter applied to h, for example a mean, median, or Gaussian filter.
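The definition above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: it assumes a 1D feature vector, a mean kernel as the LPF, an odd kernel size, and reflection padding, and the function names `reflect_pad`, `mean_lpf`, and `edge_filter` are hypothetical.

```python
def reflect_pad(x, pad):
    """Mirror `pad` values at each end without repeating the border value."""
    x = list(x)
    if pad >= len(x):
        raise ValueError("pad must be smaller than the input length")
    if pad == 0:
        return x
    left = x[pad:0:-1]           # [c, b] for x = [a, b, c, d] and pad = 2
    right = x[-2:-pad - 2:-1]    # [c, b] for the same input
    return left + x + right


def mean_lpf(h, kernel_size=3):
    """Low-pass filter: sliding mean with the same output length as h."""
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    padded = reflect_pad(h, kernel_size // 2)
    return [sum(padded[i:i + kernel_size]) / kernel_size for i in range(len(h))]


def edge_filter(h, kernel_size=3):
    """F_edge(h) = h - LPF(h), computed element-wise."""
    low = mean_lpf(h, kernel_size)
    return [value - smooth for value, smooth in zip(h, low)]


# A flat offset with one sharp peak: the flat regions map to zero,
# while the peak (and its immediate neighbours) remain non-zero.
h = [1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0]
edges = edge_filter(h, kernel_size=3)
print(edges)
```

Because the output has the same length as the input, a filter of this kind can be placed between two existing layers without changing the shapes they expect.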

Feature Decomposition Theory

Let h ∈ R^d be a feature vector from a hidden layer of a deep network. Assume the feature can be additively decomposed as:

h = h_sem + h_dom

where:

  • h_sem encodes generalizable, task-relevant semantic features
  • h_dom represents domain-specific biases, such as illumination, resolution, or background texture

Sparse Coding Perspective

Under the proposed feature decomposition and frequency hypothesis:

LPF(h) ≈ h_dom ⇒ F_edge(h) ≈ h_sem

This approach of refining features through frequency filtering strongly resonates with sparse coding principles. By removing low-frequency, domain-specific redundancy from h through edge filtering, the method essentially simplifies the signal that needs to be represented.
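The step from the decomposition to F_edge(h) ≈ h_sem can be spelled out. The derivation below is a sketch under one added assumption that the summary does not state: the LPF is linear (true for mean and Gaussian kernels, not for the median).

```latex
% Assumption: LPF is linear, so it distributes over h = h_sem + h_dom.
\begin{align*}
F_{\text{edge}}(h)
  &= h - \mathrm{LPF}(h) \\
  &= \bigl(h_{\text{sem}} - \mathrm{LPF}(h_{\text{sem}})\bigr)
   + \bigl(h_{\text{dom}} - \mathrm{LPF}(h_{\text{dom}})\bigr)
\end{align*}
% If h_dom is low-frequency,  LPF(h_dom) \approx h_dom  and the second bracket vanishes.
% If h_sem is high-frequency, LPF(h_sem) \approx 0      and the first bracket is h_sem.
% Hence F_edge(h) \approx h_sem.
```

In the special case of a constant offset h_dom = c·1 and a kernel whose weights sum to one, LPF(c·1) = c·1 holds exactly, so the offset is removed completely.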

Architecture Adaptation

  • CNN Architecture: Uses 2D Edge Filter, as CNNs naturally handle vertical and horizontal spatial relationships between pixels
  • MLP and Transformer Architecture: Uses 1D Edge Filter, as these architectures do not inherently handle spatial relationships
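For the CNN case, a 2D variant can be sketched as follows. This is an illustration only: it assumes the filter is applied independently to each channel's H x W feature map with a square mean kernel and reflection padding, which the summary does not spell out, and the names `reflect_index`, `mean_lpf_2d`, and `edge_filter_2d` are hypothetical.

```python
def reflect_index(i, n):
    """Map an out-of-range index onto [0, n) by mirroring at the borders."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def mean_lpf_2d(fmap, k=3):
    """Sliding k x k mean over a 2D feature map (list of rows), same output size."""
    rows, cols = len(fmap), len(fmap[0])
    r = k // 2
    out = []
    for y in range(rows):
        out_row = []
        for x in range(cols):
            total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = reflect_index(y + dy, rows)
                    xx = reflect_index(x + dx, cols)
                    total += fmap[yy][xx]
            out_row.append(total / (k * k))
        out.append(out_row)
    return out


def edge_filter_2d(fmap, k=3):
    """F_edge(h) = h - LPF(h) for one channel of a CNN feature map."""
    low = mean_lpf_2d(fmap, k)
    return [[v - l for v, l in zip(f_row, l_row)] for f_row, l_row in zip(fmap, low)]


# One channel with a single strong activation in the centre.
spike = [[0.0] * 5 for _ in range(5)]
spike[2][2] = 9.0
out = edge_filter_2d(spike, k=3)
for row in out:
    print(row)
```

The 1D filter used for MLPs and Transformers follows the same pattern with a single sliding axis.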

Experimental Setup

Dataset and Task Selection

The authors selected four modalities with different characteristics for experimentation:

  1. Vision Domain: Test-Time Adaptation (TTA) tasks
    • CIFAR10-C/100-C and ImageNet200-C benchmarks
    • Using WRN28-10, ResNet18, and ViT-B/32 architectures
  2. Language Domain: Text classification tasks (sentiment analysis, paraphrase detection, and question-answering inference)
    • GLUE benchmark subtasks: SST-2, QQP, QNLI
    • Using 12-layer Transformer (BERT architecture)
  3. 3D Domain: Few-shot neural radiance fields
    • Blender dataset, 8-view few-shot setting
    • Evaluation metrics: PSNR, SSIM, LPIPS, MAE
  4. Audio Domain: Audio classification
    • UrbanSound8K dataset
    • CNN architecture with three convolutional blocks

Implementation Details

  • The LPF branch of the Edge Filter is decoupled (detached) during training, so that no gradient is backpropagated through it
  • The Edge Filter is applied at only a single layer in each model, to avoid the information loss caused by stacking multiple filters
  • Reflection padding is used to maintain consistent input-output dimensions
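The effect of decoupling the LPF from backpropagation can be illustrated with a short derivation. It rests on two assumptions that go beyond the summary: the decoupling acts as a stop-gradient on the LPF output, and the LPF is linear, written as a fixed matrix A.

```latex
% Assumptions: LPF(h) = A h for a fixed matrix A; sg(.) denotes stop-gradient.
\begin{align*}
\text{without detaching:}\quad & F_{\text{edge}}(h) = h - A h,
  & \frac{\partial F_{\text{edge}}}{\partial h} &= I - A \\
\text{with detaching:}\quad & F_{\text{edge}}(h) = h - \mathrm{sg}(A h),
  & \frac{\partial F_{\text{edge}}}{\partial h} &= I
\end{align*}
% An upstream gradient g reaches the earlier layers as (I - A)^T g in the first case
% and as g itself in the second.
```

Under these assumptions, the forward pass is high-pass filtered in both cases, but only the detached version leaves the backward signal unfiltered.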

Experimental Results

Main Results

Vision Domain (TTA)

Results on CIFAR10-C/100-C and ImageNet200-C show:

  • CIFAR10-C: Performance improvement of 1.2%p to 8.5%p
  • CIFAR100-C: Performance improvement of 0.4%p to 10.2%p
  • ImageNet200-C: Performance improvement of 0.1%p to 1.9%p

Notably, performance on the source datasets degrades slightly while performance on the corrupted datasets improves significantly, which suggests that the Edge Filter reduces overfitting to the source domain.

Language Domain

On GLUE benchmarks:

  • SST-2: 79.36% → 80.85% (+1.49%p)
  • QQP: 83.42% → 83.46% (+0.04%p)
  • QNLI: 62.40% → 63.30% (+0.90%p)

3D Domain

In NeRF few-shot rendering:

  • Average PSNR improvement: 22.95 → 23.39 (+0.44)
  • Average SSIM improvement: 0.856 → 0.862 (+0.006)
  • LPIPS: reduced by 11% (lower is better), indicating a notable improvement in perceptual quality

Audio Domain

UrbanSound8K classification task: 77.42% → 81.72% (+4.3%p)

Analysis Experiments

Feature Sparsity Analysis

By measuring the density of layer outputs during training, the authors found that the Edge Filter significantly reduces output density in subsequent layers, which supports the view that high-pass filtering leads to sparse feature encoding.
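A toy example shows why removing a shared offset can lower density. It is not the authors' measurement protocol: here "density" is assumed to mean the fraction of entries whose magnitude exceeds a small threshold, a ReLU stands in for the subsequent layer, and the helper names are hypothetical.

```python
def edge_filter(h, k=3):
    """h - mean LPF(h), with reflection padding and an odd kernel size k."""
    n, r = len(h), k // 2
    out = []
    for i in range(n):
        # Reflect out-of-range indices back into [0, n).
        idx = [abs(j) if j < n else 2 * (n - 1) - j for j in range(i - r, i + r + 1)]
        out.append(h[i] - sum(h[j] for j in idx) / k)
    return out


def relu(h):
    return [max(0.0, v) for v in h]


def density(h, eps=1e-6):
    """Fraction of entries whose magnitude exceeds eps (assumed definition)."""
    return sum(1 for v in h if abs(v) > eps) / len(h)


# A feature with a positive offset everywhere and one distinctive peak.
h = [2.0, 2.0, 2.0, 5.0, 2.0, 2.0, 2.0]
dense = density(relu(h))                 # every unit is active
sparse = density(relu(edge_filter(h)))   # only the peak survives
print(dense, sparse)
```

In this sketch the offset keeps every unit active after the ReLU, whereas the filtered feature activates only the unit at the peak.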

Frequency Domain Analysis

FFT analysis shows that Edge Filter effectively reduces amplitude in the low-frequency region of deep features, confirming its intended function as a high-pass operator.
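The same kind of check can be reproduced on a synthetic signal. The sketch below is illustrative only: it uses a naive discrete Fourier transform from the standard library instead of an FFT routine, a hand-made signal instead of real deep features, and a 3-tap mean kernel as the LPF; all names are hypothetical.

```python
import cmath
import math


def edge_filter(h, k=3):
    """h - mean LPF(h), with reflection padding and an odd kernel size k."""
    n, r = len(h), k // 2
    out = []
    for i in range(n):
        idx = [abs(j) if j < n else 2 * (n - 1) - j for j in range(i - r, i + r + 1)]
        out.append(h[i] - sum(h[j] for j in idx) / k)
    return out


def dft_amplitude(x):
    """Normalized magnitude |X[f]| / n for f = 0 .. n // 2 (naive O(n^2) DFT)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))) / n
        for f in range(n // 2 + 1)
    ]


n = 32
# Constant offset (bin 0) + slow sine (bin 1) + fastest alternation (bin n // 2).
h = [3.0 + 2.0 * math.sin(2 * math.pi * t / n) + 0.5 * (-1) ** t for t in range(n)]

before = dft_amplitude(h)
after = dft_amplitude(edge_filter(h))

for f in (0, 1, n // 2):
    print(f"bin {f:2d}: before={before[f]:.4f} after={after[f]:.4f}")
```

The offset and the slow sine are almost entirely removed, while the highest-frequency component is not suppressed, which is the behaviour expected of a high-pass operator.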

Ablation Studies

Filter Type Comparison

Testing different LPF types (mean, median, Gaussian):

  • Mean and median filters show consistent performance improvements across all tasks
  • Applying the LPF directly, instead of subtracting its output from the feature, results in significant performance degradation, which supports the hypothesis that low-frequency components carry domain-specific information
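The three LPF choices can be compared in a small sketch. It assumes 1D features, odd kernel sizes, and reflection padding; the Gaussian `sigma` value and all function names are illustrative choices, not taken from the paper.

```python
import math
import statistics


def windows(h, k):
    """Yield one reflect-padded window of odd size k per element of h."""
    h = list(h)
    n, r = len(h), k // 2
    for i in range(n):
        win = []
        for j in range(i - r, i + r + 1):
            if j < 0:
                j = -j
            elif j >= n:
                j = 2 * (n - 1) - j
            win.append(h[j])
        yield win


def mean_lpf(h, k=3):
    return [sum(w) / k for w in windows(h, k)]


def median_lpf(h, k=3):
    return [statistics.median(w) for w in windows(h, k)]


def gaussian_lpf(h, k=3, sigma=1.0):
    r = k // 2
    weights = [math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in range(-r, r + 1)]
    total = sum(weights)
    return [sum(w * v for w, v in zip(weights, win)) / total for win in windows(h, k)]


def edge_filter(h, lpf, **kwargs):
    """F_edge(h) = h - LPF(h) for any of the LPFs above."""
    return [v - low for v, low in zip(h, lpf(h, **kwargs))]


h = [1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0]
results = {
    "mean": edge_filter(h, mean_lpf),
    "median": edge_filter(h, median_lpf),
    "gaussian": edge_filter(h, gaussian_lpf, sigma=1.0),
    "lpf only (mean)": mean_lpf(h),   # the "direct LPF" ablation keeps the smooth part
}
for name, values in results.items():
    print(name, [round(v, 3) for v in values])
```

In this toy input, the edge-filtered variants all remove the constant offset and keep the peak, whereas the LPF-only output keeps the offset and smears the peak.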

Position and Kernel Size Effects

  • WRN models: Applying the Edge Filter improves performance in all tested configurations, with a maximum improvement of 9.6%p
  • ViT models: Applying filters in later layers is more effective
  • Language tasks: Performance remains unchanged or improves regardless of position and kernel size

Related Work

Frequency Perspective in Deep Learning

Existing research primarily focuses on image data and CNNs, finding that:

  • CNNs have strong bias toward texture rather than shape
  • Deep neural networks follow a "frequency principle," learning low-frequency components first during training

Activation Filtering and Sparsity

Related work includes:

  • Filter Response Normalization (FRN)
  • Deep Frequency Filtering
  • ProSparse and other methods

The innovation of this paper lies in proposing a universal filtering layer applicable to different deep learning applications.

Conclusions and Discussion

Main Conclusions

  1. Deep Edge Filter effectively extracts more generalizable features, showing consistent performance improvements across multiple modalities and architectures
  2. The theoretical hypothesis is empirically validated: semantic information primarily exists in high-frequency components, while domain-specific information exists in low-frequency components
  3. The method is architecture-agnostic and modality-agnostic

Limitations

  1. Computational Cost: Requires retraining models from scratch, limiting extensive experiments on large models
  2. Insufficient Large Model Validation: Due to computational constraints, validation on state-of-the-art models or broader tasks is limited
  3. Language Domain Limitations: Unable to conduct experiments on LLMs

Future Directions

  1. Apply the method to large language models (LLMs)
  2. Explore applications in multimodal models
  3. Investigate more efficient implementations to reduce retraining requirements

In-Depth Evaluation

Strengths

  1. Strong Theoretical Innovation: Successfully generalizes the concept of edge filtering from classical image processing to deep features, providing a novel theoretical perspective
  2. Comprehensive Cross-Modality Validation: Validates the method across four different modalities—vision, text, 3D, and audio—demonstrating the universality of the approach
  3. Integration of Theory and Practice: Not only proposes the method but also provides theoretical explanations through sparse coding theory and frequency analysis
  4. Rigorous Experimental Design: Includes rich ablation studies, statistical significance tests, and visualization analysis

Weaknesses

  1. Insufficient Computational Overhead Analysis: While computational overhead comparison is provided in Appendix F, the analysis of efficiency impact in practical applications lacks depth
  2. Limited Large Model Validation: Primarily validates on relatively small models; applicability to current mainstream large models remains to be verified
  3. Limited Theoretical Explanation: While providing frequency domain explanations, the deeper mechanisms of why semantic information primarily exists in high-frequency components are insufficiently explained
  4. Application Scenario Constraints: The requirement to retrain models limits direct application to pretrained models

Impact

  1. Academic Value: Provides a new perspective for feature representation learning in deep learning, potentially inspiring related research
  2. Practical Value: The method is simple to implement and has practical application value in tasks requiring improved generalization
  3. Reproducibility: Authors provide complete code implementation with sufficient experimental detail descriptions

Applicable Scenarios

  1. Domain Adaptation Tasks: Particularly suitable for scenarios requiring cross-domain generalization
  2. Few-Shot Learning: Improves model generalization when data is limited
  3. High Robustness Requirements: Application scenarios sensitive to noise and perturbations
  4. Multimodal Learning: Can be uniformly applied to feature processing across different modalities

References

The paper cites 53 related references, primarily covering:

  • Frequency analysis-related work in deep learning
  • Domain adaptation and test-time adaptation methods
  • Activation filtering and network sparsity research
  • Benchmark datasets and evaluation methods for various modalities

Overall Assessment: This is an excellent paper that balances theoretical innovation with practical validation, successfully introducing classical signal processing concepts into modern deep learning and validating their effectiveness across multiple domains. Despite some limitations, its novel perspective and consistent experimental results provide significant academic value and practical significance.