Deep Edge Filter: Return of the Human-Crafted Layer in Deep Learning
Lee, Lee, Kwak
We introduce the Deep Edge Filter, a novel approach that applies high-pass filtering to deep neural network features to improve model generalizability. Our method is motivated by our hypothesis that neural networks encode task-relevant semantic information in high-frequency components while storing domain-specific biases in low-frequency components of deep features. By subtracting low-pass filtered outputs from original features, our approach isolates generalizable representations while preserving architectural integrity. Experimental results across diverse domains such as Vision, Text, 3D, and Audio demonstrate consistent performance improvements regardless of model architecture and data modality. Analysis reveals that our method induces feature sparsification and effectively isolates high-frequency components, providing empirical validation of our core hypothesis. The code is available at https://github.com/dongkwani/DeepEdgeFilter.
This paper proposes Deep Edge Filter, a novel method that applies high-pass filtering to deep neural network features to improve model generalization. The method is based on the hypothesis that neural networks encode task-relevant semantic information in high-frequency components of deep features, while storing domain-specific biases in low-frequency components. By subtracting low-pass filtered outputs from original features, the method isolates generalizable representations while maintaining architectural integrity. Experimental results across multiple domains including vision, text, 3D, and audio demonstrate consistent performance improvements regardless of model architecture and data modality. Analysis shows that the method induces feature sparsification and effectively separates high-frequency components, providing empirical validation of the core hypothesis.
A core challenge for deep learning models is their vulnerability to perturbations and domain shifts. This vulnerability is exacerbated by their reliance on superficial, low-level texture features learned during training, a weakness that is particularly evident in adversarial attacks and domain adaptation.
The authors observe that edge filters have long been a staple of classical image processing: they provide strong priors that are robust to many types of noise while still extracting semantically relevant structure. This knowledge, however, appears to have been largely forgotten in modern deep learning.
Past attempts to integrate edge detection into deep learning have fallen short for two main reasons:
Applying edge filters directly to input images improves robustness to perturbations but discards fine-grained image details
Classical edge detection is tied to the image domain and does not transfer readily to the diverse data modalities handled by modern deep learning
This paper generalizes the edge filter to deep features so that it operates on intermediate layers rather than on the input. This combines the strengths of classical edge filters and deep learning to build models that are robust to perturbations and domain shifts. The main contributions are:
Deep Edge Filter: Proposes a filter designed from human intuition that can be applied to the features of deep neural networks in a modality-agnostic manner, promoting the extraction of generalizable features
Cross-Architecture and Cross-Modality Validation: Proposes Edge Filter variants for CNN and ViT architectures and empirically demonstrates their effectiveness on generalization-critical tasks across images, text, 3D, and audio
Theoretical Analysis and Empirical Validation: Analyzes experimental results from the perspectives of layer sparsity and frequency decomposition, and provides extensive ablation studies on the Deep Edge Filter
The authors' key hypothesis is that deep networks encode task-relevant semantic features in high-frequency components and domain-specific biases in low-frequency components. If this hypothesis holds, an edge filter generalized to deep features, which essentially acts as a high-pass filter, should help isolate generalizable features.
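Under the proposed feature decomposition h = h_sem + h_dom (a task-relevant semantic part plus a domain-specific part), with the edge filter defined as F_edge(h) = h − LPF(h), the frequency hypothesis gives:
LPF(h) ≈ h_dom ⇒ F_edge(h) = h − LPF(h) ≈ (h_sem + h_dom) − h_dom = h_sem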
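To make the definition concrete, here is a minimal Python sketch of F_edge(h) = h − LPF(h) on a single-channel 2D feature map stored as nested lists. The 3×3 mean (box) filter, the replicate (border-clamping) padding, and the names `mean_lpf` and `edge_filter` are illustrative assumptions rather than the authors' implementation; in a CNN the same operation would presumably be applied to each channel of an intermediate feature map.

```python
from typing import List

Matrix = List[List[float]]


def mean_lpf(h: Matrix, k: int = 3) -> Matrix:
    """k x k box (mean) low-pass filter with replicate padding (assumed)."""
    if k % 2 == 0:
        raise ValueError("kernel size k must be odd")
    rows, cols, r = len(h), len(h[0]), k // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    # Clamp out-of-range indices to the border (replicate padding).
                    ii = min(max(i + di, 0), rows - 1)
                    jj = min(max(j + dj, 0), cols - 1)
                    total += h[ii][jj]
            out[i][j] = total / (k * k)
    return out


def edge_filter(h: Matrix, k: int = 3) -> Matrix:
    """High-pass 'edge' response F_edge(h) = h - LPF(h), element by element."""
    low = mean_lpf(h, k)
    return [[h[i][j] - low[i][j] for j in range(len(h[0]))] for i in range(len(h))]


# A single activation spike in an otherwise silent 3x3 feature map.
spike = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
print(edge_filter(spike))
# [[-1.0, -1.0, -1.0], [-1.0, 8.0, -1.0], [-1.0, -1.0, -1.0]]
```

Applied to a single spike, the sketch reproduces the familiar pattern of a classical Laplacian-style edge kernel (8 at the center, −1 around it), because "identity minus box blur" is itself such a kernel. A constant feature map, by contrast, is mapped to all zeros, which is the sense in which the filter discards a uniform, low-frequency component.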
Refining features through frequency filtering aligns closely with sparse coding principles: by removing low-frequency, domain-specific redundancy from h, the edge filter simplifies the signal that the network needs to represent.
By measuring the density of layer outputs during training, the authors find that the Edge Filter substantially reduces output density in subsequent layers, supporting the view that high-pass filtering leads to sparse feature encoding.
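The density effect can be illustrated on a toy 1D feature. This sketch assumes a density metric of "fraction of entries whose magnitude exceeds a small threshold", a 3-tap circular mean filter, and a ReLU applied immediately after the filter; the names `density`, `relu`, `circular_mean_lpf`, and `edge_filter_1d` are hypothetical, and the paper's actual metric and layer placement may differ.

```python
from typing import List


def circular_mean_lpf(x: List[float], k: int = 3) -> List[float]:
    """1D box (mean) low-pass filter with circular (wrap-around) padding."""
    n, r = len(x), k // 2
    return [sum(x[(i + d) % n] for d in range(-r, r + 1)) / k for i in range(n)]


def edge_filter_1d(x: List[float], k: int = 3) -> List[float]:
    """High-pass response h - LPF(h)."""
    low = circular_mean_lpf(x, k)
    return [a - b for a, b in zip(x, low)]


def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def density(x: List[float], eps: float = 1e-6) -> float:
    """Fraction of entries with magnitude above eps (hypothetical density metric)."""
    return sum(1 for v in x if abs(v) > eps) / len(x)


# Toy feature: a constant (low-frequency, "domain-like") offset of 1.0
# plus a single localized bump at index 3.
h = [1.0, 1.0, 1.0, 4.0, 1.0, 1.0, 1.0, 1.0]

print(density(relu(h)))                  # 1.0: every unit is active
print(density(relu(edge_filter_1d(h))))  # 0.125: only the bump stays positive
```

Subtracting the local mean cancels the constant offset, so after the ReLU only the localized bump remains active: the output goes from fully dense to one active unit out of eight.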
An FFT analysis shows that the Edge Filter reduces amplitude in the low-frequency region of deep features, confirming that it functions as the intended high-pass operator.
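The frequency-domain observation can be reproduced in miniature with a naive DFT. This sketch assumes a 3-tap circular mean filter and a toy 1D feature; the helper names are hypothetical. It compares DFT amplitudes of the feature before and after h − LPF(h).

```python
import cmath
import math
from typing import List


def circular_mean_lpf(x: List[float], k: int = 3) -> List[float]:
    """1D box (mean) low-pass filter with circular (wrap-around) padding."""
    n, r = len(x), k // 2
    return [sum(x[(i + d) % n] for d in range(-r, r + 1)) / k for i in range(n)]


def edge_filter_1d(x: List[float], k: int = 3) -> List[float]:
    """High-pass response h - LPF(h)."""
    low = circular_mean_lpf(x, k)
    return [a - b for a, b in zip(x, low)]


def dft_amplitude(x: List[float]) -> List[float]:
    """Magnitude of each bin of a naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)))
        for f in range(n)
    ]


# DC offset of 3.0 plus an alternating +/-1 (highest-frequency) component.
h = [4.0, 2.0, 4.0, 2.0, 4.0, 2.0, 4.0, 2.0]

raw = dft_amplitude(h)
filtered = dft_amplitude(edge_filter_1d(h))
print("DC bin:", raw[0], "->", filtered[0])       # 24 -> approximately 0
print("Nyquist bin:", raw[4], "->", filtered[4])  # about 8 -> about 10.67
```

With circular padding the mean filter is a circular convolution whose gain at zero frequency is exactly 1, so subtracting it removes the DC component entirely; with zero or replicate padding the removal is only approximate. This particular 3-tap high-pass also amplifies the highest frequency slightly (gain 4/3), which is why the Nyquist amplitude grows rather than staying fixed.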
Ablations over different LPF types (mean, median, Gaussian) show that:
Mean and median filters yield consistent performance improvements across all tasks
Applying the LPF directly, instead of subtracting its output, causes significant performance degradation, supporting the hypothesis that low-frequency components contain domain-specific information
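The LPF variants in this ablation differ only in how each local window is summarized. The sketch below, in the same toy 1D setting with circular padding, plugs mean, median, and Gaussian reducers into one hypothetical `lpf` helper and contrasts the edge output h − LPF(h) with direct LPF application. The Gaussian width `sigma=1.0` and all function names are assumptions for illustration; the sketch shows what each variant computes, not the paper's accuracy results.

```python
import math
import statistics
from typing import Callable, List

Reducer = Callable[[List[float]], float]


def mean_reducer(window: List[float]) -> float:
    return sum(window) / len(window)


def median_reducer(window: List[float]) -> float:
    return statistics.median(window)


def gaussian_reducer(k: int, sigma: float = 1.0) -> Reducer:
    """Return a reducer that applies normalized Gaussian weights to a k-wide window."""
    r = k // 2
    raw = [math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in range(-r, r + 1)]
    total = sum(raw)
    weights = [w / total for w in raw]

    def reduce(window: List[float]) -> float:
        return sum(w * v for w, v in zip(weights, window))

    return reduce


def lpf(x: List[float], k: int, reducer: Reducer) -> List[float]:
    """Slide a k-wide circular window over x and reduce each window to one value."""
    n, r = len(x), k // 2
    return [reducer([x[(i + d) % n] for d in range(-r, r + 1)]) for i in range(n)]


def edge_filter(x: List[float], k: int, reducer: Reducer) -> List[float]:
    """Edge-style high-pass: x - LPF(x)."""
    return [a - b for a, b in zip(x, lpf(x, k, reducer))]


spike = [0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0]
variants = [("mean", mean_reducer), ("median", median_reducer),
            ("gaussian", gaussian_reducer(3))]
for name, red in variants:
    print(name, "edge:", edge_filter(spike, 3, red))
    print(name, "direct LPF:", lpf(spike, 3, red))
```

On an isolated spike, the median filter ignores the outlier entirely, so its edge output keeps the spike intact while direct LPF application erases it; the mean and Gaussian variants instead spread part of the spike into its neighbors.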
Main conclusions:
Deep Edge Filter extracts more generalizable features, showing consistent performance improvements across multiple modalities and architectures
The hypothesis is supported empirically: semantic information resides primarily in high-frequency components, while domain-specific information resides in low-frequency components
The method is architecture-agnostic and modality-agnostic
Strengths:
Strong Theoretical Innovation: Successfully generalizes edge filtering from classical image processing to deep features, offering a novel theoretical perspective
Comprehensive Cross-Modality Validation: Validates the method across four modalities (vision, text, 3D, and audio), demonstrating the generality of the approach
Integration of Theory and Practice: Not only proposes the method but also provides theoretical explanations through sparse coding theory and frequency analysis
Rigorous Experimental Design: Includes rich ablation studies, statistical significance tests, and visualization analysis
Weaknesses:
Insufficient Computational Overhead Analysis: Although Appendix F compares computational overhead, the analysis of its impact on efficiency in practical applications lacks depth
Limited Large Model Validation: Primarily validates on relatively small models; applicability to current mainstream large models remains to be verified
Limited Theoretical Explanation: Although the paper offers a frequency-domain account, it does not explain in depth why semantic information should reside primarily in high-frequency components
Application Scenario Constraints: Because models must be retrained with the filter in place, the method cannot be applied directly to existing pretrained models
The paper cites 53 related references, primarily covering:
Frequency analysis-related work in deep learning
Domain adaptation and test-time adaptation methods
Activation filtering and network sparsity research
Benchmark datasets and evaluation methods for various modalities
Overall Assessment: This is an excellent paper that balances theoretical innovation with practical validation, successfully introducing classical signal processing concepts into modern deep learning and validating their effectiveness across multiple domains. Despite some limitations, its novel perspective and consistent experimental results make it a contribution of significant academic and practical value.