2025-11-23T18:13:16.980826

Unraveling the Black Box of Neural Networks: A Dynamic Extremum Mapper

Chen
We point out that neural networks are not black boxes, and their generalization stems from the ability to dynamically map a dataset to the extrema of the model function. We further prove that the number of extrema in a neural network is positively correlated with the number of its parameters. We then propose a new algorithm that is significantly different from back-propagation algorithm, which mainly obtains the values of parameters by solving a system of linear equations. Some difficult situations, such as gradient vanishing and overfitting, can be simply explained and dealt with in this framework.

Basic Information

  • Paper ID: 2507.03885
  • Title: Unraveling the Black Box of Neural Networks: A Dynamic Extremum Mapper
  • Author: Shengjian Chen (Intelligent Robotics Center, Jihua Laboratory)
  • Classification: cs.LG (Machine Learning)
  • Publication Date: arXiv Preprint (Version from October 10, 2025)
  • Paper Link: https://arxiv.org/abs/2507.03885v3

Abstract

This paper contends that neural networks are not black boxes; rather, their generalization capability stems from the ability to dynamically map datasets to extremum points of model functions. The author demonstrates that the number of extremum points in neural networks is positively correlated with the number of parameters and proposes a novel algorithm that differs significantly from backpropagation, primarily obtaining parameter values through solving linear equation systems. Within this framework, difficult cases such as vanishing gradients and overfitting can be explained and addressed simply.

Research Background and Motivation

Problem Definition

Although neural network-based artificial intelligence models have achieved prediction accuracy surpassing traditional machine learning algorithms in domains such as image recognition and natural language processing, research into their underlying principles remains limited, and they are still widely regarded as black boxes.

Significance

  1. Safety Requirements: In domains such as autonomous driving that demand high real-time performance and safety, understanding how neural networks operate is essential
  2. Fault Diagnosis: When a model malfunctions, the root cause cannot be identified and fixed quickly without an understanding of its internal mechanism
  3. Theoretical Completeness: A mathematical explanation of neural network mechanisms is needed, rather than relying solely on engineering approaches

Limitations of Existing Methods

  1. Interpretability Methods: Explain neural networks mainly by analyzing the relationship between inputs and outputs, but much of the underlying mechanism remains unexplained
  2. Information Bottleneck Theory: While providing useful references, it lacks concrete methods for parameter solving
  3. Universal Approximation Theorem: Although Cybenko and Hornik et al. proved that feedforward neural networks can approximate arbitrary continuous functions, they did not provide methods for finding specific functions

Core Contributions

  1. Ideal Machine Learning Model Characteristics: Proposes the main characteristics of ideal machine learning models and provides universal model training procedures based on these characteristics
  2. Extremum Mapping Theory: Mathematically proves that neural networks achieve generalization by mapping datasets to local extrema of functions, proposing the Extremum Increment (EI) algorithm
  3. Problem Explanation Framework: Uses the EI algorithm to identify the causes of common problems such as vanishing/exploding gradients and overfitting, and provides corresponding solutions

Methodology Details

General Characteristics of Ideal Models

Exact Mapping

The author first defines the characteristics of ideal models: for a dataset D = {(x^(i), y^(i))} of labeled samples indexed by i, the goal is to find a function F such that y^(i) = F(x^(i)) for every sample. When several samples of the same class exist, the function curve must change shape to accommodate each new sample, thereby forming multiple local extremum points.

Weakened Mapping

When function parameters are limited, the degree of curve shape variation is constrained, and the number of extrema cannot increase arbitrarily. The solution is to extend the essence from a single point to an interval, concentrating samples with slightly different surfaces but identical essence within that interval.

N-class to Binary Classification Conversion

An N-class classification function F is converted into N binary classification functions {F_j | j ∈ [1, N]}, where the j-th function F_j only determines whether the input sample belongs to the j-th class essence:

F_j(x^(i)) = { UB, if y^(i) = j
             { LB, if y^(i) ≠ j

Here UB and LB are the two target values (upper and lower) that F_j assigns to members and non-members of class j.
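The conversion can be illustrated with a minimal sketch. It assumes integer labels 1..N and takes UB = 1 and LB = 0 (the output range of a sigmoid); these values and the helper name `binary_targets` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def binary_targets(y, num_classes, ub=1.0, lb=0.0):
    """Build the targets of the N binary functions F_1..F_N.

    Returns an array T of shape (num_classes, len(y)) where T[j-1, i]
    is ub if sample i belongs to class j and lb otherwise.
    ub/lb defaults are assumed values, not from the paper.
    """
    y = np.asarray(y)
    classes = np.arange(1, num_classes + 1)[:, None]  # column of class ids 1..N
    return np.where(y[None, :] == classes, ub, lb)

# Four samples with labels 1, 3, 2, 3 and N = 3 classes.
targets = binary_targets([1, 3, 2, 3], num_classes=3)
```

Each row of `targets` is the target vector of one binary function, so every sample has exactly one UB entry in its column.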

Extremum Point Analysis in Neural Networks

Model Decomposition

The author decomposes the neural network into a set of l_n composite functions {h_v^[n] | v ∈ [1, l_n]}, where each composite function is essentially a binary classification problem.

Mathematical Derivation of Extremum Points

For the function h_v^[u], the v-th unit of layer u with activation function S, the expression is:

h_v^[u](x) = S(∑_{k=1}^{l_{u-1}} w_{v,k}^[u] * h_k^[u-1](x))

By taking partial derivatives and setting them to zero, a homogeneous linear equation system is obtained:

L(n,v) = {∑_{k=1}^{l_{n-1}} w_{v,k}^[n] * ∂h_k^[n-1](x)/∂x_t = 0 | t ∈ [1,m]}
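The step from the layer expression to this system can be spelled out with the chain rule. The derivation below is a sketch that assumes S is the sigmoid and that bias terms are omitted, as in the expression above; z_v^{[n]} is shorthand introduced here for the pre-activation sum.

```latex
\begin{aligned}
z_v^{[n]}(x) &= \sum_{k=1}^{l_{n-1}} w_{v,k}^{[n]}\, h_k^{[n-1]}(x),
\qquad h_v^{[n]}(x) = S\big(z_v^{[n]}(x)\big), \\
\frac{\partial h_v^{[n]}(x)}{\partial x_t}
&= S'\big(z_v^{[n]}(x)\big) \sum_{k=1}^{l_{n-1}} w_{v,k}^{[n]}\,
\frac{\partial h_k^{[n-1]}(x)}{\partial x_t},
\qquad t \in [1, m].
\end{aligned}
```

Because S'(z) = S(z)(1 - S(z)) > 0 for the sigmoid, the derivative vanishes exactly when the sum vanishes. This yields the m equations of L(n,v), which are linear and homogeneous in the weights w_{v,k}^[n].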

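When l_{n-1} > m, that is, when the number of unknown weights exceeds the number of equations, the homogeneous system has infinitely many non-trivial solutions, which the author identifies as the primary reason for neural networks' strong generalization capability.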
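The role of the condition on the layer width can be checked numerically. The following is a minimal sketch under stated assumptions: a two-layer, bias-free sigmoid network with hypothetical random first-layer weights `W1`, input dimension m = 3 and previous-layer width 5. It builds the homogeneous system at one sample, takes the general solution from the null space via SVD, and confirms that the output gradient at that sample is zero.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
m, l_prev = 3, 5                      # input dimension and width of layer n-1 (l_prev > m)
W1 = rng.normal(size=(l_prev, m))     # hypothetical first-layer weights
x = rng.normal(size=m)                # sample to be mapped to a stationary point

h = sigmoid(W1 @ x)                   # h_k^[n-1](x)
J = (h * (1 - h))[:, None] * W1       # J[k, t] = d h_k^[n-1] / d x_t

# Homogeneous system: sum_k w_k * J[k, t] = 0 for every t, i.e. J.T @ w = 0.
_, s, Vt = np.linalg.svd(J.T)         # J.T has shape (m, l_prev)
rank = int(np.sum(s > 1e-12))
null_basis = Vt[rank:].T              # columns span the general solution

# Any combination of the basis columns is a solution.
w = null_basis @ rng.normal(size=null_basis.shape[1])

def output_grad(w, x):
    """Gradient of S(sum_k w_k h_k(x)) with respect to x."""
    h = sigmoid(W1 @ x)
    out = sigmoid(w @ h)
    return out * (1 - out) * ((w * h * (1 - h)) @ W1)

grad_at_x = output_grad(w, x)         # numerically zero
```

With 5 unknowns and 3 independent equations the null space here has dimension 2, so there is a two-parameter family of weight vectors that all make `x` a stationary point of the output.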

EI Algorithm Framework

Core Algorithm Concept

The main steps of the EI algorithm differ significantly from the BP algorithm:

  1. The BP algorithm uses gradient updates to approximate ideal parameter values; the EI algorithm directly obtains parameter values by solving equation systems
  2. The BP algorithm requires updating all parameters each iteration; the EI algorithm only updates partial parameters

Algorithm Procedure

  1. Initialization: Manually label the sample set and initialize every parameter in the set W to a non-zero real value
  2. Layer-wise Solving: Execute parameter updates layer by layer from the last hidden layer to the first hidden layer
  3. Polarization Operation: Select particular solutions satisfying termination conditions from the general solution W^u:n
  4. Parameter Update: If a particular solution is found, update parameters; otherwise, introduce additional parameters
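Steps 3 and 4 can be sketched as a search over the general solution. The summary does not specify the termination condition or the search strategy, so everything below is an assumption made for illustration: the function name `polarize`, the random search, the `margin` threshold, and the use of one positive and one negative hidden activation vector. It is not the author's polarization algorithm.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def polarize(null_basis, h_pos, h_neg, margin=0.9, trials=1000, scale=10.0, seed=0):
    """Random search for a particular solution w = null_basis @ c.

    Assumed termination condition: the output on the positive sample is
    at least `margin` and the output on the negative sample is at most
    1 - margin. Returns w, or None if no candidate qualifies.
    """
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        c = scale * rng.normal(size=null_basis.shape[1])
        w = null_basis @ c
        if sigmoid(w @ h_pos) >= margin and sigmoid(w @ h_neg) <= 1 - margin:
            return w
    return None  # caller would then introduce additional parameters (step 4)

h_pos = np.array([1.0, 0.0])          # hypothetical hidden activations
h_neg = np.array([0.0, 1.0])
found = polarize(np.eye(2), h_pos, h_neg)            # unconstrained general solution
missing = polarize(np.zeros((2, 0)), h_pos, h_neg)   # empty general solution
```

A `None` result corresponds to the branch in step 4 where no particular solution exists and the network has to be enlarged.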

Computational Complexity Optimization

Reduce computational complexity by relaxing termination conditions and introducing the concept of surface neighborhoods:

  • Use weakened termination conditions, only requiring that the classification function value of samples be significantly larger than other classification function values
  • Utilize surface neighborhoods, applying strict conditions only to representative samples
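The weakened termination condition can be expressed as a margin test on the outputs of the N binary functions. This is a minimal sketch; the helper name `weakly_terminated` and the numeric `margin` standing in for "significantly larger" are assumptions, since the summary gives no threshold.

```python
import numpy as np

def weakly_terminated(scores, labels, margin=0.5):
    """Check the weakened condition per sample.

    scores[j, i] is the value of the (j+1)-th binary function on sample i;
    labels holds 1-based class ids. A sample passes when its own class
    score exceeds every other class score by at least `margin`.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    idx = np.arange(scores.shape[1])
    true_scores = scores[labels - 1, idx]
    others = scores.copy()
    others[labels - 1, idx] = -np.inf     # exclude the true class from the max
    return true_scores - others.max(axis=0) >= margin

# Two classes, two samples: the first is clearly separated, the second is not.
passed = weakly_terminated([[0.9, 0.4], [0.1, 0.5]], labels=[1, 2])
```

Under the surface-neighborhood idea described above, the strict condition would be reserved for representative samples, while a check of this kind would apply to the rest.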

Theoretical Analysis and Problem Explanation

Vanishing/Exploding Gradients

  • Vanishing Gradients: Within the EI algorithm framework, if a particular solution can be found from the general solution W^u:n, the parameters in earlier hidden layers do not need to change and can keep their initial values, so vanishing gradients in those layers are an expected consequence rather than a defect
  • Exploding Gradients: Corresponds to cases where the equation system has no solution; the solution is to increase the number of hidden layers or parameters per layer

Overfitting

Overfitting is attributed to the limited number of extrema available to a network with a finite number of parameters. Solutions include:

  1. Increasing the number of hidden layers or parameters per layer
  2. Enabling fixed-structure neural networks to accommodate more samples through clustering operations

Noise Effects

The concept of surface neighborhoods explains how noisy samples may deviate significantly from the original sample neighborhood, causing neural networks to handle them incorrectly.

Shallow/Deep Networks

The number of samples that a neural network can fit exactly is primarily positively correlated with the total number of network parameters, with no necessary relationship to network depth. An "oblique trapezoid" network structure is recommended.
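The comparison between shallow and deep networks rests on the total parameter count, which is easy to compute for a fully connected network. The sketch below counts weights only (biases are left out as an assumption) and uses two hypothetical layer layouts chosen to have the same total.

```python
def count_weights(layer_widths):
    """Number of weights in a bias-free fully connected network.

    layer_widths lists the width of every layer from input to output.
    """
    return sum(a * b for a, b in zip(layer_widths, layer_widths[1:]))

shallow = [4, 12, 1]          # one wide hidden layer
deep = [4, 5, 4, 4, 1]        # three narrow hidden layers

shallow_total = count_weights(shallow)   # 4*12 + 12*1
deep_total = count_weights(deep)         # 4*5 + 5*4 + 4*4 + 4*1
```

Both layouts have 60 weights. According to the claim above, they would be able to fit a comparable number of samples exactly despite their different depths; the code only verifies the parameter counts, not that claim.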

Discussion and Limitations

Unresolved Issues

  1. Polarization Algorithm: Apart from enumeration, no efficient algorithm for finding particular solutions from general solutions has been proposed
  2. Output Layer Analysis: Complete partial differential analysis of the softmax function is needed
  3. Activation Functions: How to analyze non-differentiable functions such as ReLU
  4. Saddle Point Problem: Points where first-order partial derivatives are zero may be saddle points rather than extrema

Alternative Function Exploration

Other functions with similar dynamic variability (such as sine functions, polynomials) may possess similarly strong generalization capabilities.

In-Depth Evaluation

Strengths

  1. Theoretical Innovation: Reveals the essence of neural network generalization capability from a mathematical perspective, supplementing the universal approximation theorem
  2. Unified Problem Explanation: Explains multiple classical problems such as vanishing gradients and overfitting within a unified framework
  3. Algorithm Innovation: Proposes the EI algorithm, which differs significantly from the BP algorithm, providing a new perspective on neural network training
  4. Mathematical Rigor: Based on rigorous mathematical derivations, transforms neural network problems into homogeneous linear equation system solving

Weaknesses

  1. Practical Limitations: Lacks efficient polarization algorithms, limiting practical applications of the EI algorithm
  2. Insufficient Experimental Validation: The paper consists mainly of theoretical analysis and offers little experimental verification
  3. Limited Applicability: Analysis is mainly based on fully connected networks and sigmoid activation functions
  4. Computational Complexity: Although optimization schemes are proposed, computational complexity for large-scale applications requires verification

Impact

  1. Theoretical Contribution: Provides a new mathematical framework for neural network interpretability research
  2. Practical Guidance: Offers theoretical guidance for network architecture design and parameter initialization
  3. Research Direction: Opens a new research direction for studying neural networks from the perspective of extremum mapping

Applicable Scenarios

  1. Theoretical Research: Suitable for neural network interpretability and theoretical analysis research
  2. Parameter Initialization: Can serve as an initialization module for BP algorithms
  3. Network Design: Provides guidance for network architecture design with specific accuracy requirements

Conclusion

This paper reveals the working principles of neural networks from a mathematical perspective and proposes the EI algorithm framework based on extremum mapping. Although further refinement is needed for practical applications (particularly the polarization algorithm), it makes important contributions to theoretical understanding and interpretability research of neural networks. This work is expected to become an important bridge connecting the black box nature of neural networks with mathematical interpretability.

References

  • Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function
  • Hornik, K., et al. (1989). Multilayer feedforward networks are universal approximators
  • Tishby, N. & Zaslavsky, N. (2015). Deep learning and the information bottleneck principle