Unraveling the Black Box of Neural Networks: A Dynamic Extremum Mapper
Chen
We point out that neural networks are not black boxes, and their generalization stems from the ability to dynamically map a dataset to the extrema of the model function. We further prove that the number of extrema in a neural network is positively correlated with the number of its parameters. We then propose a new algorithm that is significantly different from back-propagation algorithm, which mainly obtains the values of parameters by solving a system of linear equations. Some difficult situations, such as gradient vanishing and overfitting, can be simply explained and dealt with in this framework.
academic
Unraveling the Black Box of Neural Networks: A Dynamic Extremum Mapper
This paper contends that neural networks are not black boxes; rather, their generalization capability stems from the ability to dynamically map datasets to extremum points of model functions. The author demonstrates that the number of extremum points in neural networks is positively correlated with the number of parameters and proposes a novel algorithm that differs significantly from backpropagation, primarily obtaining parameter values through solving linear equation systems. Within this framework, difficult cases such as vanishing gradients and overfitting can be explained and addressed simply.
Although neural network-based artificial intelligence models have achieved prediction accuracy surpassing traditional machine learning algorithms in domains such as image recognition and natural language processing, research into their underlying principles remains limited, and they are still widely regarded as black boxes.
Safety Requirements: In domains such as autonomous driving that demand high real-time performance and safety, understanding how neural networks operate is essential
Fault Diagnosis: When models malfunction, it is impossible to quickly identify the root cause and resolve it immediately
Theoretical Completeness: A mathematical explanation of neural network mechanisms is needed, rather than relying solely on engineering approaches
Interpreter Methods: Primarily explain neural networks by analyzing input-output connections, but significant work remains
Information Bottleneck Theory: While providing useful references, it lacks concrete methods for parameter solving
Universal Approximation Theorem: Although Cybenko and Hornik et al. proved that feedforward neural networks can approximate arbitrary continuous functions, they did not provide methods for finding specific functions
Ideal Machine Learning Model Characteristics: Proposes the main characteristics of ideal machine learning models and provides universal model training procedures based on these characteristics
Extremum Mapping Theory: Mathematically proves that neural networks achieve generalization by mapping datasets to local extrema of functions, proposing the Extremum Increment (EI) algorithm
Problem Explanation Framework: Based on the EI algorithm, can relatively easily identify the causes of common problems such as vanishing/exploding gradients and overfitting, and provide corresponding solutions
The author first defines the characteristics of ideal models: for a dataset D = {(x^(i), y^(i))|i ∈ 1, 3}, the goal is to find a function F such that y^(i) = F(x^(i)). When samples of the same type exist, the function curve must change shape to accommodate new samples, thereby forming multiple local extremum points.
When function parameters are limited, the degree of curve shape variation is constrained, and the number of extrema cannot increase arbitrarily. The solution is to extend the essence from a single point to an interval, concentrating samples with slightly different surfaces but identical essence within that interval.
Converting N-classification function F into N binary classification functions {F_j|j ∈ 1,N}, where the j-th binary classification function F_j only determines whether the input sample belongs to the j-th class essence:
The author decomposes the neural network into a set of ln composite functions {h_v^n|v ∈ 1,ln}, where each composite function is essentially a binary classification problem.
The main steps of the EI algorithm differ significantly from the BP algorithm:
The BP algorithm uses gradient updates to approximate ideal parameter values; the EI algorithm directly obtains parameter values by solving equation systems
The BP algorithm requires updating all parameters each iteration; the EI algorithm only updates partial parameters
Reduce computational complexity by relaxing termination conditions and introducing the concept of surface neighborhoods:
Use weakened termination conditions, only requiring that the classification function value of samples be significantly larger than other classification function values
Utilize surface neighborhoods, applying strict conditions only to representative samples
Vanishing Gradients: Within the EI algorithm framework, if a particular solution can be found from the general solution W^u:n, parameters in earlier hidden layers can maintain their initial values, making vanishing gradients an inevitable consequence
Exploding Gradients: Corresponds to cases where the equation system has no solution; the solution is to increase the number of hidden layers or parameters per layer
The concept of surface neighborhoods explains how noisy samples may deviate significantly from the original sample neighborhood, causing neural networks to handle them incorrectly.
The number of samples that a neural network can fit exactly is primarily positively correlated with the total number of network parameters, with no necessary relationship to network depth. An "oblique trapezoid" network structure is recommended.
Theoretical Innovation: Reveals the essence of neural network generalization capability from a mathematical perspective, supplementing the universal approximation theorem
Unified Problem Explanation: Explains multiple classical problems such as vanishing gradients and overfitting within a unified framework
Algorithm Innovation: Proposes the EI algorithm, which differs significantly from BP algorithm, providing new perspectives for neural network training
Mathematical Rigor: Based on rigorous mathematical derivations, transforms neural network problems into homogeneous linear equation system solving
This paper reveals the working principles of neural networks from a mathematical perspective and proposes the EI algorithm framework based on extremum mapping. Although further refinement is needed for practical applications (particularly the polarization algorithm), it makes important contributions to theoretical understanding and interpretability research of neural networks. This work is expected to become an important bridge connecting the black box nature of neural networks with mathematical interpretability.