2025-11-18T10:52:13.210456

A mathematical theory for understanding when abstract representations emerge in neural networks

Wang, Johnston, Fusi

Recent experiments reveal that task-relevant variables are often encoded in approximately orthogonal subspaces of the neural activity space. These disentangled low-dimensional representations are observed in multiple brain areas and across different species, and are typically the result of a process of abstraction that supports simple forms of out-of-distribution generalization. The mechanisms by which such geometries emerge remain poorly understood, and the mechanisms that have been investigated are typically unsupervised (e.g., based on variational auto-encoders). Here, we show mathematically that abstract representations of latent variables are guaranteed to appear in the last hidden layer of feedforward nonlinear networks when they are trained on tasks that depend directly on these latent variables. These abstract representations reflect the structure of the desired outputs or the semantics of the input stimuli. To investigate the neural representations that emerge in these networks, we develop an analytical framework that maps the optimization over the network weights into a mean-field problem over the distribution of neural preactivations. Applying this framework to a finite-width ReLU network, we find that its hidden layer exhibits an abstract representation at all global minima of the task objective. We further extend these analyses to two broad families of activation functions and deep feedforward architectures, demonstrating that abstract representations naturally arise in all these scenarios. Together, these results provide an explanation for the widely observed abstract representations in both the brain and artificial neural networks, as well as a mathematically tractable toolkit for understanding the emergence of different kinds of representations in task-optimized, feature-learning network models.

academic

A Mathematical Theory for Understanding the Emergence of Abstract Representations in Neural Networks

Basic Information

Paper ID: 2510.09816
Title: A mathematical theory for understanding when abstract representations emerge in neural networks
Authors: Bin Wang, W. Jeffrey Johnston, Stefano Fusi
Affiliation: Center for Theoretical Neuroscience, Columbia University
Categories: q-bio.NC math.OC physics.bio-ph physics.data-an stat.ML
Publication date: October 14, 2025 (preprint)
Paper link: https://arxiv.org/abs/2510.09816

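Summary

This paper studies the mathematical mechanism behind the emergence of abstract representations in neural networks. Experimental findings indicate that task-relevant variables are often encoded in approximately orthogonal subspaces of neural activity space, forming disentangled low-dimensional representations. This geometry supports simple forms of out-of-distribution generalization, but the mechanism by which it emerges remains poorly understood. The authors prove mathematically that, in feedforward nonlinear networks trained on tasks that depend on latent variables, abstract representations are guaranteed to appear in the last hidden layer. To this end, they develop an analytical framework that maps the optimization over network weights to a mean-field problem over the distribution of neural preactivations.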

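Research Background and Motivation

Core Problems

Ubiquity of abstract representations: Neuroscience experiments indicate that neural activity in multiple brain areas and across species exhibits abstract representations, with task-relevant variables encoded in approximately orthogonal subspaces
Gap in mechanistic understanding: Although this geometry is widespread, the network mechanisms by which it emerges remain poorly understood
Limitations of existing approaches: Most mechanisms studied so far are unsupervised (e.g., variational auto-encoders), but identifiability problems make it difficult to learn disentangled representations through purely unsupervised learning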

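Significance of the Research

Theoretical significance: Provides a mathematical explanation for the widely observed phenomenon of abstract representations
Practical value: Understanding the mechanisms of representation learning can help in designing better neural network architectures
Interdisciplinary impact: Connects theories of representation learning in neuroscience and machine learning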

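Core Contributions

Theoretical guarantee: Proves mathematically, for the first time, that feedforward nonlinear networks necessarily produce abstract representations in a multi-task supervised learning setting
Analytical framework: Develops a general-purpose analytical tool that maps the optimization over network weights to a mean-field problem over the distribution of neural preactivations
Robustness to the activation function: Proves that the emergence of abstract representations holds across two broad families of activation functions, not only for ReLU
Architectural extensions: Extends the analysis to deep networks and recurrent networks
Neuroscientific insight: Provides a computational explanation for the abstract representations observed in biological neural networks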

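Method Details

Task Definition

Consider a training dataset $D = \{(x^i, y^i)\}_{i=1}^P$, where:

The inputs $x^i \in \mathbb{R}^{d_X}$ are essentially unstructured
The outputs $y^i \in \{\pm 1\}^{d_Y}$ contain $d_Y$ binary labels and reflect the latent-variable structure
The data form $2^{d_Y}$ distinct classes, each containing $n$ samples
The total number of samples is $P = n \cdot 2^{d_Y}$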
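To make the task definition concrete, the following is a minimal sketch of a toy dataset with this structure. The function name `make_dataset`, the Gaussian class centres, and the `noise` parameter are illustrative assumptions, not the construction used in the paper; the only properties taken from the text are the label set $\{\pm 1\}^{d_Y}$, the $2^{d_Y}$ classes, and $P = n \cdot 2^{d_Y}$.

```python
import itertools
import numpy as np

def make_dataset(d_X=20, d_Y=3, n=5, noise=0.1, seed=0):
    """Toy dataset with 2**d_Y classes, n samples each, labels in {-1,+1}^d_Y.

    Assumption: each class gets an independent random Gaussian centre in R^d_X,
    so the inputs carry no geometric trace of the latent labels ("unstructured").
    """
    rng = np.random.default_rng(seed)
    # All 2**d_Y label combinations, one row per class.
    labels = np.array(list(itertools.product([-1, 1], repeat=d_Y)))
    centres = rng.standard_normal((len(labels), d_X))
    P = len(labels) * n
    X = np.repeat(centres, n, axis=0) + noise * rng.standard_normal((P, d_X))
    Y = np.repeat(labels, n, axis=0)
    # Return with samples as columns: X is (d_X, P), Y is (d_Y, P).
    return X.T, Y.T

X, Y = make_dataset()
print(X.shape, Y.shape)  # (20, 40) (3, 40)
```

With the default arguments this gives $P = 5 \cdot 2^3 = 40$ samples, stored column-wise so that expressions such as $W_1 X$ apply directly.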

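Network Architecture

The simplest case studied is a two-layer network: $f_{W_1,W_2,b}(x) = W_2\phi(W_1x + b)$

where:

$W_1 \in \mathbb{R}^{M \times d_X}$ : first-layer weight matrix
$W_2 \in \mathbb{R}^{d_Y \times M}$ : second-layer weight matrix
$b \in \mathbb{R}^M$ : bias parameters
$\phi$ : element-wise nonlinear activation function
$M$ : width of the hidden layer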
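The architecture above can be sketched in a few lines of NumPy. The function names, the ReLU default, and the random initial weights are illustrative assumptions; only the functional form $W_2\phi(W_1x + b)$ and the parameter shapes come from the text.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(X, W1, W2, b, phi=relu):
    """Apply f(x) = W2 phi(W1 x + b) to every column of X (shape (d_X, P))."""
    H = W1 @ X + b[:, None]   # preactivations, shape (M, P)
    R = phi(H)                # hidden-layer representation, shape (M, P)
    return W2 @ R, R          # outputs (d_Y, P) and hidden representation

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
d_X, d_Y, M, P = 20, 3, 64, 40
W1 = rng.standard_normal((M, d_X)) / np.sqrt(d_X)
W2 = rng.standard_normal((d_Y, M)) / np.sqrt(M)
b = np.zeros(M)
out, R = forward(rng.standard_normal((d_X, P)), W1, W2, b)
print(out.shape, R.shape)  # (3, 40) (64, 40)
```

Returning the hidden representation alongside the output is a convenience here, since the geometry of that layer is the object of study.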

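Loss Function

The loss is a squared error with L2 regularization: $E(W_1,W_2,b) = \|Y - W_2\phi(WX)\|_F^2 + \lambda_1\|W\|_F^2 + \lambda_2\|W_2\|_F^2$

Here $X$ and $Y$ collect the inputs and labels of all $P$ samples as columns. The symbol $W$ on the right-hand side is not defined separately; it presumably denotes the first-layer parameters $W_1$ with the bias $b$ absorbed, for example by appending a constant row to $X$.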

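Measuring Abstract Representations

The degree of abstraction of a representation is quantified with the parallelism score (PS):

Class prototype representation: $r^{(y)} = \frac{1}{n}\sum_{i:y^i=y} r^i$, where $r^i$ is the hidden-layer representation of sample $i$
Direction of representational change: $\Delta r^{(k;\alpha)} = r^{(y_k=+1,y_{\setminus k}=\alpha)} - r^{(y_k=-1,y_{\setminus k}=\alpha)}$, where $\alpha$ fixes the values of the remaining $d_Y - 1$ labels
Parallelism score: $PS = \frac{1}{d_Y}\sum_{k=1}^{d_Y} PS_k$

Here $PS_k$ measures how consistent the coding direction of the $k$-th latent label is across the contexts $\alpha$. PS = 1 corresponds to a perfectly abstract representation.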
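The quantities above can be computed directly from a matrix of hidden representations. The sketch below assumes that $PS_k$ is the mean pairwise cosine similarity between the directions $\Delta r^{(k;\alpha)}$ over all contexts $\alpha$; this is a common way to define a parallelism score, but the exact formula is not given in the text above, so treat it as an assumption. The function name `parallelism_score` is hypothetical, and the code requires $d_Y \geq 2$ and nonzero coding directions.

```python
import itertools
import numpy as np

def parallelism_score(R, Y):
    """R: (M, P) hidden representations; Y: (d_Y, P) labels in {-1,+1}."""
    d_Y = Y.shape[0]
    # Class prototypes r^(y): mean representation over samples with label y.
    protos = {}
    for y in itertools.product([-1, 1], repeat=d_Y):
        mask = np.all(Y == np.array(y)[:, None], axis=0)
        protos[y] = R[:, mask].mean(axis=1)
    scores = []
    for k in range(d_Y):
        deltas = []
        for alpha in itertools.product([-1, 1], repeat=d_Y - 1):
            plus = alpha[:k] + (1,) + alpha[k:]    # y_k = +1, other labels = alpha
            minus = alpha[:k] + (-1,) + alpha[k:]  # y_k = -1, other labels = alpha
            deltas.append(protos[plus] - protos[minus])
        D = np.array(deltas)
        D = D / np.linalg.norm(D, axis=1, keepdims=True)
        cos = D @ D.T
        # Assumed PS_k: mean cosine similarity over distinct pairs of contexts.
        scores.append(cos[np.triu_indices(len(D), k=1)].mean())
    return float(np.mean(scores))

rng = np.random.default_rng(0)
Y = np.repeat(np.array(list(itertools.product([-1, 1], repeat=3))), 4, axis=0).T
A = rng.standard_normal((50, 3))
ps_abstract = parallelism_score(A @ Y, Y)  # representation linear in the labels
ps_random = parallelism_score(rng.standard_normal((50, 32)), Y)
print(ps_abstract, ps_random)
```

A representation that is linear in the labels has $\Delta r^{(k;\alpha)} = 2A_{:,k}$ for every $\alpha$, so its score is 1 up to floating-point error, whereas an unstructured random representation scores much lower.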

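Core of the Analytical Framework

Mean-Field Transformation

The key innovation is to transform the original optimization problem: $\min_{W_1,W_2,b} E(W_1,W_2,b)$

into an optimization over the distribution of neural preactivations: $\min_{\rho_M} \mathcal{E}[\rho_M]$

Here $\rho_M = \sum_{k=1}^M \delta_{h_k}$ is the empirical measure of preactivation patterns, where $h_k \in \mathbb{R}^P$ collects the preactivations of hidden unit $k$ across all $P$ training samples.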

Effective Energy Function

The energy function of the effective system is: $\mathcal{E}[\rho_M] = \lambda_1\int h^T K_X^\dagger h \, d\rho_M(h) + \text{tr}\left(\lambda_2\left(\lambda_2 I + \int\phi(h)\phi(h)^T d\rho_M(h)\right)^{-1} K_Y\right)$
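As a sanity check on the effective energy, the sketch below compares it numerically with the original weight-space loss once $W_2$ is set to its ridge-regression optimum. It rests on assumptions not stated in the text: $K_X = X^T X$ and $K_Y = Y^T Y$ are taken to be the $P \times P$ Gram matrices of inputs and labels, $K_X^\dagger$ is the Moore-Penrose pseudo-inverse, the bias is omitted, and $X$ has full row rank so that every weight vector lies in the span of the data. All sizes and values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d_X, d_Y, M, P = 5, 2, 16, 8
lam1, lam2 = 0.1, 0.5
phi = lambda z: np.maximum(z, 0.0)  # ReLU

X = rng.standard_normal((d_X, P))           # full row rank almost surely
Y = rng.choice([-1.0, 1.0], size=(d_Y, P))
W = rng.standard_normal((M, d_X))           # first-layer weights, bias omitted

H = W @ X                                   # row k is the pattern h_k in R^P
Phi = phi(H)
K_X, K_Y = X.T @ X, Y.T @ Y                 # assumed Gram matrices, (P, P)
K_phi = Phi.T @ Phi                         # equals sum_k phi(h_k) phi(h_k)^T

# Weight-space loss with W2 at its closed-form ridge-regression optimum.
W2 = Y @ Phi.T @ np.linalg.inv(Phi @ Phi.T + lam2 * np.eye(M))
E_weights = (np.sum((Y - W2 @ Phi) ** 2)
             + lam1 * np.sum(W ** 2) + lam2 * np.sum(W2 ** 2))

# Effective energy, written in terms of the preactivation patterns only.
term1 = lam1 * np.trace(H @ np.linalg.pinv(K_X, rcond=1e-10) @ H.T)
term2 = np.trace(lam2 * np.linalg.inv(lam2 * np.eye(P) + K_phi) @ K_Y)
E_meanfield = term1 + term2

print(E_weights, E_meanfield)
```

Under these assumptions the two numbers agree to numerical precision. The first term recovers $\lambda_1\|W\|_F^2$ from the patterns $h_k$, and the trace term is the residual left after minimizing over $W_2$.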