A mathematical theory for understanding when abstract representations emerge in neural networks

Wang, Johnston, Fusi

Recent experiments reveal that task-relevant variables are often encoded in approximately orthogonal subspaces of the neural activity space. These disentangled low-dimensional representations are observed in multiple brain areas and across different species, and are typically the result of a process of abstraction that supports simple forms of out-of-distribution generalization. The mechanisms by which such geometries emerge remain poorly understood, and the mechanisms that have been investigated are typically unsupervised (e.g., based on variational auto-encoders). Here, we show mathematically that abstract representations of latent variables are guaranteed to appear in the last hidden layer of feedforward nonlinear networks when they are trained on tasks that depend directly on these latent variables. These abstract representations reflect the structure of the desired outputs or the semantics of the input stimuli. To investigate the neural representations that emerge in these networks, we develop an analytical framework that maps the optimization over the network weights into a mean-field problem over the distribution of neural preactivations. Applying this framework to a finite-width ReLU network, we find that its hidden layer exhibits an abstract representation at all global minima of the task objective. We further extend these analyses to two broad families of activation functions and deep feedforward architectures, demonstrating that abstract representations naturally arise in all these scenarios. Together, these results provide an explanation for the widely observed abstract representations in both the brain and artificial neural networks, as well as a mathematically tractable toolkit for understanding the emergence of different kinds of representations in task-optimized, feature-learning network models.

A mathematical theory for understanding when abstract representations emerge in neural networks

Basic information

Paper ID: 2510.09816
Title: A mathematical theory for understanding when abstract representations emerge in neural networks
Authors: Bin Wang, W. Jeffrey Johnston, Stefano Fusi
Affiliation: Center for Theoretical Neuroscience, Columbia University
Categories: q-bio.NC math.OC physics.bio-ph physics.data-an stat.ML
Published: October 14, 2025 (preprint)
Paper link: https://arxiv.org/abs/2510.09816

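Abstract

This paper studies the mathematical mechanism by which abstract representations emerge in neural networks. Experiments show that task-relevant variables are often encoded in approximately orthogonal subspaces of the neural activity space, forming disentangled low-dimensional representations. This geometry supports simple forms of out-of-distribution generalization, but the mechanism by which it emerges remains poorly understood. The authors prove mathematically that abstract representations are guaranteed to appear in the last hidden layer of feedforward nonlinear networks trained on tasks that depend on the latent variables. To do so, they develop an analytical framework that maps the optimization over the network weights into a mean-field problem over the distribution of neural preactivations.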

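Research background and motivation

Core problems

Ubiquity of abstract representations: neuroscience experiments show that neural activity in multiple brain areas and species exhibits abstract representations, with task-relevant variables encoded in approximately orthogonal subspaces
Limited mechanistic understanding: although this geometry is widespread, the network mechanisms by which it emerges remain unclear
Limitations of existing approaches: the mechanisms studied so far are mostly unsupervised (e.g., variational auto-encoders), but identifiability problems make disentanglement difficult under purely unsupervised learning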

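Significance of the research

Theoretical significance: provides a mathematical explanation for the widely observed phenomenon of abstract representations
Practical value: understanding the mechanisms of representation learning helps in designing better neural network architectures
Interdisciplinary impact: connects representation-learning theory in neuroscience and machine learning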

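Core contributions

Theoretical guarantee: the first mathematical proof that feedforward nonlinear networks necessarily produce abstract representations in a multi-task supervised learning setting
Analytical framework: a general-purpose analysis tool that maps the optimization over the network weights into a mean-field problem over the distribution of neural preactivations
Robustness to the activation function: shows that the emergence of abstract representations is robust to the choice of activation function, holding for two broad families of activation functions
Architectural extensions: extends the analysis to deep networks and recurrent networks
Neuroscientific insight: provides a computational account of the abstract representations observed in biological neural networks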

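Detailed methodology

Task definition

Consider a training dataset $D = \{(x^i, y^i)\}_{i=1}^P$, where:

Inputs $x^i \in \mathbb{R}^{d_X}$ are essentially unstructured
Outputs $y^i \in \{\pm 1\}^{d_Y}$ contain $d_Y$ binary labels and reflect the latent-variable structure
The data fall into $2^{d_Y}$ distinct classes, each containing $n$ samples
The total number of samples is $P = n \cdot 2^{d_Y}$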
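
To make the setup concrete, the following is a minimal sketch of one way such a dataset could be generated. The function name `make_dataset`, the Gaussian class centroids, and the `noise` parameter are illustrative assumptions, not the construction used in the paper; the only properties taken from the text are the label set $\{\pm 1\}^{d_Y}$, the $2^{d_Y}$ classes, and $n$ samples per class.

```python
import itertools
import numpy as np

def make_dataset(d_Y=2, n=5, d_X=10, noise=0.1, seed=0):
    """Build P = n * 2**d_Y samples: X has shape (d_X, P), Y has shape (d_Y, P)."""
    rng = np.random.default_rng(seed)
    # All 2**d_Y label combinations in {-1, +1}^d_Y, one row per class.
    classes = np.array(list(itertools.product([-1, 1], repeat=d_Y)))
    # Assumption: each class gets an independent random Gaussian centroid,
    # so the inputs carry no built-in latent structure.
    centroids = rng.standard_normal((len(classes), d_X))
    # Repeat each class n times and jitter the inputs around the centroid.
    labels = np.repeat(classes, n, axis=0)       # (P, d_Y)
    inputs = np.repeat(centroids, n, axis=0)     # (P, d_X)
    inputs = inputs + noise * rng.standard_normal(inputs.shape)
    return inputs.T, labels.T

X, Y = make_dataset()
print(X.shape, Y.shape)  # (10, 20) (2, 20)
```

Samples are stored as columns so that the shapes line up with a first-layer weight matrix of shape $M \times d_X$.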

Network architecture

The simplest case studied is a two-layer network: $f_{W_1,W_2,b}(x) = W_2\phi(W_1x + b)$

where:

$W_1 \in \mathbb{R}^{M \times d_X}$: first-layer weight matrix
$W_2 \in \mathbb{R}^{d_Y \times M}$: second-layer weight matrix
$b \in \mathbb{R}^M$: bias parameters
$\phi$: element-wise nonlinear activation function
$M$: hidden-layer width
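
As an illustration of the shapes involved, here is a small NumPy sketch of the forward pass. The choice of ReLU for $\phi$, the random initialization, and the sizes are assumptions made only for this example.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, W1, W2, b, phi=relu):
    """Return (output, hidden) for f(x) = W2 @ phi(W1 @ x + b)."""
    hidden = phi(W1 @ x + b)
    return W2 @ hidden, hidden

rng = np.random.default_rng(0)
d_X, M, d_Y = 10, 64, 2                             # illustrative sizes
W1 = rng.standard_normal((M, d_X)) / np.sqrt(d_X)   # first-layer weights
W2 = rng.standard_normal((d_Y, M)) / np.sqrt(M)     # second-layer weights
b = np.zeros(M)                                     # biases
x = rng.standard_normal(d_X)

out, hidden = forward(x, W1, W2, b)
print(out.shape, hidden.shape)  # (2,) (64,)
```

The `hidden` vector is the representation whose geometry the analysis is concerned with.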

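Loss function

Mean squared error with L2 regularization is used: $E(W_1,W_2,b) = \|Y - W_2\phi(WX)\|_F^2 + \lambda_1\|W\|_F^2 + \lambda_2\|W_2\|_F^2$

Here $X$ and $Y$ stack the $P$ inputs and targets as columns. The symbol $W$ is not defined separately in this summary; it presumably denotes the first-layer weights $W_1$ with the bias $b$ absorbed into them.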
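
The objective can be sketched in a few lines of NumPy. This version keeps the bias explicit and leaves it unregularized, which is an assumption of the sketch: the formula above does not say how $b$ enters the penalty. The function name `loss` and the ReLU default are likewise illustrative.

```python
import numpy as np

def loss(W1, W2, b, X, Y, lam1, lam2, phi=lambda z: np.maximum(z, 0.0)):
    """Squared Frobenius error plus L2 penalties on both weight matrices.

    X: (d_X, P) inputs as columns; Y: (d_Y, P) targets as columns.
    Assumption: the bias is explicit and not penalized.
    """
    H = W1 @ X + b[:, None]            # preactivations, shape (M, P)
    residual = Y - W2 @ phi(H)         # prediction error, shape (d_Y, P)
    fit = np.sum(residual ** 2)
    penalty = lam1 * np.sum(W1 ** 2) + lam2 * np.sum(W2 ** 2)
    return fit + penalty

# With all parameters at zero the network outputs zero, so the loss is ||Y||_F^2.
rng = np.random.default_rng(0)
d_X, d_Y, M, P = 5, 2, 4, 8
X = rng.standard_normal((d_X, P))
Y = rng.choice([-1.0, 1.0], size=(d_Y, P))
zero_loss = loss(np.zeros((M, d_X)), np.zeros((d_Y, M)), np.zeros(M), X, Y, 0.1, 0.1)
print(zero_loss)  # 16.0
```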

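Measuring abstract representations

The **parallelism score (PS)** is used to quantify how abstract a representation is:

Class prototype representation: $r^{(y)} = \frac{1}{n}\sum_{i:y^i=y} r^i$
Direction of representational change: $\Delta r^{(k;\alpha)} = r^{(y_k=+1,y_{\setminus k}=\alpha)} - r^{(y_k=-1,y_{\setminus k}=\alpha)}$
Parallelism score: $PS = \frac{1}{d_Y}\sum_{k=1}^{d_Y} PS_k$

Here $r^i$ is the network's representation of sample $i$, $\alpha$ ranges over the settings of the remaining labels $y_{\setminus k}$, and $PS_k$ measures how consistent the coding direction of the $k$-th latent label is across those settings. $PS = 1$ corresponds to a perfectly abstract representation.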
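
The quantities above can be computed as in the following sketch. The text does not give the formula for $PS_k$, so this example assumes a common choice: the mean cosine similarity between the coding directions $\Delta r^{(k;\alpha)}$ over all pairs of contexts $\alpha$. The function name `parallelism_score` is hypothetical, and the sketch requires $d_Y \geq 2$ and at least one sample in every class.

```python
import itertools
import numpy as np

def parallelism_score(R, Y):
    """R: (M, P) representations as columns; Y: (d_Y, P) labels in {-1, +1}."""
    d_Y = Y.shape[0]
    # Class prototypes r^(y): mean representation of each label combination.
    proto = {}
    for y in itertools.product([-1, 1], repeat=d_Y):
        mask = np.all(Y == np.array(y)[:, None], axis=0)
        proto[y] = R[:, mask].mean(axis=1)
    scores = []
    for k in range(d_Y):
        # Coding directions Delta r^(k; alpha), one per context alpha.
        deltas = []
        for alpha in itertools.product([-1, 1], repeat=d_Y - 1):
            plus = alpha[:k] + (1,) + alpha[k:]
            minus = alpha[:k] + (-1,) + alpha[k:]
            deltas.append(proto[plus] - proto[minus])
        # Assumed PS_k: mean cosine similarity over pairs of contexts.
        cos = [u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
               for u, v in itertools.combinations(deltas, 2)]
        scores.append(np.mean(cos))
    return float(np.mean(scores))

# A linear embedding of the labels has identical coding directions in every context.
rng = np.random.default_rng(0)
Y = np.array(list(itertools.product([-1, 1], repeat=3))).T   # (3, 8), one sample per class
A = rng.standard_normal((50, 3))
ps_linear = parallelism_score(A @ Y, Y)
ps_random = parallelism_score(rng.standard_normal((50, 8)), Y)
print(ps_linear, ps_random)  # ps_linear is 1 up to rounding; ps_random is much lower
```

In the linear case every $\Delta r^{(k;\alpha)}$ equals twice the $k$-th column of `A`, which is why the score reaches its maximum; unrelated random prototypes give directions that are nearly orthogonal in high dimensions.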

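Core of the analytical framework

Mean-field transformation

The key innovation is to transform the original optimization problem: $\min_{W_1,W_2,b} E(W_1,W_2,b)$

into an optimization over the distribution of neural preactivations: $\min_{\rho_M} \mathcal{E}[\rho_M]$

where $\rho_M = \sum_{k=1}^M \delta_{h_k}$ is the empirical measure of the preactivation patterns, and $h_k \in \mathbb{R}^P$ collects the preactivations of hidden unit $k$ across the $P$ training samples.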
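
A short numerical sketch may help in reading the empirical measure: integrating a function against $\rho_M$ is simply summing it over the hidden units, and the result does not depend on how the units are ordered. The ReLU activation, the sizes, and the helper name `integrate` are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(0)
d_X, P, M = 6, 8, 32
X = rng.standard_normal((d_X, P))
W1 = rng.standard_normal((M, d_X))
b = rng.standard_normal(M)
relu = lambda z: np.maximum(z, 0.0)

# Row k of H is h_k: the preactivation of hidden unit k on all P samples.
H = W1 @ X + b[:, None]                      # shape (M, P)

def integrate(g, H):
    """Integral of g against rho_M = sum_k delta_{h_k}, i.e. a sum over units."""
    return sum(g(h) for h in H)

outer_phi = lambda h: np.outer(relu(h), relu(h))
second_moment = integrate(outer_phi, H)      # shape (P, P)

# The same quantity written with the activation matrix Phi = phi(H).
Phi = relu(H)
matches_gram = np.allclose(second_moment, Phi.T @ Phi)

# Relabeling the hidden units leaves rho_M, and hence the integral, unchanged.
perm = rng.permutation(M)
permutation_invariant = np.allclose(second_moment, integrate(outer_phi, H[perm]))
print(matches_gram, permutation_invariant)   # True True
```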

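Effective energy function

The energy function of the effective system is: $\mathcal{E}[\rho_M] = \lambda_1\int h^T K_X^\dagger h \, d\rho_M(h) + \lambda_2\,\text{tr}\left[\left(\lambda_2 I + \int\phi(h)\phi(h)^T d\rho_M(h)\right)^{-1} K_Y\right]$

where $K_X^\dagger$ denotes the pseudoinverse of $K_X$.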
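
The link between the weight-space loss and this energy can be checked numerically. The sketch below rests on several assumptions that are not stated in the text: $K_X = X^T X$ and $K_Y = Y^T Y$ are taken to be the $P \times P$ Gram matrices of inputs and targets, the bias is omitted, $d_X \leq P$ so that $X$ has full row rank, and $W_2$ is eliminated by its closed-form ridge-regression optimum for fixed first-layer weights.

```python
import numpy as np

rng = np.random.default_rng(0)
d_X, d_Y, P, M = 6, 2, 8, 16
lam1, lam2 = 0.1, 0.5
relu = lambda z: np.maximum(z, 0.0)

X = rng.standard_normal((d_X, P))
Y = rng.choice([-1.0, 1.0], size=(d_Y, P))
W1 = rng.standard_normal((M, d_X))
H = W1 @ X                                   # rows are the patterns h_k
Phi = relu(H)

# Weight-space loss evaluated at the ridge-optimal readout W2 for this W1.
W2 = Y @ Phi.T @ np.linalg.inv(Phi @ Phi.T + lam2 * np.eye(M))
E_weights = (np.sum((Y - W2 @ Phi) ** 2)
             + lam1 * np.sum(W1 ** 2) + lam2 * np.sum(W2 ** 2))

# Effective energy, assuming K_X = X^T X and K_Y = Y^T Y.
K_X, K_Y = X.T @ X, Y.T @ Y
K_X_pinv = np.linalg.pinv(K_X, rcond=1e-10)
weight_term = lam1 * np.sum(H * (H @ K_X_pinv))        # sum_k h_k^T K_X^+ h_k
readout_term = lam2 * np.trace(
    np.linalg.inv(lam2 * np.eye(P) + Phi.T @ Phi) @ K_Y)
E_meanfield = weight_term + readout_term

print(np.isclose(E_weights, E_meanfield))  # True
```

Under these assumptions the first term reproduces $\lambda_1\|W_1\|_F^2$ because each $h_k = X^T w_k$ determines $w_k$ uniquely, and the second term is the value of the ridge problem in $W_2$, so the two energies agree for any fixed $W_1$.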