A mathematical theory for understanding when abstract representations emerge in neural networks
Wang, Johnston, Fusi
Recent experiments reveal that task-relevant variables are often encoded in approximately orthogonal subspaces of the neural activity space. These disentangled low-dimensional representations are observed in multiple brain areas and across different species, and are typically the result of a process of abstraction that supports simple forms of out-of-distribution generalization. The mechanisms by which such geometries emerge remain poorly understood, and the mechanisms that have been investigated are typically unsupervised (e.g., based on variational auto-encoders). Here, we show mathematically that abstract representations of latent variables are guaranteed to appear in the last hidden layer of feedforward nonlinear networks when they are trained on tasks that depend directly on these latent variables. These abstract representations reflect the structure of the desired outputs or the semantics of the input stimuli. To investigate the neural representations that emerge in these networks, we develop an analytical framework that maps the optimization over the network weights into a mean-field problem over the distribution of neural preactivations. Applying this framework to a finite-width ReLU network, we find that its hidden layer exhibits an abstract representation at all global minima of the task objective. We further extend these analyses to two broad families of activation functions and deep feedforward architectures, demonstrating that abstract representations naturally arise in all these scenarios. Together, these results provide an explanation for the widely observed abstract representations in both the brain and artificial neural networks, as well as a mathematically tractable toolkit for understanding the emergence of different kinds of representations in task-optimized, feature-learning network models.
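This paper studies the mathematical mechanism by which abstract representations emerge in neural networks. Experiments show that task-relevant variables are typically encoded in approximately orthogonal subspaces of the neural activity space, forming disentangled low-dimensional representations. This geometry supports simple forms of out-of-distribution generalization, but the mechanism behind its emergence remains unclear. The authors prove mathematically that abstract representations necessarily appear in the last hidden layer of feedforward nonlinear networks trained on tasks that depend on the latent variables. To do so, they develop an analytical framework that maps the optimization over network weights into a mean-field problem over the distribution of neural preactivations.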
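To make the geometric claim concrete, the following is a minimal sketch of what it means for two latent variables to be encoded in approximately orthogonal subspaces, and why that geometry supports a simple form of out-of-distribution generalization. It is a toy construction, not the networks or the mean-field framework analyzed in the paper: the function names, the two binary latents, the noise level, and the difference-of-means readout are all assumptions chosen for illustration.

```python
import numpy as np


def make_abstract_representation(n_units=50, noise=0.1, n_per_condition=40, seed=0):
    """Toy activity in which two binary latents lie along orthogonal axes.

    Hypothetical construction for illustration only; all parameters are assumed.
    """
    rng = np.random.default_rng(seed)
    # Two orthonormal coding directions, obtained from a QR decomposition.
    q, _ = np.linalg.qr(rng.standard_normal((n_units, 2)))
    axis_a, axis_b = q[:, 0], q[:, 1]
    # Four conditions: every combination of the two latents in {-1, +1}.
    latents = np.array([(a, b) for a in (-1, 1) for b in (-1, 1)])
    latents = np.repeat(latents, n_per_condition, axis=0)
    activity = (
        latents[:, :1] * axis_a
        + latents[:, 1:] * axis_b
        + noise * rng.standard_normal((len(latents), n_units))
    )
    return activity, latents


def coding_axis(activity, labels):
    """Estimate a coding direction as the difference of class means."""
    return activity[labels == 1].mean(axis=0) - activity[labels == -1].mean(axis=0)


def cross_condition_accuracy(activity, latents):
    """Fit a linear readout of latent A where B = -1, then test where B = +1."""
    train = latents[:, 1] == -1
    test = ~train
    x_train, y_train = activity[train], latents[train, 0]
    mu_pos = x_train[y_train == 1].mean(axis=0)
    mu_neg = x_train[y_train == -1].mean(axis=0)
    w = mu_pos - mu_neg                   # readout direction
    bias = -w @ (mu_pos + mu_neg) / 2     # threshold at the class midpoint
    predictions = np.sign(activity[test] @ w + bias)
    return float((predictions == latents[test, 0]).mean())


if __name__ == "__main__":
    activity, latents = make_abstract_representation()
    a_axis = coding_axis(activity, latents[:, 0])
    b_axis = coding_axis(activity, latents[:, 1])
    cosine = a_axis @ b_axis / (np.linalg.norm(a_axis) * np.linalg.norm(b_axis))
    print("cosine between estimated coding axes:", cosine)
    print("held-out-condition accuracy:", cross_condition_accuracy(activity, latents))
```

Because the readout for one latent is fit only on trials where the other latent takes a single value, its accuracy on the held-out value probes exactly the kind of generalization that an orthogonal, disentangled geometry is expected to support. In a representation where the two variables were mixed nonlinearly, the same readout would not be expected to transfer.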