2025-11-13T11:46:11.189224

Representation Theorem for Matrix Product States

Guo, Draper
In this work, we investigate the universal representation capacity of the Matrix Product States (MPS) from the perspective of boolean functions and continuous functions. We show that MPS can accurately realize arbitrary boolean functions by providing a construction method of the corresponding MPS structure for an arbitrarily given boolean gate. Moreover, we prove that the function space of MPS with the scale-invariant sigmoidal activation is dense in the space of continuous functions defined on a compact subspace of the $n$-dimensional real coordinate space $\mathbb{R^{n}}$. We study the relation between MPS and neural networks and show that the MPS with a scale-invariant sigmoidal function is equivalent to a one-hidden-layer neural network equipped with a kernel function. We construct the equivalent neural networks for several specific MPS models and show that non-linear kernels such as the polynomial kernel which introduces the couplings between different components of the input into the model appear naturally in the equivalent neural networks. At last, we discuss the realization of the Gaussian Process (GP) with infinitely wide MPS by studying their equivalent neural networks.
academic

Representation Theorem for Matrix Product States

Basic Information

  • Paper ID: 2103.08277
  • Title: Representation Theorem for Matrix Product States
  • Authors: Erdong Guo, David Draper (University of California, Santa Cruz)
  • Classification: stat.ML cs.LG cs.NE quant-ph
  • Publication Date: March 15, 2021 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2103.08277

Abstract

This paper investigates the universal representation capacity of Matrix Product States (MPS) from the perspectives of Boolean functions and continuous functions. The authors prove that MPS can exactly implement arbitrary Boolean functions and provide constructive methods for building corresponding MPS structures for any given Boolean gate. Furthermore, they demonstrate that the function space of MPS with scale-invariant sigmoid activation functions is dense in the space of continuous functions defined on compact subsets of n-dimensional real coordinate space. The relationship between MPS and neural networks is studied, showing that MPS with scale-invariant sigmoid functions is equivalent to single-hidden-layer neural networks equipped with kernel functions. Finally, the realization of Gaussian processes (GP) by infinite-width MPS is discussed through the study of equivalent neural networks.

Research Background and Motivation

Problem Background

  1. Rise of Tensor Networks: Tensor networks serve as a powerful graphical language for representing many-body quantum systems and have found widespread applications across quantum information, condensed matter physics, applied mathematics, and computer science.
  2. Representation Capacity of MPS: While MPS possesses important physical significance in quantum physics, a natural question arises when treating it as an algebraic tool in machine learning: how powerful is MPS as an algebraic machine in terms of representation capacity?
  3. Need for Universal Approximation Theory: Similar to the universal approximation theorem for neural networks, theoretical proof of MPS's representational boundaries is needed.

Research Motivation

  1. Filling Theoretical Gaps: Existing research primarily focuses on the physical properties of MPS, lacking theoretical analysis of its role as a function approximator.
  2. Establishing Connections Between MPS and Neural Networks: Exploring the equivalence relationships between MPS and classical machine learning models, particularly neural networks.
  3. Practical Considerations: In practical applications, complete bases are typically infinite-dimensional, necessitating investigation of how large a function space MPS can "span" under mild assumptions.

Core Contributions

  1. Exact Representation of Boolean Functions: Proves that MPS can exactly implement arbitrary Boolean functions and provides constructive proof methods.
  2. Universal Approximation of Continuous Functions: Demonstrates that the function space of MPS with scale-invariant sigmoid activation is dense in the continuous function space (with respect to the supremum norm).
  3. Equivalence Between MPS and Neural Networks: Establishes the equivalence relationship between MPS and single-hidden-layer neural networks, revealing the natural emergence of kernel functions in MPS.
  4. Realization of Gaussian Processes: Discusses the implementation of Gaussian processes through infinite-width MPS.

Detailed Methods

MPS Model Definition

Standard MPS Structure

The original MPS model is defined as: Ψl(xw,B)={α,s}Aα1α2s1Alαiαi+1siAαnα1snΦs1sn(x)\Psi_l(x|w,B) = \sum_{\{\alpha,s\}} A^{s_1}_{\alpha_1\alpha_2} \cdots A^{s_i}_{l\alpha_i\alpha_{i+1}} \cdots A^{s_n}_{\alpha_n\alpha_1} \Phi^{s_1\cdots s_n}(x)

where the kernel function is defined as: Φs1sn(x)=ϕs1(x1)ϕsi(xi)ϕsn(xn)\Phi^{s_1\cdots s_n}(x) = \phi^{s_1}(x_1) \otimes \cdots \otimes \phi^{s_i}(x_i) \cdots \otimes \phi^{s_n}(x_n)

Modified MPS Structure

To achieve universal approximation, the authors propose a modified MPS structure: Ψ(xw,B)=lσ({α,s}Aα1α2s1Alαiαi+1siAαnα1snΦs1sn(x))\Psi(x|w,B) = \sum_l \sigma\left(\sum_{\{\alpha,s\}} A^{s_1}_{\alpha_1\alpha_2} \cdots A^{s_i}_{l\alpha_i\alpha_{i+1}} \cdots A^{s_n}_{\alpha_n\alpha_1} \Phi^{s_1\cdots s_n}(x)\right)

where σ()\sigma(\cdot) is a scale-invariant sigmoid function: σ(x){0xCx+\sigma(x) \to \begin{cases} 0 & x \to -\infty \\ C & x \to +\infty \end{cases}

Boolean Function Representation Methods

Implementation of Basic Boolean Gates

AND Gate Implementation (Theorem 2.1):

  • Kernel function: ϕi(Xi)=[Xi,1Xi]\phi_i(X_i) = [X_i, 1-X_i]
  • Tensor node: Asi=[1,0]A^{s_i} = [1, 0], bond dimension α=1|\alpha| = 1

OR Gate Implementation (Theorem 2.2):

  • Kernel function: ϕi(Xi)=[Xi,1Xi]\phi_i(X_i) = [X_i, 1-X_i]
  • Tensor node bond dimension: α=3|\alpha| = 3
  • Specific tensor structure: Aα1α2s1=[[1,0,1],[0,1,0]]A^{s_1}_{\alpha_1\alpha_2} = [[1, 0, 1], [0, 1, 0]]Aα2α1s2=[[0,1,1],[1,0,0]]A^{s_2}_{\alpha_2\alpha_1} = [[0, 1, 1], [1, 0, 0]]

NOT Gate Implementation (Theorem 2.3):

  • Kernel function: ϕ1(X1)=1X1\phi_1(X_1) = 1-X_1
  • Tensor node: As1=1A^{s_1} = 1

Universal AND Gate and Arbitrary Boolean Functions

Universal AND Gate (Theorem 2.4): For nn input variables, the following can be implemented: Ψ(X1,,Xn)=(i=1lXi)(j=l+1nXj)\Psi(X_1, \cdots, X_n) = (\bigwedge_{i=1}^l X_i) \bigwedge (\bigwedge_{j=l+1}^n \overline{X_j})

Arbitrary Boolean Functions (Theorem 2.5): By expressing arbitrary Boolean functions as disjunctive normal forms corresponding to truth tables, the corresponding MPS can be constructed. Construction rules:

  1. Write the Boolean function in disjunctive normal form from its truth table
  2. Set the bond dimension equal to the number of disjunctive terms mm
  3. Fill tensor elements according to specific rules

Continuous Function Approximation Theory

Main Theorem (Theorem 3.1)

The MPS function space is dense in C0(In)C_0(I^n) (the space of continuous functions on the unit hypercube), meaning for any f(x)C0(In)f(x) \in C_0(I^n) and any ε>0\varepsilon > 0, there exists an MPS function Ψ(x)\Psi(x) such that: supxΨ(x)f(x)<ε\sup_x |\Psi(x) - f(x)| < \varepsilon

Proof Techniques

Linearity Proof (Lemma 3.2): Proves that the MPS function family M\mathcal{M} is a linear subspace of C0(In)C_0(I^n):

  1. Closure under scalar multiplication: utilizing scale-invariance
  2. Closure under addition: constructing new MPS representing the sum of two MPS

Discriminating Property (Lemma 3.4): Proves that the scale-invariant sigmoid function possesses the discriminating property: if a finite signed measure μ\mu makes the integral of all MPS functions zero, then μ=0\mu = 0.

Main Theorem Proof: Uses proof by contradiction with Hahn-Banach theorem and Riesz representation theorem:

  1. Assume M\overline{\mathcal{M}} is a proper subset of C0(In)C_0(I^n)
  2. By Hahn-Banach theorem, there exists a non-trivial functional annihilating M\overline{\mathcal{M}}
  3. By Riesz representation theorem, this corresponds to a non-trivial measure
  4. By the discriminating property, this measure must be zero, yielding a contradiction

Equivalence Between MPS and Neural Networks

Equivalence Theorem (Theorem 3.5)

MPS with scale-invariant sigmoid activation is equivalent to single-hidden-layer neural networks equipped with kernel functions.

Conversion Method

By contracting internal indices {αi}\{\alpha_i\}, MPS can be written as: Ψ(x)=lσ(sWslΦs(x))\Psi(x) = \sum_l \sigma\left(\sum_s W^l_s \Phi^s(x)\right)

This is precisely the form of a single-hidden-layer neural network, where:

  • WslW^l_s are weight parameters
  • Φs(x)\Phi^s(x) are kernel functions, naturally introducing coupling between input components

Natural Emergence of Kernel Functions

Concrete examples demonstrate how nonlinear kernels such as polynomial kernels naturally emerge in equivalent neural networks, for instance: (Φs)T=[x1x2x3,x2x3,x1x3,x1x2,x1,x2,x3,1](\Phi^s)^T = [x_1x_2x_3, x_2x_3, x_1x_3, x_1x_2, x_1, x_2, x_3, 1]

Experimental Results and Case Studies

Boolean Function Implementation Cases

3-Input OR Gate: Boolean expression: f(X1,X2,X3)=X1X2X3f(X_1,X_2,X_3) = X_1 \vee X_2 \vee X_3 The corresponding MPS tensor structure is detailed in the methods section.

3-Input Parity Gate: Boolean expression: f(X1,X2,X3)=X1X2X3f(X_1,X_2,X_3) = X_1 \oplus X_2 \oplus X_3 Equivalent neural network weights: Ws=[1,0,0,1,0,1,1,0]W^s = [1, 0, 0, 1, 0, 1, 1, 0]

Threshold Gate Th₃²: A threshold function that outputs 1 when at least 2 inputs are 1.

Complexity Analysis

For nn-input Boolean gates, the bond dimension requires O(2n)O(2^n) in extreme cases, but can be reduced to O(2n1)O(2^{n-1}) through Karnaugh map simplification, with total parameters O(n2n1)O(n2^{n-1}), comparable to single-hidden-layer neural network efficiency.

Tensor Network Theory Foundations

  • Penrose (1971) graphical notation system for tensor calculus
  • Vidal (2003, 2007) Schmidt decomposition and DMRG methods
  • Perez-Garcia et al. (2006) systematic study of MPS properties

Tensor Networks in Machine Learning

  • Stoudenmire & Schwab (2016) supervised learning applications
  • Various tensor network applications in regression, classification, and density estimation

Universal Approximation Theory

  • Classical work by Cybenko (1989), Hornik (1991) on neural network universal approximation
  • This paper employs similar functional analysis techniques

Conclusions and Discussion

Main Conclusions

  1. Theoretical Completeness: MPS possesses the capacity to represent arbitrary Boolean functions and approximate arbitrary continuous functions
  2. Equivalence Revelation: MPS is essentially equivalent to single-hidden-layer neural networks with kernel functions
  3. Kernel Function Importance: The kernel function Φs1sn\Phi^{s_1\cdots s_n} is key to MPS's representational capacity, rather than the bond indices {αi}\{\alpha_i\}

Limitations

  1. Practical Concerns: MPS implementation of Boolean functions requires exponential bond dimensions
  2. Loss of Physical Meaning: As a pure algebraic tool, MPS loses important quantum physical properties such as entanglement
  3. Kernel Function Design: Careful design of kernel functions is needed to achieve sufficient representational capacity

Future Directions

  1. Efficient Construction Methods: Seeking more efficient MPS construction methods to reduce complexity
  2. Deep Structures: Exploring multi-layer MPS structures, analogous to deep neural networks
  3. Quantum Advantage: Exploring unique advantages of MPS in quantum computing environments

In-Depth Evaluation

Strengths

  1. Significant Theoretical Contribution: First systematic analysis of MPS representational capacity from a function approximation perspective
  2. Rigorous Proofs: Uses classical functional analysis tools with rigorous proof procedures
  3. Insightful Connections: Reveals deep connections between MPS and neural networks, providing bridges for cross-disciplinary understanding
  4. Constructive Proofs: Not only proves existence but also provides concrete construction methods

Weaknesses

  1. Limited Practical Applicability: Theoretical results may face the curse of dimensionality in practical applications
  2. Insufficient Experimental Validation: Lacks large-scale numerical experiments to verify theoretical results
  3. Missing Optimization Algorithms: Does not discuss how to efficiently train such MPS models
  4. Inadequate Comparative Analysis: Lacks detailed comparative analysis with other universal approximators

Impact

  1. High Theoretical Value: Provides solid theoretical foundation for tensor network applications in machine learning
  2. Cross-Disciplinary Significance: Connects quantum physics and machine learning domains
  3. Strong Inspirational Value: Provides important reference for subsequent research on tensor network representation capacity and optimization methods

Applicable Scenarios

  1. Theoretical Research: Suitable as foundational literature for tensor network representation theory
  2. Educational Purposes: Can be used to explain the relationship between MPS and neural networks
  3. Algorithm Design: Provides theoretical guidance for designing MPS-based machine learning algorithms
  4. Quantum Machine Learning: Provides theoretical support for quantum machine learning algorithm design

References

This paper cites important literature from multiple fields including tensor networks, quantum information, machine learning, and functional analysis, including:

  • Tensor network foundational theory (Penrose, 1971; Vidal, 2007; Perez-Garcia et al., 2006)
  • Neural network universal approximation theory (Cybenko, 1989; Hornik, 1991)
  • Tensor network applications in machine learning (Stoudenmire & Schwab, 2016; et al.)
  • Functional analysis theoretical foundations (Folland, 2013)