In this work, we investigate the universal representation capacity of the Matrix Product States (MPS) from the perspective of boolean functions and continuous functions. We show that MPS can accurately realize arbitrary boolean functions by providing a construction method of the corresponding MPS structure for an arbitrarily given boolean gate. Moreover, we prove that the function space of MPS with the scale-invariant sigmoidal activation is dense in the space of continuous functions defined on a compact subspace of the $n$-dimensional real coordinate space $\mathbb{R^{n}}$. We study the relation between MPS and neural networks and show that the MPS with a scale-invariant sigmoidal function is equivalent to a one-hidden-layer neural network equipped with a kernel function. We construct the equivalent neural networks for several specific MPS models and show that non-linear kernels such as the polynomial kernel which introduces the couplings between different components of the input into the model appear naturally in the equivalent neural networks. At last, we discuss the realization of the Gaussian Process (GP) with infinitely wide MPS by studying their equivalent neural networks.
- Paper ID: 2103.08277
- Title: Representation Theorem for Matrix Product States
- Authors: Erdong Guo, David Draper (University of California, Santa Cruz)
- Classification: stat.ML cs.LG cs.NE quant-ph
- Publication Date: March 15, 2021 (arXiv preprint)
- Paper Link: https://arxiv.org/abs/2103.08277
This paper investigates the universal representation capacity of Matrix Product States (MPS) from the perspectives of Boolean functions and continuous functions. The authors prove that MPS can exactly implement arbitrary Boolean functions and provide constructive methods for building corresponding MPS structures for any given Boolean gate. Furthermore, they demonstrate that the function space of MPS with scale-invariant sigmoid activation functions is dense in the space of continuous functions defined on compact subsets of n-dimensional real coordinate space. The relationship between MPS and neural networks is studied, showing that MPS with scale-invariant sigmoid functions is equivalent to single-hidden-layer neural networks equipped with kernel functions. Finally, the realization of Gaussian processes (GP) by infinite-width MPS is discussed through the study of equivalent neural networks.
- Rise of Tensor Networks: Tensor networks serve as a powerful graphical language for representing many-body quantum systems and have found widespread applications across quantum information, condensed matter physics, applied mathematics, and computer science.
- Representation Capacity of MPS: While MPS possesses important physical significance in quantum physics, a natural question arises when treating it as an algebraic tool in machine learning: how powerful is MPS as an algebraic machine in terms of representation capacity?
- Need for Universal Approximation Theory: Similar to the universal approximation theorem for neural networks, theoretical proof of MPS's representational boundaries is needed.
- Filling Theoretical Gaps: Existing research primarily focuses on the physical properties of MPS, lacking theoretical analysis of its role as a function approximator.
- Establishing Connections Between MPS and Neural Networks: Exploring the equivalence relationships between MPS and classical machine learning models, particularly neural networks.
- Practical Considerations: In practical applications, complete bases are typically infinite-dimensional, necessitating investigation of how large a function space MPS can "span" under mild assumptions.
- Exact Representation of Boolean Functions: Proves that MPS can exactly implement arbitrary Boolean functions and provides constructive proof methods.
- Universal Approximation of Continuous Functions: Demonstrates that the function space of MPS with scale-invariant sigmoid activation is dense in the continuous function space (with respect to the supremum norm).
- Equivalence Between MPS and Neural Networks: Establishes the equivalence relationship between MPS and single-hidden-layer neural networks, revealing the natural emergence of kernel functions in MPS.
- Realization of Gaussian Processes: Discusses the implementation of Gaussian processes through infinite-width MPS.
The original MPS model is defined as:
Ψl(x∣w,B)=∑{α,s}Aα1α2s1⋯Alαiαi+1si⋯Aαnα1snΦs1⋯sn(x)
where the kernel function is defined as:
Φs1⋯sn(x)=ϕs1(x1)⊗⋯⊗ϕsi(xi)⋯⊗ϕsn(xn)
To achieve universal approximation, the authors propose a modified MPS structure:
Ψ(x∣w,B)=∑lσ(∑{α,s}Aα1α2s1⋯Alαiαi+1si⋯Aαnα1snΦs1⋯sn(x))
where σ(⋅) is a scale-invariant sigmoid function:
σ(x)→{0Cx→−∞x→+∞
AND Gate Implementation (Theorem 2.1):
- Kernel function: ϕi(Xi)=[Xi,1−Xi]
- Tensor node: Asi=[1,0], bond dimension ∣α∣=1
OR Gate Implementation (Theorem 2.2):
- Kernel function: ϕi(Xi)=[Xi,1−Xi]
- Tensor node bond dimension: ∣α∣=3
- Specific tensor structure:
Aα1α2s1=[[1,0,1],[0,1,0]]Aα2α1s2=[[0,1,1],[1,0,0]]
NOT Gate Implementation (Theorem 2.3):
- Kernel function: ϕ1(X1)=1−X1
- Tensor node: As1=1
Universal AND Gate (Theorem 2.4):
For n input variables, the following can be implemented:
Ψ(X1,⋯,Xn)=(⋀i=1lXi)⋀(⋀j=l+1nXj)
Arbitrary Boolean Functions (Theorem 2.5):
By expressing arbitrary Boolean functions as disjunctive normal forms corresponding to truth tables, the corresponding MPS can be constructed. Construction rules:
- Write the Boolean function in disjunctive normal form from its truth table
- Set the bond dimension equal to the number of disjunctive terms m
- Fill tensor elements according to specific rules
The MPS function space is dense in C0(In) (the space of continuous functions on the unit hypercube), meaning for any f(x)∈C0(In) and any ε>0, there exists an MPS function Ψ(x) such that:
supx∣Ψ(x)−f(x)∣<ε
Linearity Proof (Lemma 3.2):
Proves that the MPS function family M is a linear subspace of C0(In):
- Closure under scalar multiplication: utilizing scale-invariance
- Closure under addition: constructing new MPS representing the sum of two MPS
Discriminating Property (Lemma 3.4):
Proves that the scale-invariant sigmoid function possesses the discriminating property: if a finite signed measure μ makes the integral of all MPS functions zero, then μ=0.
Main Theorem Proof:
Uses proof by contradiction with Hahn-Banach theorem and Riesz representation theorem:
- Assume M is a proper subset of C0(In)
- By Hahn-Banach theorem, there exists a non-trivial functional annihilating M
- By Riesz representation theorem, this corresponds to a non-trivial measure
- By the discriminating property, this measure must be zero, yielding a contradiction
MPS with scale-invariant sigmoid activation is equivalent to single-hidden-layer neural networks equipped with kernel functions.
By contracting internal indices {αi}, MPS can be written as:
Ψ(x)=∑lσ(∑sWslΦs(x))
This is precisely the form of a single-hidden-layer neural network, where:
- Wsl are weight parameters
- Φs(x) are kernel functions, naturally introducing coupling between input components
Concrete examples demonstrate how nonlinear kernels such as polynomial kernels naturally emerge in equivalent neural networks, for instance:
(Φs)T=[x1x2x3,x2x3,x1x3,x1x2,x1,x2,x3,1]
3-Input OR Gate:
Boolean expression: f(X1,X2,X3)=X1∨X2∨X3
The corresponding MPS tensor structure is detailed in the methods section.
3-Input Parity Gate:
Boolean expression: f(X1,X2,X3)=X1⊕X2⊕X3
Equivalent neural network weights: Ws=[1,0,0,1,0,1,1,0]
Threshold Gate Th₃²:
A threshold function that outputs 1 when at least 2 inputs are 1.
For n-input Boolean gates, the bond dimension requires O(2n) in extreme cases, but can be reduced to O(2n−1) through Karnaugh map simplification, with total parameters O(n2n−1), comparable to single-hidden-layer neural network efficiency.
- Penrose (1971) graphical notation system for tensor calculus
- Vidal (2003, 2007) Schmidt decomposition and DMRG methods
- Perez-Garcia et al. (2006) systematic study of MPS properties
- Stoudenmire & Schwab (2016) supervised learning applications
- Various tensor network applications in regression, classification, and density estimation
- Classical work by Cybenko (1989), Hornik (1991) on neural network universal approximation
- This paper employs similar functional analysis techniques
- Theoretical Completeness: MPS possesses the capacity to represent arbitrary Boolean functions and approximate arbitrary continuous functions
- Equivalence Revelation: MPS is essentially equivalent to single-hidden-layer neural networks with kernel functions
- Kernel Function Importance: The kernel function Φs1⋯sn is key to MPS's representational capacity, rather than the bond indices {αi}
- Practical Concerns: MPS implementation of Boolean functions requires exponential bond dimensions
- Loss of Physical Meaning: As a pure algebraic tool, MPS loses important quantum physical properties such as entanglement
- Kernel Function Design: Careful design of kernel functions is needed to achieve sufficient representational capacity
- Efficient Construction Methods: Seeking more efficient MPS construction methods to reduce complexity
- Deep Structures: Exploring multi-layer MPS structures, analogous to deep neural networks
- Quantum Advantage: Exploring unique advantages of MPS in quantum computing environments
- Significant Theoretical Contribution: First systematic analysis of MPS representational capacity from a function approximation perspective
- Rigorous Proofs: Uses classical functional analysis tools with rigorous proof procedures
- Insightful Connections: Reveals deep connections between MPS and neural networks, providing bridges for cross-disciplinary understanding
- Constructive Proofs: Not only proves existence but also provides concrete construction methods
- Limited Practical Applicability: Theoretical results may face the curse of dimensionality in practical applications
- Insufficient Experimental Validation: Lacks large-scale numerical experiments to verify theoretical results
- Missing Optimization Algorithms: Does not discuss how to efficiently train such MPS models
- Inadequate Comparative Analysis: Lacks detailed comparative analysis with other universal approximators
- High Theoretical Value: Provides solid theoretical foundation for tensor network applications in machine learning
- Cross-Disciplinary Significance: Connects quantum physics and machine learning domains
- Strong Inspirational Value: Provides important reference for subsequent research on tensor network representation capacity and optimization methods
- Theoretical Research: Suitable as foundational literature for tensor network representation theory
- Educational Purposes: Can be used to explain the relationship between MPS and neural networks
- Algorithm Design: Provides theoretical guidance for designing MPS-based machine learning algorithms
- Quantum Machine Learning: Provides theoretical support for quantum machine learning algorithm design
This paper cites important literature from multiple fields including tensor networks, quantum information, machine learning, and functional analysis, including:
- Tensor network foundational theory (Penrose, 1971; Vidal, 2007; Perez-Garcia et al., 2006)
- Neural network universal approximation theory (Cybenko, 1989; Hornik, 1991)
- Tensor network applications in machine learning (Stoudenmire & Schwab, 2016; et al.)
- Functional analysis theoretical foundations (Folland, 2013)