Metric entropy limits on recurrent neural network learning of linear dynamical systems


Clemens Hutter, Recep Gül, and Helmut Bölcskei


Applied and Computational Harmonic Analysis, Vol. 59, pp. 198-223, July 2022 (invited paper).


One of the most influential results in neural network theory is the universal approximation theorem [1, 2, 3], which states that continuous functions can be approximated to within arbitrary accuracy by single-hidden-layer feedforward neural networks. The purpose of this paper is to establish a result in this spirit for the approximation of general discrete-time linear dynamical systems, including time-varying systems, by recurrent neural networks (RNNs). For the subclass of linear time-invariant (LTI) systems, we devise a quantitative version of this statement. Specifically, measuring the complexity of the considered class of LTI systems through metric entropy according to [4], we show that RNNs can optimally learn (or identify, in system-theory parlance) stable LTI systems. For LTI systems whose input-output relation is characterized through a difference equation, this means that RNNs can learn the difference equation from input-output traces in a metric-entropy optimal manner.


Recurrent neural networks, linear dynamical systems, metric entropy, Hardy spaces, universal approximation, system identification


In Definition 1.1, \(\mathcal{R}_{\Phi}: \ell_{\infty} \rightarrow \ell_{\infty}\) can be replaced by the more general \(\mathcal{R}_{\Phi}: \ell_{\infty}(\mathbb{N}_{0}) \rightarrow \mathbb{R}^{\mathbb{N}_0}\).

The networks constructed in the proofs of Lemma 2.2 and Theorem 2.3 are applicable to input signals \(x\) with \(\lVert{x}\rVert_{\ell_\infty} \leq C\), where \(C \in\mathbb{R}^+\). Therefore, the sentence above equation (25), "To this end, we first recall that RNNs according to Definition 1.1 accept input signals in \(\ell_\infty(\mathbb{N}_0)\) and set \(C={\lVert x \rVert}_{\ell_\infty}\).", should be replaced by "To this end, we first recall that RNNs according to Definition 1.1 accept input signals in \(\ell_\infty\) and choose a \(C \in \mathbb{R}^+\) such that \({\lVert x \rVert}_{\ell_\infty} \leq C\)." Similarly, "\(C = {\lVert x \rVert}_{\ell_\infty}\)" below equation (37) should be replaced by "\(C \in \mathbb{R}^+\) is such that \({\lVert x \rVert}_{\ell_\infty} \leq C\)".

The constant \(\frac{1}{b}\) in the scaling result \(\mathcal{E}(\epsilon; \mathcal{C}(a, b), \rho) \thicksim \frac{1}{b}\left(\log\left(\frac{a}{\epsilon}\right)\right)^2\) specified in Theorem 3.2 is incorrect and should be replaced by \(\frac{\gamma}{2b}\), i.e., \(\mathcal{E}(\epsilon; \mathcal{C}(a, b), \rho) \thicksim \frac{\gamma}{2b}\left(\log\left(\frac{a}{\epsilon}\right)\right)^2\), where \(\gamma := \log_2(e)\). To clarify matters, we provide a self-contained proof of this scaling result in arXiv:2211.15466. Similarly, the number of required bits specified as \(\frac{1}{b}\left(\log\left(\frac{a}{\epsilon}\right)\right)^2 + o\left(\left(\log\left(\frac{1}{\epsilon}\right)\right)^2\right)\) in Theorem 4.1 should be replaced by \(\frac{\gamma}{2b}\left(\log\left(\frac{a}{\epsilon}\right)\right)^2 + o\left(\left(\log\left(\frac{1}{\epsilon}\right)\right)^2\right)\).
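
As a rough illustration of the size of this correction (a side calculation that is not part of the paper), the ratio of the corrected leading constant to the originally stated one is independent of \(a\), \(b\), and \(\epsilon\):

\[
\frac{\gamma/(2b)}{1/b} \;=\; \frac{\gamma}{2} \;=\; \frac{\log_2(e)}{2} \;\approx\; 0.7213 .
\]

The corrected leading term in Theorems 3.2 and 4.1 is therefore about 72% of the value originally stated, while the \(\left(\log\left(\frac{a}{\epsilon}\right)\right)^2\) growth rate is unchanged.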


Copyright Notice: © 2022 C. Hutter, R. Gül, and H. Bölcskei.

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.