Deep convolutional neural networks from a harmonic analysis perspective

Authors

Thomas Wiatowski

Reference

Mathematics of Deep Learning, Workshop organized by the Weierstrass Institute for Applied Analysis and Stochastics, Berlin, Germany, Sept. 2017 (invited talk).

Abstract

Deep convolutional neural networks (CNNs) used in practice employ potentially hundreds of layers and tens of thousands of nodes. Such network sizes entail significant computational complexity due to the large number of convolutions that need to be carried out; in addition, a large number of parameters needs to be learned and stored. Very deep and wide CNNs may therefore not be well suited to applications operating under severe resource constraints, as is the case, e.g., in low-power embedded and mobile platforms. In this talk, we aim to understand the impact of CNN depth on the network's feature extraction capabilities. Specifically, we analyze how many layers are actually needed for "most" of the input signal's features to be contained in the feature vector generated by the network. This question can be formalized by asking how quickly the energy contained in the propagated signals (a.k.a. feature maps) decays across layers. We address this question for the class of scattering networks that employ general filters, the modulus non-linearity, and no pooling, and find that under mild analyticity and high-pass conditions on the filters (which encompass, inter alia, various constructions of Weyl-Heisenberg filters, wavelets, ridgelets, ($\alpha$)-curvelets, and shearlets) the feature map energy decays at least polynomially fast. For broad families of wavelets and Weyl-Heisenberg filters, the guaranteed decay rate is shown to be exponential in the network depth. Our results yield handy estimates of the number of layers needed to have at least $((1-\varepsilon)\cdot 100)\%$ of the input signal energy contained in the feature vector. Finally, we show how networks of fixed (possibly small) depth can be designed to guarantee that most of the input signal's energy is contained in the feature vector.
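The energy-decay question can be illustrated numerically. The following is a minimal sketch, not the talk's general construction: it assumes a simple Haar low-pass/high-pass filter pair (the talk's results cover far broader filter families, e.g. wavelets and Weyl-Heisenberg filters), and all function names here are illustrative. At each layer the low-pass branch is absorbed into the feature vector, while the modulus of the high-pass branch is propagated onward, with no pooling.

```python
import numpy as np

# Haar filter pair, chosen for illustration: |g_hat|^2 + |h_hat|^2 = 1 at
# every frequency, so energy is split exactly between the two branches.
LOW = np.array([0.5, 0.5])    # low-pass (output-generating) filter
HIGH = np.array([0.5, -0.5])  # high-pass (propagating) filter

def cconv(u, f):
    """Circular convolution via the FFT (f is zero-padded to len(u))."""
    return np.fft.ifft(np.fft.fft(u) * np.fft.fft(f, len(u))).real

def energy(u):
    return float(np.sum(u ** 2))

def scattering_energies(signal, depth):
    """Return (propagated, output): the total feature-map energy still
    propagating at each layer, and the cumulative energy absorbed into
    the feature vector up to each layer."""
    maps = [signal]
    propagated = [energy(signal)]
    absorbed, output = 0.0, []
    for _ in range(depth):
        # low-pass branch: absorbed into the feature vector
        absorbed += sum(energy(cconv(u, LOW)) for u in maps)
        # high-pass branch: modulus non-linearity, no pooling, propagate
        maps = [np.abs(cconv(u, HIGH)) for u in maps]
        propagated.append(sum(energy(u) for u in maps))
        output.append(absorbed)
    return propagated, output

# Demo: propagated energy is non-increasing across layers, and at every
# depth (propagated + absorbed) energy equals the input energy.
rng = np.random.default_rng(0)
prop, out = scattering_energies(rng.standard_normal(128), 6)
```

With this Parseval pair, `prop` decreases monotonically while `out` climbs toward the input energy; reading off the depth at which `out` exceeds a fraction $1-\varepsilon$ of the input energy is the numerical analogue of the layer-count estimates described in the abstract. The same bookkeeping applies to richer filter banks satisfying a frame condition; only the decay rate changes.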

This publication is currently not available for download.