Complexity Measures in Machine Learning
Authors
Weigutian Ou
Reference
2025.
Abstract
This thesis explores complexity measures in machine learning in two distinct aspects, namely, the complexity of models and the complexity of problems. Specifically, we study covering numbers for deep ReLU networks, deriving tight lower and upper bounds on the covering numbers, and we investigate the asymptotic behavior of the minimax risk in denoising a signal from a general set, introducing a new dimensional measure called the minimax risk dimension.

The first part of the thesis is motivated by the following gap in the literature: while upper bounds on covering numbers of families of deep ReLU networks have been used to characterize their approximation-theoretic performance and to upper-bound the prediction error they incur in nonparametric regression, corresponding lower bounds on covering numbers do not seem to be available in the literature. This thesis addresses this gap by deriving tight lower and upper bounds on the covering numbers of fully-connected networks with bounded weights, sparse networks with bounded weights, and fully-connected networks with quantized weights. Thanks to the tightness of the bounds, a fundamental understanding of the impact of sparsity, quantization, bounded vs. unbounded weights, and network output truncation can be developed. Furthermore, the bounds allow us to characterize the fundamental limits of neural network transformation, including network compression, and they lead to sharp upper bounds on the prediction error in nonparametric regression through deep networks. Specifically, we remove a log^6(n) factor from the best-known sample-complexity rate for the estimation of Lipschitz functions through deep networks, thereby establishing optimality. Finally, we identify a systematic relation between optimal nonparametric regression and optimal approximation through deep networks, unifying numerous results in the literature and uncovering general underlying principles.

The second part of the thesis investigates the minimax risk in recovering a signal x ∈ S ⊆ R^n from its noisy observation x + σN_n as σ goes to 0. The asymptotic behavior of the minimax risk, scaled by 1/σ^2, defines the minimax risk dimension. The minimax risk dimension exhibits monotonicity and invariance under closure, isometric mappings, and isotropic scaling, though it lacks stability under unions. Additionally, we show that the minimax risk dimension is lower-bounded by the Hausdorff dimension. This lower bound provides a straightforward improvement over a minimax risk lower bound for denoising low-rank matrices and uncovers oscillatory behavior in the minimax risk for denoising signals from Cantor sets.
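For concreteness, the covering number studied in the first part is the standard metric-entropy quantity recalled below; the choice of metric ρ (e.g., the sup-norm over a bounded domain) is left generic here, as it is not fixed in the abstract.

\[
N(\epsilon; \mathcal{F}, \rho) \;=\; \min\Big\{ m \in \mathbb{N} : \exists\, f_1, \dots, f_m \text{ such that } \sup_{f \in \mathcal{F}} \min_{1 \le i \le m} \rho(f, f_i) \le \epsilon \Big\},
\]

where \mathcal{F} denotes the class of deep ReLU networks under consideration, i.e., the minimal number of ρ-balls of radius ε needed to cover \mathcal{F}; \log N(\epsilon; \mathcal{F}, \rho) is the corresponding metric entropy.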
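Based on the description above, a plausible formalization of the minimax risk and the minimax risk dimension reads as follows; the squared-error loss, a standard Gaussian noise vector N_n, and the use of a plain limit (rather than a lim sup or lim inf) are assumptions made here for illustration only.

\[
M(S, \sigma) \;=\; \inf_{\hat{x}} \sup_{x \in S} \mathbb{E}\big[\, \lVert \hat{x}(x + \sigma N_n) - x \rVert_2^2 \,\big],
\qquad
\dim_{\mathrm{MR}}(S) \;=\; \lim_{\sigma \to 0} \frac{M(S, \sigma)}{\sigma^2},
\]

with the infimum taken over all measurable estimators \hat{x} : \mathbb{R}^n \to \mathbb{R}^n.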