# Master Thesis and Semester Projects

If you are interested in one of the following topics, please contact the person listed with the respective topic.

If you don't find a compelling topic in our list, you are most welcome to discuss your interests with us.

A list of completed theses is also available on our website.

## List of Projects

- Upper bounds on the covering numbers of neural networks
- Learning in indefinite spaces
- Neural collapse
- Fundamental limits of deep neural network learning on low-dimensional objects
- Learning dynamical systems with recurrent neural networks
- Generalization and robustness of loss minima for deep learning
- Predicting generalization of neural networks
- Attentive recurrent neural networks for categorizing and generating news
- Acoustic sensing and trajectory estimation of objects flying at supersonic speed (with industry)
- Double descent curves in machine learning
- On the metric entropy of dynamical systems

### Upper bounds on the covering numbers of neural networks (DA)

The expressivity of neural networks refers to their ability to approximate or exactly represent certain functional families. The fundamental limit of the expressivity of a neural network family can be characterized by its covering number [1,2] and its VC dimension [3]. While tight upper bounds on the VC dimension of neural networks with corresponding achievability results are available [4], such results in terms of covering numbers are partially missing.

The goal of this project is to fill this gap by deriving tight upper bounds
on the L^{∞}- and the L^{2}-covering numbers of neural network families with
various architectures and to then identify functional families for which neural
networks achieve these upper bounds.
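For reference, the covering number referred to above is the following standard notion (stated for a generic metric; the precise variant used in [1,2] may differ in details such as internal vs. external covers):

```latex
% \epsilon-covering number of a function class \mathcal{F} with respect to
% a metric \rho (e.g., the L^\infty- or the L^2-metric):
N(\epsilon; \mathcal{F}, \rho)
  := \min\Big\{ N \in \mathbb{N} \,:\, \exists\, f_1, \ldots, f_N
     \ \text{such that}\ \sup_{f \in \mathcal{F}} \min_{1 \le i \le N} \rho(f, f_i) \le \epsilon \Big\}.
```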

Type of project: 100% theory

Prerequisites: Strong mathematical background

Supervisor: Weigutian Ou

Professor:
Helmut Bölcskei

References:

[1]
H. Bölcskei, P. Grohs, G. Kutyniok, and P. Petersen, "Optimal approximation with sparsely connected deep neural networks," *SIAM Journal on Mathematics of Data Science*, vol. 1, no. 1, pp. 8-45, 2019.
[Link to Document]

[2]
D. Elbrächter, D. Perekrestenko, P. Grohs, and H. Bölcskei, "Deep neural
network approximation theory," *IEEE Transactions on Information
Theory*, submitted Jan. 2019, revised June 2020.
[Link to Document]

[3]
D. Yarotsky, "Error bounds for approximations with deep ReLU networks,"
*Neural Networks*, vol. 94, pp. 103-114, 2017.
[Link to Document]

[4]
P. L. Bartlett, N. Harvey, C. Liaw, and A. Mehrabian, "Nearly-tight
VC-dimension and pseudodimension bounds for piecewise linear neural
networks," *Journal of Machine Learning Research*, vol. 20, no. 63, pp. 1-17, 2019.
[Link to Document]

### Learning in indefinite spaces (DA)

In classical learning theory, a symmetric, positive semidefinite and continuous kernel function is used to construct a reproducing kernel Hilbert space, which serves as a hypothesis space for learning algorithms [1].

However, in many applications, the kernel function fails to be positive semidefinite [2], which leads to indefinite Krein spaces [3]. The goal of this project is to develop a theory of learning for reproducing kernel Krein spaces.
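As background, the positive-semidefiniteness condition that fails in these applications, and the Krein-space structure replacing the Hilbert-space one, can be sketched as follows (standard definitions, simplified relative to the precise treatment in [3]):

```latex
% A kernel k : X \times X \to \mathbb{R} is positive semidefinite if
\sum_{i=1}^{n}\sum_{j=1}^{n} c_i \, c_j \, k(x_i, x_j) \ \ge\ 0
\quad \text{for all } n \in \mathbb{N},\ x_1,\ldots,x_n \in X,\ c_1,\ldots,c_n \in \mathbb{R}.
% An indefinite kernel admitting a decomposition k = k_+ - k_- with k_\pm
% positive semidefinite induces a Krein space \mathcal{K} = \mathcal{H}_+ \oplus \mathcal{H}_-
% carrying the indefinite inner product
\langle f_+ \oplus f_-, \, g_+ \oplus g_- \rangle_{\mathcal{K}}
  \ =\ \langle f_+, g_+ \rangle_{\mathcal{H}_+} \ -\ \langle f_-, g_- \rangle_{\mathcal{H}_-}.
```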

Type of project: 100% theory

Prerequisites: Strong mathematical background, measure theory, functional analysis

Supervisor: Erwin Riegler

Professor:
Helmut Bölcskei

References:

[1] F. Cucker and D. X. Zhou, *Learning Theory*, ser. Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, 2007.

[2] R. Luss and A. d’Aspremont, "Support vector machine classification with indefinite kernels," *Mathematical Programming Computation*, vol. 1, no. 2-3, pp. 97–118, Oct. 2009.

[3] A. Gheondea, "Reproducing kernel Krein spaces," Chapter 14 in D. Alpay (Ed.), *Operator Theory*, Springer, 2015.

### Neural collapse (SA/DA)

Recent experiments show that the last-layer features of prototypical neural networks trained by stochastic gradient descent on common classification datasets favor certain symmetric and geometric patterns. Moreover, the networks develop towards these patterns even when training is continued after the training error has already been driven to zero [1]. For example, the individual last-layer features collapse to their class means.

These patterns, observed in empirical network training, can help explain the generalization ability of deep networks. The phenomenon is called "neural collapse" and constitutes a type of inductive/implicit bias, which has been studied mathematically for linear networks trained by gradient descent on classification datasets [2,3]. The goal of this project is to develop a general mathematical theory of the neural collapse phenomenon.
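A simple way to probe the collapse-to-class-means property numerically is to compare within-class to between-class feature scatter. The sketch below is illustrative only (the function name and the exact scatter ratio are our choices, not the metric used in [1]):

```python
import numpy as np

def within_class_collapse(features, labels):
    """Ratio of within-class to between-class feature scatter.

    A value near zero indicates that the features have collapsed to
    their class means. features: (n, d) array; labels: (n,) int array.
    """
    global_mean = features.mean(axis=0)
    sw = 0.0  # within-class scatter (trace)
    sb = 0.0  # between-class scatter (trace)
    for c in np.unique(labels):
        fc = features[labels == c]
        mu_c = fc.mean(axis=0)
        sw += ((fc - mu_c) ** 2).sum()
        sb += len(fc) * ((mu_c - global_mean) ** 2).sum()
    return sw / sb

# Toy check: features drawn tightly around well-separated class means
rng = np.random.default_rng(0)
means = np.array([[5.0, 0.0], [-5.0, 0.0], [0.0, 5.0]])
labels = rng.integers(0, 3, size=300)
features = means[labels] + 0.01 * rng.standard_normal((300, 2))
print(within_class_collapse(features, labels))  # tiny ratio -> collapsed
```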

Type of project: 80% theory, 20% simulation

Prerequisites: Strong mathematical background and programming skills

Supervisor: Weigutian Ou

Professor:
Helmut Bölcskei

References:

[1]
V. Papyan, X. Y. Han, and D. L. Donoho, “Prevalence of neural collapse during the terminal
phase of deep learning training,” *Proceedings of the National Academy of Sciences*, vol. 117,
no. 40, pp. 24652–24663, 2020.
[Link to Document]

[2]
D. Soudry, E. Hoffer, M. S. Nacson, S. Gunasekar, and N. Srebro, “The implicit bias of
gradient descent on separable data,” *The Journal of Machine Learning Research*, vol. 19,
no. 1, pp. 2822–2878, 2018.
[Link to Document]

[3]
Z. Ji and M. J. Telgarsky, “Gradient descent aligns the layers of deep linear networks,”
*Proceedings of the 7th International Conference on Learning Representations*, ICLR, 2019.
[Link to Document]

### Fundamental limits of deep neural network learning on low-dimensional objects (DA)

Deep neural networks approximate functions by arranging neurons in layers that are connected by weighted edges. Deep learning-based methods have recently shown stellar performance in various practical applications such as speech recognition [1], image classification [2], handwritten digit recognition [3], and game intelligence [4].

The aim of this project is to study the fundamental limits of deep neural network learning of high-dimensional functions that "live" on low-dimensional objects such as manifolds. In particular, based on the theory developed in [5], you will try to establish a relationship between the complexity (e.g., manifold dimension) of the support set of the functions to be learned and the complexity of the corresponding optimally approximating neural networks in terms of the number of bits needed to encode the network topology and the quantized weights in the network. The theory to be developed will also build on the recently developed Kolmogorov-Donoho approximation theory for deep neural networks [6,7].

Type of project: 100% theory

Prerequisites: Strong mathematical background, measure theory, knowledge of (deep) neural networks

Supervisor: Erwin Riegler

Professor:
Helmut Bölcskei

References:

[1]
G. Hinton, L. Deng, D. Yu, et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," *IEEE Signal Process. Mag.*, vol. 29, no. 6, pp. 82–97, 2012.
[Link to Document]

[2]
A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in *Advances in Neural Information Processing Systems 25*, Curran Associates, Inc., pp. 1097–1105, 2012.
[Link to Document]

[3]
Y. LeCun, L. D. Jackel, L. Bottou, et al., "Comparison of learning algorithms for handwritten digit recognition," *International Conference on Artificial Neural Networks*, pp. 53–60, 1995.
[Link to Document]

[4]
D. Silver, A. Huang, C. J. Maddison, et al., "Mastering the game of Go with deep neural networks and tree search," *Nature*, vol. 529, no. 7587, pp. 484–489, 2016.
[Link to Document]

[5]
G. Alberti, H. Bölcskei, C. De Lellis, G. Koliander, and E. Riegler, "Lossless analog compression," *IEEE Trans. Inf. Theory*, to appear, 2019.
[Link to Document]

[6]
H. Bölcskei, P. Grohs, G. Kutyniok, and P. Petersen, "Optimal approximation with sparsely connected deep neural networks," *SIAM Journal on Mathematics of Data Science*, vol. 1, no. 1, pp. 8-45, 2019.
[Link to Document]

[7]
P. Grohs, D. Perekrestenko, D. Elbrächter, and H. Bölcskei, "Deep neural network approximation theory," *IEEE Transactions on Information Theory*, Jan. 2019, submitted.
[Link to Document]

### Learning dynamical systems with recurrent neural networks (DA)

Neural networks have led to impressive practical successes in various applications such as speech recognition [1], image classification [2], handwritten digit recognition [3], and game intelligence [4]. More recently, recurrent neural networks (RNNs) have been employed for learning dynamical systems [5, 6]. The goal of this project is to develop a theoretical framework for the characterization of the fundamental limits of deep neural network learning of dynamical systems. Starting from the application scenarios in [5], where an RNN-based architecture is used to model extreme events in complex dynamical systems and [6], where long short-term memory (LSTM) RNNs are used to forecast high-dimensional chaotic systems, you will build on [7, 8, 9, 10, 11, 12] to establish such a theory.

Type of project: 100% theory

Prerequisites: Strong mathematical background, knowledge of neural networks and system theory

Supervisor: Recep Gül

Professor:
Helmut Bölcskei

References:

[1]
G. Hinton, L. Deng, D. Yu, et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," *IEEE Signal Process. Mag.*, vol. 29, no. 6, pp. 82–97, 2012.
[Link to Document]

[2]
A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in *Advances in Neural Information Processing Systems 25*, Curran Associates, Inc., pp. 1097–1105, 2012.
[Link to Document]

[3]
Y. LeCun, L. D. Jackel, L. Bottou, et al., "Comparison of learning algorithms for handwritten digit recognition," *International Conference on Artificial Neural Networks*, pp. 53–60, 1995.
[Link to Document]

[4]
D. Silver, A. Huang, C. J. Maddison, et al., "Mastering the game of Go with deep neural networks and tree search," *Nature*, vol. 529, no. 7587, pp. 484–489, 2016.
[Link to Document]

[5] Z. Y. Wan, P. Vlachas, P. Koumoutsakos, and T. Sapsis, "Data-assisted reduced-order modeling of extreme events in complex dynamical systems," *PLoS ONE*, vol. 13, pp. 1-22, May 2018. [Link to Document]

[6] P. Vlachas, W. Byeon, Z. Y. Wan, T. Sapsis, and P. Koumoutsakos, "Data-driven forecasting of high-dimensional chaotic systems with long short-term memory networks," *Proceedings of the Royal Society A: Mathematical, Physical, and Engineering Sciences*, vol. 474, no. 2213, 2018. [Link to Document]

[7] E. D. Sontag, “Neural nets as system models and controllers,” Proceedings Seventh Yale Workshop on Adaptive and Learning Systems, pp. 73–79, May 1992. [Link to Document]

[8] E. D. Sontag, “Neural networks for control,” In H. Trentelman and J. Willems, eds., Essays on Control: Perspectives in the Theory and its Applications, vol. 14 of Progress in Systems and Control Theory, pp. 339-380, Birkhäuser. [Link to Document]

[9]
H. Bölcskei, P. Grohs, G. Kutyniok, and P. Petersen, "Optimal approximation with sparsely connected deep neural networks," *SIAM Journal on Mathematics of Data Science*, vol. 1, no. 1, pp. 8-45, 2019.
[Link to Document]

[10]
P. Grohs, D. Perekrestenko, D. Elbrächter, and H. Bölcskei, "Deep neural network approximation theory," *IEEE Transactions on Information Theory*, Jan. 2019, submitted.
[Link to Document]

[11]
P. Koiran and E. D. Sontag, "Vapnik-Chervonenkis dimension of recurrent neural networks," *Discrete Applied Mathematics*, vol. 86, no. 1, pp. 63-79, Aug. 1998.
[Link to Document]

[12]
K. Funahashi and Y. Nakamura, "Approximation of dynamical systems by continuous time recurrent neural networks," *Neural Networks*, vol. 6, no. 6, pp. 801-806, Nov. 1992.
[Link to Document]

### Generalization and robustness of loss minima for deep learning (BA/SA/DA)

Over the last decade, deep learning has become a staple of machine learning. A neural network effectively implements a function as an alternating composition of affine maps and nonlinearities. The nonlinearity typically acts componentwise and is fixed, whereas the coefficients of the affine maps, often called the parameters of the neural network, need to be learned by minimizing a cost function, typically through stochastic gradient descent (SGD) algorithms [1].

This approach to machine learning works incredibly well, as evidenced by groundbreaking practical successes in optical character recognition [2], image classification [3], and speech recognition [4]. It is remarkable that even highly overparametrized neural networks exhibit very good generalization properties, i.e., good performance on new samples. In the overparametrized regime, the cost function has an abundance of global minima, all of which can potentially be found by SGD, but only some of which yield good generalization performance. The goal of this project is to explore the intrinsic regularization effect that the structure of the training data (for instance, MNIST) exerts on the minima-selection properties of SGD.
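As a minimal illustration of minima selection, consider gradient descent on an overparametrized linear least-squares problem: started from zero, it converges to the minimum-norm interpolating solution. This toy setting (all sizes and the step-size choice below are illustrative) is far simpler than deep networks, but exhibits an implicit bias of the same flavor:

```python
import numpy as np

# Gradient descent on an overparametrized least-squares problem
# (more parameters than data points), started at zero, converges
# to the *minimum-norm* interpolating solution.
rng = np.random.default_rng(1)
n, d = 20, 100                        # 20 data points, 100 parameters
A = rng.standard_normal((n, d))
y = rng.standard_normal(n)

lr = 1.0 / np.linalg.norm(A, 2) ** 2  # safe step size via spectral norm
w = np.zeros(d)
for _ in range(5000):
    w -= lr * A.T @ (A @ w - y)       # gradient of (1/2)||Aw - y||^2

w_min_norm = np.linalg.pinv(A) @ y    # minimum-norm interpolant
print(np.linalg.norm(A @ w - y))      # training loss driven to ~0
print(np.linalg.norm(w - w_min_norm)) # ~0: GD selected the min-norm minimum
```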

Type of project: 80% programming, 20% theory/literature research

Prerequisites: Familiarity with training deep neural networks

Supervisor: Thomas Allard

Professor:
Helmut Bölcskei

References:

[1]
D. Kingma and J. Ba, "Adam: A method for stochastic optimization," *Proceedings of the 3rd International Conference on Learning Representations*, 2015.
[Link to Document]

[2]
Y. LeCun, L. D. Jackel, L. Bottou, et al., "Comparison of learning algorithms for handwritten digit recognition," *International Conference on Artificial Neural Networks*, pp. 53–60, 1995.
[Link to Document]

[3]
A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in *Advances in Neural Information Processing Systems 25*, Curran Associates, Inc., pp. 1097–1105, 2012.
[Link to Document]

[4]
G. Hinton, L. Deng, D. Yu, et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," *IEEE Signal Process. Mag.*, vol. 29, no. 6, pp. 82–97, 2012.
[Link to Document]

### Predicting generalization of neural networks (SA)

Over the past decade, neural networks have reached state-of-the-art performance on various problems ranging from object recognition to speech analysis [1]. A neural network can be thought of as a set of neurons structured in layers with parameterized connections. Training a neural network then amounts to finding optimal values for these parameters based on training data. A measure of the quality of the trained network is its generalization performance on previously unseen data.

Although numerous bounds on the generalization error of neural networks are available in the literature [2], a clear understanding of the underlying root causes that determine generalization performance is still lacking. In this project, you will design a function whose input is a trained neural network along with its training data and whose output is a measure that quantifies how well the trained model generalizes. This project is inspired by the competition [3].
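To make the task concrete, one crude baseline for such a function could be a norm-over-margin score in the spirit of norm-based generalization bounds [2]; the following sketch is purely illustrative (the function, its formula, and all names are hypothetical choices, not the measure to be developed):

```python
import numpy as np

def margin_complexity(weights, x, labels):
    """Hypothetical norm-over-margin score for a ReLU network.

    weights: list of weight matrices (biases omitted for brevity);
    x: (n, d0) inputs; labels: (n,) integer class labels.
    Larger values heuristically suggest worse generalization, in the
    spirit of norm-based bounds; the exact formula is illustrative.
    """
    a = x
    for w in weights[:-1]:
        a = np.maximum(a @ w, 0.0)            # ReLU forward pass
    logits = a @ weights[-1]
    idx = np.arange(len(labels))
    correct = logits[idx, labels]
    others = logits.copy()
    others[idx, labels] = -np.inf             # mask out the correct class
    margin = np.median(correct - others.max(axis=1))
    spec_prod = np.prod([np.linalg.svd(w, compute_uv=False)[0] for w in weights])
    return spec_prod / max(margin, 1e-12)     # guard against non-positive margin

rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 16)) / 4.0, rng.standard_normal((16, 3)) / 4.0]
x = rng.standard_normal((50, 8))
labels = rng.integers(0, 3, size=50)
score = margin_complexity(weights, x, labels)
print(score)
```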

Type of project: 25% theory, 75% programming

Prerequisites: Strong programming skills (Python), background in machine learning and deep learning

Supervisor: Thomas Allard

Professor:
Helmut Bölcskei

References:

[1]
I. Goodfellow, Y. Bengio, and A. Courville, *Deep Learning*, MIT Press, Cambridge, MA, 2016.

[2]
B. Neyshabur et al., "The role of over-parametrization in generalization of neural networks," *International Conference on Learning Representations*, 2018.
[Link to Document]

[3] "Predicting generalization in deep learning," [Online source]. [Link to Document]

### Attentive recurrent neural networks for categorizing and generating news (SA)

To make the vast amount of news available on the Internet accessible quickly and effectively, it is crucial to categorize news messages automatically. Modern machine learning methods such as Recurrent Neural Networks (RNNs) with attention [1] are a promising approach to this problem.

The concept of "attention" in RNNs is a mechanism that allows the model to focus on certain parts of the input sequence when predicting certain parts of the output sequence. In this project, you will first build RNN models that categorize news items based on their headlines and short descriptions. Then, you will build a generative model and train it to generate news for each specific category, e.g., sports, politics, entertainment, etc. The goal of this project is to investigate whether, based on recent work on text generation from attributes [3,4], attention networks can improve upon the approach in [2]. As a training/test dataset, you will use the Kaggle dataset [5], which contains almost 125,000 news headlines from the past 5 years obtained from the HuffPost [6].
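A minimal sketch of the attention mechanism itself is shown below (using scaled dot-product scoring for brevity; [1] uses an additive scoring function, and all names here are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max()                  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def attention(query, keys, values):
    """Scaled dot-product attention over a sequence of encoder states.

    query: (d,); keys, values: (T, d). Returns a context vector that is
    a convex combination of the values, weighted by query-key similarity.
    """
    scores = keys @ query / np.sqrt(len(query))
    weights = softmax(scores)        # nonnegative, sums to 1 over time
    return weights @ values, weights

rng = np.random.default_rng(0)
keys = rng.standard_normal((6, 4))   # 6 encoder states of dimension 4
values = rng.standard_normal((6, 4))
query = rng.standard_normal(4)       # current decoder state
context, w = attention(query, keys, values)
print(w)                             # attention distribution over positions
```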

Type of project: 20% theory, 80% programming

Prerequisites: Strong programming skills (Python), background in machine learning and deep learning

Supervisor: Dmytro Perekrestenko

Professor:
Helmut Bölcskei

References:

[1]
D. Bahdanau et al., “Neural machine translation by jointly learning to align and translate,” *International Conference on Learning Representations (ICLR)*, 2015.
[Link to Document]

[2] O. Fuks, “Classification of news dataset,” [Report], 2018. [Link to Document]

[3]
L. Dong et al., “Learning to generate product reviews from attributes,” *Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics*, 2017.
[Link to Document]

[4] A. Radford, R. Jozefowicz, and I. Sutskever, “Learning to generate reviews and discovering sentiment,” arXiv, 2017. [Link to Document]

[5] R. Misra, “News category dataset,” 2018. [Link to Document]

[6] The HuffPost. [Link to Document]

### Acoustic sensing and trajectory estimation of objects flying at supersonic speed (with industry) (SA)

In shooting sports, hunting, and law-enforcement applications, measuring the speed and trajectory of projectiles with high precision and reliability is an important technical challenge. For supersonic projectiles, these quantities are estimated from signals acquired by microphones placed at different locations. Recently, more powerful microprocessors have made it possible to employ more sophisticated algorithms.

The goal of this project is to investigate techniques such as linearization of nonlinear systems of equations, least-squares fitting, and neural-network-based machine learning. Existing hardware and algorithms provide an ideal starting point for the project, which will be carried out in collaboration with the industry partner SIUS (located in Effretikon, Zurich). SIUS offers close supervision as well as the opportunity to work with real hardware and a test laboratory.
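To illustrate the linearize-and-refit idea in a toy setting (2-D geometry, known speed of sound, a stationary point source, noiseless arrival times; all names and values are illustrative, not the project's actual model), a Gauss-Newton fit of a source position from microphone arrival times might look as follows:

```python
import numpy as np

C = 343.0  # speed of sound in m/s (a convenient room-temperature value)

def locate(mics, t_arr, x0, t0=0.0, iters=20):
    """Gauss-Newton fit of a point-source position from arrival times.

    mics: (m, 2) microphone positions; t_arr: (m,) measured arrival times.
    Model: t_i = t0 + ||x - p_i|| / C. Each step linearizes the residuals
    around the current estimate and solves the resulting least-squares
    problem -- the linearize-and-refit idea in its simplest form.
    """
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        diff = x - mics                      # (m, 2)
        dist = np.linalg.norm(diff, axis=1)  # (m,)
        r = t0 + dist / C - t_arr            # residuals
        J = np.hstack([diff / (C * dist[:, None]),  # d r / d x
                       np.ones((len(mics), 1))])    # d r / d t0
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)
        x += step[:2]
        t0 += step[2]
    return x, t0

# Synthetic check: 5 mics, known source, noiseless arrival times
mics = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [5, -3]], float)
src = np.array([3.0, 4.0])
t_arr = 0.02 + np.linalg.norm(src - mics, axis=1) / C
x_hat, t0_hat = locate(mics, t_arr, x0=[5.0, 5.0])
print(x_hat)  # recovers the source position
```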

About the industry partner: SIUS is the world's leading manufacturer of electronic scoring systems in shooting sports. The company specializes in high-speed, high-precision measurement equipment capable of measuring projectile position and trajectory, and has been equipping the most important international competitions, including the Olympic Games, for decades.

Type of project: 20% literature research, 20% theory, 50% implementation/programming, 10% experiments

Prerequisites: Solid mathematical background, knowledge of SciPy, MATLAB, or a similar toolset, ideally knowledge of (deep) neural networks

Supervisor: Michael Lerjen, Steven Müllener

Professor:
Helmut Bölcskei

References:

[1]
SIUS Homepage
[Link to Document]

### Double descent curves in machine learning (DA)

Classical machine learning theory suggests that the generalization error follows a U-shaped curve as a function of model complexity [1, Sec. 2.9]. When too few parameters are used to train the model, the generalization error is high due to underfitting. Too many parameters result in overfitting and hence again in a large generalization error. In between lies a sweet spot at the bottom of the U-shaped curve.

During the past few years, it was observed that increasing the model complexity beyond the so-called interpolation threshold leads to a generalization error that starts decreasing again [2]. The overall generalization error hence follows a so-called double descent curve. To date, the double descent behavior is supported by experimental results only, and these experiments employ vastly different complexity measures and learning algorithms.

The goal of this project is to first understand the experiments reported in the literature. You will then study the theory of metric entropy [3] and try to understand whether, and if so under which learning algorithms, a double descent curve appears when model complexity is measured in terms of metric entropy.

Type of project: 70% theory, 30% simulation

Prerequisites: Programming skills and knowledge in machine learning

Supervisor: Weigutian Ou

Professor:
Helmut Bölcskei

References:

[1]
J. Friedman, T. Hastie, and R. Tibshirani, *The Elements of Statistical Learning*, Springer Series in Statistics, Springer, New York, 2001.

[2]
M. Belkin, D. Hsu, S. Ma, and S. Mandal, "Reconciling modern machine-learning practice and the classical bias–variance trade-off," *Proceedings of the National Academy of Sciences*, 116(32):15849–15854, 2019.
[Link to Document]

[3]
P. Grohs, D. Perekrestenko, D. Elbrächter, and H. Bölcskei, "Deep neural network approximation theory," *IEEE Transactions on Information Theory*, Jan. 2019, submitted.
[Link to Document]

### On the metric entropy of dynamical systems (DA)

The aim of this project is to explore the metric complexity of dynamical systems, i.e., to identify how much information about a system's input-output behavior is needed to describe the system dynamics to within a prescribed accuracy. In particular, you will study the asymptotics of the ε-entropy in the Kolmogorov sense [1, 2] of a certain class of causal linear systems [3]. Based on these results, you will try to develop a general theory that encompasses more general classes of dynamical systems, including time-varying systems [4] and nonlinear systems [5].
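For reference, the ε-entropy in the Kolmogorov sense is defined via covering numbers (standard definition; see [1, 2] for the precise setting):

```latex
% Covering number of a relatively compact subset K of a metric space (X, \rho):
N_\epsilon(K) \ :=\ \min\Big\{ N \in \mathbb{N} \,:\, K \subseteq \bigcup_{i=1}^{N} B_\rho(x_i, \epsilon)
\ \text{ for some } x_1, \ldots, x_N \in X \Big\},
% Kolmogorov \epsilon-entropy:
H_\epsilon(K) \ :=\ \log_2 N_\epsilon(K),
% with the object of interest being the growth of H_\epsilon(K) as \epsilon \to 0.
```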

Type of project: 100% theory

Prerequisites: Strong mathematical background

Supervisor: Diyora Salimova

Professor:
Helmut Bölcskei

References:

[1]
A. N. Kolmogorov, "On certain asymptotic characteristics of completely bounded metric spaces," *Doklady Akademii Nauk SSSR*, vol. 108, no. 3, pp. 385–389, 1956.

[2]
A. N. Kolmogorov and V. M. Tikhomirov, "ε-entropy and ε-capacity of sets in functional spaces," in *Uspekhi Matematicheskikh Nauk*, vol. 14, no. 2, pp. 3–86, 1959.

[3]
G. Zames, "On the metric complexity of causal linear systems: ε-entropy and ε-dimension for continuous time," *IEEE Transactions on Automatic Control*, vol. 24, no. 2, pp. 222–230, 1979.
[Link to Document]

[4]
G. Matz, H. Bölcskei, and F. Hlawatsch, "Time-frequency foundations of communications," *IEEE Signal Processing Magazine*, vol. 30, no. 6, pp. 87–96, 2013.
[Link to Document]

[5]
M. Schetzen, "Nonlinear system modeling based on the Wiener theory," *Proceedings of the IEEE*, vol. 69, no. 12, pp. 1557–1573, 1981.
[Link to Document]