Master's Thesis and Semester Projects

If you are interested in one of the following topics, please contact the person listed with the respective topic.

If you don't find a compelling topic in our list, you are most welcome to discuss your interests with us.

Also, we have a list of finished theses on our website.

List of Projects



Approximation theory of deep neural networks (SA/DA)

Deep neural networks approximate functions by arranging neurons in layers that are connected by weighted edges. Deep learning-based methods have recently shown stellar performance in various practical applications, such as speech recognition [1], image classification [2], handwritten digit recognition [3], and game intelligence [4]. This success led to a renewed interest in understanding theoretical properties of deep neural networks.

In this project, you will study various theoretical results on the benefits of depth in neural networks. In [5], [6] it is shown that, in the approximation of certain function classes, shallow networks need exponentially more nodes than deep networks. Similarly, [7] demonstrates that approximating a simple radial function with a 2-layer neural network requires exponentially many nodes in the input dimension, whereas the same function can be approximated by a small 3-layer feedforward network. Smooth functions are also known to be approximated more efficiently by deeper networks [8]. The purpose of this project is to evaluate these results in the light of a new theory [9], [10], which introduces the concept of Kolmogorov-optimal approximation, relating the connectivity and memory requirements of the approximating network to the complexity of the function class to be approximated.
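The depth separation results of [5], [6] can be illustrated numerically with the standard sawtooth construction: composing a piecewise-linear "tent" map k times yields a function with exponentially many linear pieces from only O(k) layers, whereas a shallow network needs a number of nodes on the order of the number of pieces. A minimal sketch (an illustration only, not part of the project scope):

```python
import numpy as np

def hat(x):
    # Piecewise-linear "tent" map on [0, 1]; realizable by a two-neuron ReLU layer
    return np.minimum(2 * x, 2 - 2 * x)

def sawtooth(x, k):
    # k-fold composition: 2^k linear pieces from only O(k) layers
    for _ in range(k):
        x = hat(x)
    return x

def count_extrema(k, grid=2 ** 16 + 1):
    # Count slope sign changes on a dyadic grid, i.e. the number of
    # local extrema (= number of linear pieces minus one)
    x = np.linspace(0.0, 1.0, grid)
    slopes = np.sign(np.diff(sawtooth(x, k)))
    return int(np.sum(slopes[1:] != slopes[:-1]))
```

Here count_extrema(k) equals 2^k - 1, i.e., the oscillation count doubles with every additional composition (layer).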

Type of project: 100% theory
Prerequisites: Strong mathematical background, knowledge of (deep) neural networks
Supervisor: Recep Gül
Professor: Helmut Bölcskei

References:
[1] G. Hinton, L. Deng, D. Yu, et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82–97, 2012. [Link to Document]

[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems 25, Curran Associates, Inc., pp. 1097–1105, 2012. [Link to Document]

[3] Y. LeCun, L. D. Jackel, L. Bottou, et al., "Comparison of learning algorithms for handwritten digit recognition," International Conference on Artificial Neural Networks, pp. 53–60, 1995. [Link to Document]

[4] D. Silver, A. Huang, C. J. Maddison, et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, pp. 484–489, 2016. [Link to Document]

[5] M. Telgarsky, "Benefits of depth in neural networks," arXiv:1602.04485, 2016. [Link to Document]

[6] M. Telgarsky, "Representation benefits of deep feedforward networks," arXiv:1509.08101, 2015. [Link to Document]

[7] R. Eldan and O. Shamir, "The power of depth for feedforward neural networks," arXiv:1512.03965, 2016. [Link to Document]

[8] D. Yarotsky, "Error bounds for approximations with deep ReLU networks," arXiv:1610.01145, 2017. [Link to Document]

[9] H. Bölcskei, P. Grohs, G. Kutyniok, and P. Petersen, "Optimal approximation with sparsely connected deep neural networks," SIAM Journal on Mathematics of Data Science, vol. 1, no. 1, pp. 8–45, 2019. [Link to Document]

[10] P. Grohs, D. Perekrestenko, D. Elbrächter, and H. Bölcskei, "Deep neural network approximation theory," IEEE Transactions on Information Theory, Jan. 2019, submitted. [Link to Document]



Theoretical guarantees for stochastic gradient descent (DA)

Over the last decade, deep neural networks have become a staple of machine learning. A neural network effectively implements a function as an alternating composition of affine maps and nonlinearities. The nonlinearity typically acts componentwise and is fixed, whereas the coefficients of the affine maps, often called the parameters of the neural network, need to be learned by minimizing a loss function. This is usually done with stochastic gradient descent algorithms [1].

This approach to machine learning works incredibly well, as evidenced by groundbreaking applications of deep learning in optical character recognition [2], image classification [3], and speech recognition [4]. However, theoretical results describing conditions under which stochastic gradient descent is guaranteed to solve a given learning task are scarce. The goal of this project is to explore the nature of the stochastic gradient descent algorithm as applied to simple learning scenarios, and, if possible, to derive conditions on the architecture and the parameters of the neural network that yield success guarantees.
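As a warm-up for such simple learning scenarios, plain SGD can be run on the simplest realizable problem, noiseless linear regression, where convergence to the ground truth is classical. A minimal numpy sketch (data dimensions and step size are hypothetical choices):

```python
import numpy as np

def sgd_least_squares(X, y, lr=0.02, epochs=200, seed=0):
    # Online SGD on the squared loss of a linear model y ~ X @ w:
    # one randomly chosen sample per update
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            err = X[i] @ w - y[i]   # residual on a single sample
            w -= lr * err * X[i]    # stochastic gradient step
    return w

# Noiseless data generated from a ground-truth parameter vector: in this
# realizable setting, constant-step SGD converges to the exact solution
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
w_hat = sgd_least_squares(X, X @ w_true)
```

Conditions under which such guarantees carry over to nonconvex neural network training are exactly the kind of question the project addresses.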

Type of project: 60% theory/literature research, 40% own derivations and proofs
Prerequisites: Applied analysis background, basic familiarity with neural networks
Supervisor: Verner Vlačić
Professor: Helmut Bölcskei

References:
[1] D. Kingma and J. Ba, "Adam: A method for stochastic optimization," Proceedings of the 3rd International Conference on Learning Representations, 2015. [Link to Document]

[2] Y. LeCun, L. D. Jackel, L. Bottou, et al., "Comparison of learning algorithms for handwritten digit recognition," International Conference on Artificial Neural Networks, pp. 53–60, 1995. [Link to Document]

[3] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems 25, Curran Associates, Inc., pp. 1097–1105, 2012. [Link to Document]

[4] G. Hinton, L. Deng, D. Yu, et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82–97, 2012. [Link to Document]



Estimation of fractal dimensions (DA)

Fractal dimensions such as the box-counting dimension or the Rényi information dimension of a random vector often appear as a measure of the vector's description complexity. Recently, fractal dimensions were shown to characterize fundamental limits in information-theoretic problems such as analog data compression [2, 3] and communication over interference channels [4–6]. The exact computation of fractal dimensions is, in general, hard, as explicit formulae are available only for certain classes of well-behaved probability distributions.

The goal of this project is to develop, based on the results in [1], a statistical theory and corresponding algorithms for the estimation of fractal dimensions. The resulting estimation procedures shall then be used to gain insights into the fractal dimension of sources with structure relevant to information-theoretic problems.
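The simplest estimator of the kind surveyed in [1] is the box-counting regression: count occupied boxes N(s) at box side 1/s and fit the slope of log N(s) versus log s. A numpy sketch, with two sanity checks on distributions whose dimension is known (the sample sizes and scales are arbitrary choices):

```python
import numpy as np

def box_dimension(points, scales):
    # Box-counting estimate: occupied-box counts N(s) at box side 1/s,
    # then the slope of log N(s) versus log s via least squares
    points = np.atleast_2d(points)
    counts = []
    for s in scales:
        boxes = np.floor(points * s).astype(int)
        counts.append(len({tuple(b) for b in boxes}))
    slope, _ = np.polyfit(np.log(scales), np.log(counts), 1)
    return float(slope)

rng = np.random.default_rng(0)

# Sanity check 1: uniform samples on [0, 1] should give dimension ~ 1
dim_interval = box_dimension(rng.uniform(size=(50000, 1)),
                             scales=(2, 4, 8, 16, 32, 64))

# Sanity check 2: samples from the middle-third Cantor set (random ternary
# digits in {0, 2}) should give ~ log 2 / log 3 ~ 0.63; powers of 3 align
# the boxes with the construction
digits = 2 * rng.integers(0, 2, size=(50000, 20))
digits[:, -1] = 2  # keep samples strictly inside their ternary cells
cantor = (digits * 3.0 ** -np.arange(1, 21)).sum(axis=1)[:, None]
dim_cantor = box_dimension(cantor, scales=(3, 9, 27, 81, 243))
```

A statistical theory of such estimators (bias from finite scale ranges, variance from finite sample sizes) is precisely what the project aims to develop.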

Type of project: 60% theory, 40% simulation
Prerequisites: Strong mathematical background, measure theory, probability theory
Supervisor: Erwin Riegler
Professor: Helmut Bölcskei

References:
[1] C. D. Cutler, “A review of the theory and estimation of fractal dimension,” in Dimension estimation and models, H. Tong, Ed., Nonlinear Time Series and Chaos, vol. 1, pp. 1–107, World Scientific, 1993.

[2] Y. Wu and S. Verdú, “Rényi information dimension: Fundamental limits of almost lossless analog compression,” IEEE Trans. Inf. Theory, vol. 56, no. 8, pp. 3721–3748, Aug. 2010. [Link to Document]

[3] D. Stotz, E. Riegler, E. Agustsson, and H. Bölcskei, “Almost lossless analog signal separation and probabilistic uncertainty relations,” IEEE Trans. Inf. Theory, vol. 63, no. 9, pp. 5445–5460, Sep. 2017. [Link to Document]

[4] Y. Wu, S. Shamai (Shitz), and S. Verdú, “Information dimension and the degrees of freedom of the interference channel,” IEEE Trans. Inf. Theory, vol. 61, no. 1, pp. 256–279, Jan. 2015. [Link to Document]

[5] D. Stotz and H. Bölcskei, “Degrees of freedom in vector interference channels,” IEEE Trans. Inf. Theory, vol. 62, no. 7, pp. 4172–4197, Jul. 2016. [Link to Document]

[6] D. Stotz and H. Bölcskei, “Characterizing degrees of freedom through additive combinatorics,” IEEE Trans. Inf. Theory, vol. 62, no. 11, pp. 6423–6435, Nov. 2016. [Link to Document]



Learned priors for variational autoencoders (SA/DA)

Variational autoencoders (VAEs) [1] are a popular method for variational inference thanks to their elegant formulation allowing for straightforward implementation and stable training. However, scaling VAEs to high-resolution and complex image data sets has proven challenging. A promising direction towards learning more powerful VAEs is to use flexible, learned priors, as explored in [2]. This project aims to benchmark existing methods for learning such priors, explore variations thereof, and propose new ones. Image credit: [2].
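The learned prior of [2] is a mixture over encoder outputs at K learned pseudo-inputs, so evaluating it amounts to a log-mean-exp over Gaussian components. A numpy sketch (here the component parameters are free arrays; in [2] they would come from pushing the pseudo-inputs through the encoder):

```python
import numpy as np

def log_normal_diag(z, mu, logvar):
    # Log-density of diagonal Gaussians, one row per mixture component
    return -0.5 * np.sum(np.log(2 * np.pi) + logvar
                         + (z - mu) ** 2 / np.exp(logvar), axis=-1)

def log_mixture_prior(z, mus, logvars):
    # log p(z) = log (1/K) sum_k N(z; mu_k, diag(exp(logvar_k)));
    # in [2] the (mu_k, logvar_k) are encoder outputs at learned pseudo-inputs
    comps = log_normal_diag(z[None, :], mus, logvars)
    return np.logaddexp.reduce(comps) - np.log(len(mus))
```

Swapping this quantity in for the standard-normal prior term in the ELBO is the only change such a learned prior makes to the VAE training objective.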

Type of project: 20%-40% theory, 60%-80% programming
Prerequisites: Programming, linear algebra, experience with deep learning software is a plus
Supervisor: Michael Tschannen
Professor: Helmut Bölcskei

References:
[1] D. Kingma and M. Welling, "Auto-encoding variational Bayes," International Conference on Learning Representations, 2014. [Link to Document]

[2] J. Tomczak and M. Welling, "VAE with a VampPrior," International Conference on Artificial Intelligence and Statistics, 2018. [Link to Document]



Distribution-preserving lossy compression (SA/DA)

Recent advances in extreme image compression [1] allow artifact-free image reconstruction even at very low bitrates. Motivated by these results, [2] formalizes the concept of distribution-preserving lossy compression (DPLC), which optimizes the rate-distortion tradeoff under the constraint that the reconstructed (decompressed) samples follow the distribution of the training data. Specifically, a DPLC system (almost) perfectly reconstructs the training data when enough bits are allocated to the compressed representation. When zero bits are assigned to the compressed representation, it learns a (deep) generative model of the data. For intermediate bitrates, DPLC smoothly interpolates between matching the distribution of the training data and perfectly reconstructing the training samples (cf. the figure on the left, where the numbers at the top correspond to different rates in bits per pixel and each row corresponds to a different decoder realization). The DPLC framework introduced in [2] has so far been applied to images only. This project shall explore new applications such as speech compression.
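The rate-dependent behavior described above can be mimicked in one dimension with a dithered quantizer for uniform data, a hypothetical stand-in for the learned encoder/decoder pair of [2]: the decoder output is exactly uniformly distributed at every rate, while the distortion shrinks as 2^-b with the rate b.

```python
import numpy as np

def dplc_roundtrip(x, bits, rng):
    # Toy scalar DPLC for x uniform on [0, 1). At bits = 0 the decoder ignores
    # the (empty) code and samples from the data distribution; at bits > 0 it
    # draws a uniform sample inside the quantization cell of x, so the output
    # distribution is preserved at every rate while |x_hat - x| < 2**-bits
    if bits == 0:
        return rng.uniform(size=x.shape)
    levels = 2 ** bits
    code = np.floor(x * levels)                  # compressed representation
    return (code + rng.uniform(size=x.shape)) / levels

rng = np.random.default_rng(0)
x = rng.uniform(size=10000)
x_hat_hi = dplc_roundtrip(x, 8, rng)  # high rate: near-perfect reconstruction
x_hat_0 = dplc_roundtrip(x, 0, rng)   # zero rate: purely generative decoding
```

For speech or images, the learned generative decoder of [2] plays the role of the uniform sampler here.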

Type of project: 20%-40% theory, 60%-80% programming, depending on the student's preference
Prerequisites: Programming, linear algebra, experience with deep learning software is a plus
Supervisor: Michael Tschannen
Professor: Helmut Bölcskei

References:
[1] E. Agustsson, M. Tschannen, F. Mentzer, R. Timofte, and L. Van Gool, "Generative adversarial networks for extreme learned image compression," arXiv:1804.02958, 2018. [Link to Document]

[2] M. Tschannen, E. Agustsson, and M. Lučić, "Deep generative models for distribution-preserving lossy compression," arXiv:1805.11057, 2018. [Link to Document]



Phase transitions for matrix separation (DA)

A phase transition is a sharp division of the parameter space of a data processing problem into a region of success and a region of failure. This phenomenon was studied for the matrix separation problem [1, 2] and for the matrix completion problem [3] under specific recovery algorithms. Information-theoretic phase transitions do not depend on specific recovery algorithms and reveal what is possible in principle. Recently, information-theoretic phase transitions were obtained for the matrix completion problem [4] and for the signal separation problem [5].

The first goal of this project is to build on the techniques developed in [4, 5] to characterize information-theoretic phase transitions for the matrix separation problem, which in its traditional incarnation separates a low-rank matrix from a sparse matrix. The second goal is to investigate pairs of matrix structures (beyond low-rank and sparse) that allow for separation and to characterize their information-theoretic phase transitions.
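The algorithmic side of [1, 2] can be illustrated with the simplest recovery heuristic, alternating projections onto the low-rank and sparse matrix sets (with the rank and sparsity level assumed known); on a well-conditioned random instance it separates the two components. A numpy sketch under these simplifying assumptions:

```python
import numpy as np

def separate(M, rank, card, iters=100):
    # Alternate between projecting M - S onto rank-`rank` matrices (truncated
    # SVD) and projecting M - L onto `card`-sparse matrices (keep the largest
    # entries); a simple heuristic in the spirit of the algorithms in [1, 2]
    S = np.zeros_like(M)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        R = M - L
        thresh = np.sort(np.abs(R), axis=None)[-card]
        S = np.where(np.abs(R) >= thresh, R, 0.0)
    return L, S

# Random test instance: rank-1 plus a sparse matrix with large entries
rng = np.random.default_rng(0)
n, r, k = 40, 1, 15
L0 = np.outer(rng.normal(size=n), rng.normal(size=n))
S0 = np.zeros((n, n))
idx = rng.choice(n * n, size=k, replace=False)
S0.flat[idx] = 10.0 * rng.choice([-1.0, 1.0], size=k)
L_hat, S_hat = separate(L0 + S0, r, k)
```

Information-theoretic phase transitions, by contrast, ask for which (rank, sparsity) pairs separation is possible at all, by any algorithm.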

Type of project: 80% theory, 20% simulation
Prerequisites: Strong mathematical background, measure theory, probability theory
Supervisor: Erwin Riegler
Professor: Helmut Bölcskei

References:
[1] D. Amelunxen, M. Lotz, M. B. McCoy, and J. A. Tropp, “Living on the edge: A geometric theory of phase transitions in convex optimization,” Information and Inference, vol. 3, no. 3, pp. 224–294, Jun. 2014. [Link to Document]

[2] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky, “Rank-sparsity incoherence for matrix decomposition,” SIAM Journal on Optimization, vol. 21, no. 2, pp. 572–596, Jun. 2011. [Link to Document]

[3] E. J. Candès and B. Recht, “Exact matrix completion via convex optimization,” Foundations of Computational Mathematics, vol. 9, no. 6, pp. 717–772, Dec. 2009. [Link to Document]

[4] E. Riegler, D. Stotz, and H. Bölcskei, “Information-theoretic limits of matrix completion,” Proc. IEEE Int. Symp. on Inf. Theory (ISIT), pp. 106–110, Jun. 2015. [Link to Document]

[5] D. Stotz, E. Riegler, E. Agustsson, and H. Bölcskei, “Almost lossless analog signal separation and probabilistic uncertainty relations,” IEEE Trans. Inf. Theory, vol. 63, no. 9, pp. 5445–5460, Sep. 2017. [Link to Document]



Analyzing cost functions for generative adversarial networks (SA/DA)

Generative adversarial networks (GANs) [1] have led to remarkable results in machine learning, in particular in image generation tasks [2]. The GAN learning problem is a two-player game between the so-called generator, who learns how to generate samples resembling the training data, and the discriminator, who learns how to discriminate between real and fake data points. Both players aim at minimizing their own cost function until a Nash equilibrium is reached.

The goal of this project is to analyze, mathematically and possibly experimentally, different cost functions in the context of GAN learning for image generation tasks.
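A first example of such an analysis, following the discussion in [1], compares the original minimax generator cost log(1 - D(G(z))) with the non-saturating variant -log D(G(z)): when the discriminator confidently rejects generated samples, the former yields vanishing gradients while the latter does not. A numerical sketch:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def minimax_g_loss(t):
    # Original minimax generator cost; t is the discriminator logit on a fake
    return np.log(1.0 - sigmoid(t))

def nonsaturating_g_loss(t):
    # Non-saturating alternative proposed in [1]
    return -np.log(sigmoid(t))

def grad(f, t, eps=1e-6):
    # Central-difference derivative of a cost w.r.t. the logit
    return (f(t + eps) - f(t - eps)) / (2 * eps)

# Discriminator is confident the sample is fake (large negative logit):
g_mm = grad(minimax_g_loss, -10.0)        # ~ -sigmoid(-10): vanishes
g_ns = grad(nonsaturating_g_loss, -10.0)  # ~ sigmoid(-10) - 1: stays near -1
```

The project would extend this style of comparison to further cost functions (e.g. divergence- or distance-based ones).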

Type of project: 80%-90% theory, 10%-20% simulation
Prerequisites: Analysis, linear algebra, probability theory, Python
Supervisor: Michael Tschannen
Professor: Helmut Bölcskei

References:
[1] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Proc. of Neural Information Processing Systems (NIPS), pp. 2672–2680, 2014. [Link to Document]

[2] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” arXiv:1511.06434, 2015. [Link to Document]



Control policy learning for a flight simulator (SA)

Reinforcement learning (RL) [1] refers to the problem of finding suitable actions to take in a given situation in order to maximize a reward. RL algorithms that recently led to breakthrough results in practical machine learning tasks (such as learning to play Atari games [2] or the strategy board game Go [3]) typically employ a deep convolutional neural network [4] for feature extraction.

This project aims at developing an RL-based approach for learning control policies for a flight simulator. The scope of the project may encompass value function design, training data acquisition, as well as a first implementation in Python.
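Before tackling the simulator, the RL loop of [1] can be exercised on a toy chain environment: states 0..n-1, actions left/right, and a reward only at the rightmost (terminal) state. A minimal tabular Q-learning sketch (the flight-simulator task would replace the table with a function approximator and a richer reward):

```python
import numpy as np

def q_learning_chain(n_states=6, episodes=500, alpha=0.2, gamma=0.9,
                     eps=0.2, seed=0):
    # Tabular Q-learning with an epsilon-greedy behavior policy
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, 2))          # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            a = rng.integers(2) if rng.random() < eps else int(np.argmax(Q[s]))
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == n_states - 1 else 0.0
            done = s_next == n_states - 1
            target = r + gamma * (0.0 if done else Q[s_next].max())
            Q[s, a] += alpha * (target - Q[s, a])  # temporal-difference update
            s = s_next
    return Q

Q = q_learning_chain()  # learned policy: move right in every non-terminal state
```

Value function design, as mentioned above, amounts to choosing the reward and targets so that updates like the one here steer the policy toward the desired flight behavior.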

Type of project: 90% theory/literature research, 10% programming
Prerequisites: Python; machine learning basics are a plus
Supervisor: Michael Tschannen, Michael Lerjen
Professor: Helmut Bölcskei

References:
[1] R. S. Sutton and A. G. Barto, “Reinforcement learning: An introduction,” MIT Press, 1998.

[2] V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518 (7540), pp. 529–533, 2015. [Link to Document]

[3] D. Silver et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529 (7587), pp. 484–489, 2016. [Link to Document]

[4] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521 (7553), pp. 436–444, 2015. [Link to Document]



Grounded language learning of visual-lexical color descriptions (SA)

Grounded language learning (GLL) is a technique for language acquisition that uses a multimodal set of inputs rather than just sets of words or symbols, e.g., a combination of words and related sounds or visuals [1]. Due to the similarity of GLL to the way humans are exposed to language, studying GLL can potentially yield insights into how language is comprehended by humans.

In this project you will explore GLL by training a generative neural network on combinations of linguistic and visual information. Specifically, you will build a generative model based on the recently introduced Deep Recurrent Attentive Writer (DRAW) neural network architecture [2]. You will then train this network to predict lexical color descriptions from visual color representations and vice versa. Both discriminative [3] and generative [4, 6] approaches have been used in the literature for color description. The goal of this project is to investigate whether DRAW networks can improve on the autoencoder approach in [4]. As a training/testing dataset you will use a subset of the data from an online survey [5, 6].
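The two directions of the task (vision-to-language and language-to-vision) can be caricatured with a nearest-centroid lookup; the centroids below are hypothetical placeholders for what a model trained on the survey data of [5] would learn:

```python
import numpy as np

# Hypothetical color centroids in RGB space; a learned model as in [3, 4]
# would replace this fixed lookup with a network trained on survey data
CENTROIDS = {
    "red":    np.array([255, 0, 0]),
    "green":  np.array([0, 128, 0]),
    "blue":   np.array([0, 0, 255]),
    "yellow": np.array([255, 255, 0]),
}

def describe(rgb):
    # Discriminative direction: visual color representation -> lexical label
    rgb = np.asarray(rgb, dtype=float)
    return min(CENTROIDS, key=lambda name: np.linalg.norm(rgb - CENTROIDS[name]))

def visualize(name):
    # Generative direction: lexical label -> canonical RGB triple
    return CENTROIDS[name]
```

A DRAW-based model would replace both lookups with learned, probabilistic mappings over compositional descriptions ("pale greenish blue") rather than a fixed label set.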

Type of project: 20% theory, 80% programming
Prerequisites: Strong programming skills (Python), background in machine learning and deep learning
Supervisor: Dmytro Perekrestenko
Professor: Helmut Bölcskei

References:
[1] A. Lazaridou et al., “Multimodal word meaning induction from minimal exposure to natural text,” Cognitive Science, 2016. [Link to Document]

[2] K. Gregor, I. Danihelka, A. Graves, D. Rezende, and D. Wierstra, “DRAW: A recurrent neural network for image generation,” Proceedings of the 32nd International Conference on Machine Learning, 2015. [Link to Document]

[3] W. Monroe, N. D. Goodman, and C. Potts, “Learning to generate compositional color descriptions,” arXiv, 2016. [Link to Document]

[4] D. Bhargava, G. Vega, and B. Sheffer, “Grounded learning of color semantics with autoencoders,” [Online source], 2017. [Link to Document]

[5] R. Munroe, “Color survey results,” [Blog], 2010. [Link to Document]

[6] B. McMahan and M. Stone, “A Bayesian model of grounded color semantics,” Transactions of the Association for Computational Linguistics, 2015. [Link to Document]



Early prediction of sepsis from clinical data (SA/DA)

The aim of this project is to develop and evaluate an algorithm for early prediction of sepsis from clinical data, and to submit the algorithm to the PhysioNet/Computing in Cardiology Challenge 2019 [1]. Sepsis is a life-threatening condition that occurs when the body's response to infection causes tissue damage, organ failure, or death. In the U.S., nearly 1.7 million people develop sepsis and 270,000 people die from sepsis each year, costing U.S. hospitals more than any other health condition at $24 billion/year [1]. Please see the challenge website [1] for more information.

This project offers the opportunity to gain experience in learning from noisy, variable-length clinical time series data with missing entries. Image credit [1].
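Two recurring preprocessing problems with such records, missing measurements and variable record lengths, are typically handled with forward-fill imputation and padding plus a mask. A sketch of the kind of utilities the project would start from (the choice of 0 as fallback and pad value is an assumption, not part of the challenge rules):

```python
import numpy as np

def forward_fill(x):
    # Impute missing entries (NaN) in a (time, features) array by carrying
    # the last observed value of each feature forward
    x = np.array(x, dtype=float)
    for j in range(x.shape[1]):
        last = np.nan
        for t in range(x.shape[0]):
            if np.isnan(x[t, j]):
                x[t, j] = last
            else:
                last = x[t, j]
    # leading NaNs (nothing observed yet) fall back to 0 as a neutral value
    return np.nan_to_num(x, nan=0.0)

def pad_batch(series, length):
    # Zero-pad variable-length records to a common length and return a mask
    # marking real time steps, a common input format for sequence models
    batch = np.zeros((len(series), length, series[0].shape[1]))
    mask = np.zeros((len(series), length), dtype=bool)
    for i, s in enumerate(series):
        batch[i, :len(s)] = s
        mask[i, :len(s)] = True
    return batch, mask
```

The mask lets a downstream model (and the challenge's utility-based scoring) ignore padded steps.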

Type of project: 10%-30% theory, 70%-90% programming
Prerequisites: Programming, linear algebra, experience with deep learning software is a plus
Supervisor: Michael Tschannen
Professor: Helmut Bölcskei

References:
[1] Early prediction of sepsis from clinical data: The PhysioNet/Computing in Cardiology Challenge 2019 [Link to Document]



Telemetry for a sounding rocket (SA)

The "Akademische Raumfahrt Initiative Schweiz (ARIS)" aims at engaging students from Swiss educational institutions in hands-on space engineering challenges, specifically in the design, construction, and testing of a sounding rocket to compete in the 2019 Spaceport America Cup in New Mexico, USA.

The Spaceport America Cup is the world’s largest university rocket engineering competition. Student teams are challenged to design and build a rocket capable of launching a 4 kg payload to target altitudes of up to 30,000 ft and to recover the rocket completely post-flight.

In this multi-disciplinary project, you will contribute to the development of the telemetry system of the ARIS rocket. The goal is to implement, test, and verify a proof-of-concept solution for intra-rocket and rocket-to-ground communication. In a first step, different options for antennas, communication modules, and protocols shall be analyzed and compared with the telemetry systems of commercial rocket manufacturers. In particular, performance, implementation complexity, and cost shall be taken into account, all in the context of the structure of the rocket and the specific environment (e.g., high velocity and acceleration). In a second step, the favored solution shall be implemented, tested, and verified on an ARIS model rocket.
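One of the first comparisons in the concept phase is a link budget: a free-space estimate already indicates whether a candidate radio module can close the rocket-to-ground link at apogee. A sketch (all numbers in the example are hypothetical placeholders, not ARIS or competition specifications):

```python
import math

def fspl_db(distance_m, freq_hz):
    # Free-space path loss in dB: 20 * log10(4 * pi * d * f / c)
    c = 299_792_458.0
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / c)

def link_margin_db(tx_dbm, tx_gain_dbi, rx_gain_dbi,
                   distance_m, freq_hz, sensitivity_dbm):
    # Received power minus receiver sensitivity; a positive margin means
    # the link closes (fading and implementation losses not yet included)
    rx_dbm = tx_dbm + tx_gain_dbi + rx_gain_dbi - fspl_db(distance_m, freq_hz)
    return rx_dbm - sensitivity_dbm

# Hypothetical example: 14 dBm module, modest antennas, 10 km slant range
margin = link_margin_db(tx_dbm=14, tx_gain_dbi=2, rx_gain_dbi=2,
                        distance_m=10_000, freq_hz=433e6,
                        sensitivity_dbm=-110)
```

In practice, additional margin must be reserved for antenna misalignment during flight, fading, and implementation losses.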

Type of project: 30% concept development, 70% avionics development, testing, and measurements
Prerequisites: Interest in embedded and wireless systems
Supervisor: Michael Lerjen
Professor: Helmut Bölcskei

References:
[1] Homepage of the ARIS project [Link to Document]



Implementation of an RF channel emulator (SA)

In the development process of radio transceivers, it is important to test and optimize the performance of the implemented hardware, coding schemes, and signal processing algorithms under different channel conditions. This can be done, in principle, in over-the-air tests. However, such an approach exhibits several disadvantages, namely interference from other systems, various logistical problems, and legal restrictions in certain frequency bands. Furthermore, the results will depend on environmental conditions and are therefore often not reproducible.

A channel emulator, i.e., a piece of hardware that emulates a channel and applies it to a communication signal, is a tool built exactly to solve this problem. The key parameters of the channel (e.g., Doppler spread, delay spread, noise, ...) can be controlled through the configuration of the emulator [1]. Channel emulators often also have multiple inputs/outputs for the emulation of MIMO channels. MIMO channel emulation devices are available on the market [2], but are expensive and sometimes not flexible enough.

The aim of this project is to build a MIMO channel emulator that works in the digital baseband domain and can be controlled from an external PC. An FPGA platform including an RF transceiver as well as analog-to-digital (ADC) and digital-to-analog (DAC) converters was already selected in a previous project and can be used as a starting point.
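The core signal-processing operation of such an emulator, applying a multipath channel plus noise to a baseband signal, fits in a few lines and can serve as a bit-true reference model for the FPGA implementation. A numpy sketch with static taps (a full emulator would make the tap gains time-varying to model Doppler, and replicate the filter per antenna pair for MIMO):

```python
import numpy as np

def emulate_channel(x, delays, gains, snr_db, rng):
    # Tapped-delay-line channel: sum of delayed, complex-weighted copies of
    # the input, plus AWGN at the requested SNR
    y = np.zeros(len(x) + max(delays), dtype=complex)
    for d, g in zip(delays, gains):
        y[d:d + len(x)] += g * x
    sig_pow = np.mean(np.abs(y) ** 2)
    noise_pow = sig_pow / 10 ** (snr_db / 10)
    noise = np.sqrt(noise_pow / 2) * (rng.normal(size=len(y))
                                      + 1j * rng.normal(size=len(y)))
    return y + noise

rng = np.random.default_rng(0)
x = np.exp(2j * np.pi * 0.01 * np.arange(1000))  # toy single-tone input
y = emulate_channel(x, delays=[0, 3], gains=[1.0, 0.5j], snr_db=20, rng=rng)
```

The delay and gain parameters here are arbitrary illustration values; in the emulator they would be derived from the configured delay spread and power-delay profile.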

Type of project: 70% VHDL programming and simulation, 20% measurements, 10% theory
Prerequisites: Interest in RF hardware and wireless systems, Matlab, VHDL
Supervisor: Michael Lerjen
Professor: Helmut Bölcskei

References:
[1] S. Yang and Z. Can, “Design and implementation of multiple antenna channel emulator for LTE system,” Proc. 9th International Conference on Communications and Networking in China (CHINACOM), pp. 208–213, 2014. [Link to Document]

[2] Keysight Technologies, Inc., “Wireless device test sets & wireless solutions” [Link to Document]