  • Course code: 63835I
  • Credits: 5
  • Semester: winter
  • Contents

Effective theory of deep learning

Deep learning architectures have many hyperparameters. Even in the paradigmatic multi-layer perceptron (MLP) architecture, we can vary the number of layers, their width, the activation functions, and the initial parameter distributions. Over the last two decades, these hyperparameters have been tuned largely by experiment. In this course, we will build a rigorous effective theory that quantitatively describes the effect of these choices, namely the depth-to-width ratio, the initialization distribution, and the activation functions. Our theory will use Gaussian perturbation theory to describe the correlations of activations across the layers of the MLP network (a numerical sketch of such a correlator is given below). These rigorous results will provide a firm theoretical ground for many well-established deep-learning practices. The techniques considered can be extended to various other architectures, e.g., transformers. Importantly, they can also be used to derive scaling laws for large models, which is increasingly important given the enormous cost of their training.
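The following is a minimal illustrative sketch, not part of the course materials: it estimates by Monte Carlo the layer-wise two-point correlator of preactivations for a randomly initialized MLP, the kind of object the effective theory describes analytically. The function name preactivation_kernels, the ReLU activation, and the initialization constants C_W and C_b are illustrative assumptions, not choices taken from the course.

import numpy as np

def preactivation_kernels(x1, x2, widths, C_W=2.0, C_b=0.0, n_nets=500, seed=0):
    """Monte-Carlo estimate of the layer-wise two-point correlator
    K_l ~ E[z_i^(l)(x1) z_i^(l)(x2)], averaged over neurons and weight draws,
    for a randomly initialized ReLU MLP with weights ~ N(0, C_W / fan_in)
    and biases ~ N(0, C_b)."""
    rng = np.random.default_rng(seed)
    K = np.zeros(len(widths))
    for _ in range(n_nets):
        a1, a2, fan_in = x1, x2, x1.shape[0]
        for l, n_out in enumerate(widths):
            W = rng.normal(0.0, np.sqrt(C_W / fan_in), size=(n_out, fan_in))
            b = rng.normal(0.0, np.sqrt(C_b), size=n_out)
            z1, z2 = W @ a1 + b, W @ a2 + b      # preactivations at layer l
            K[l] += np.mean(z1 * z2)             # neuron-averaged correlator
            a1, a2, fan_in = np.maximum(z1, 0.0), np.maximum(z2, 0.0), n_out
    return K / n_nets

# Example: two unit-norm inputs through a four-layer, width-64 MLP.
x = np.ones(10) / np.sqrt(10.0)
y = np.concatenate([np.ones(5), -np.ones(5)]) / np.sqrt(10.0)
print(preactivation_kernels(x, y, widths=[64, 64, 64, 64]))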

  • Study programmes
  • Distribution of hours per semester
    15 hours lectures
    15 hours tutorials
    20 hours tutorials
  • Professor
Instructor
Room: R2.26 - LKM Laboratory
Course Organiser
Room: R2.17 - Office