Times: Monday and Wednesday, 9:30 – 10:45 am
Location: Scheller College of Business, room 223
Instructor: Vidya K Muthukumar
Office Hours: 12-1 pm, virtual
Prerequisites: undergraduate probability (ECE3077 or equivalent), undergraduate linear algebra (MATH 2551 or equivalent). Mathematical maturity and familiarity with proof-based arguments will be assumed.
Brief description: In many applications of machine learning (ML), data is collected sequentially; moreover, decisions can impact performance both in the present and the future. This class will deal with the design of ML algorithms for real-time decision-making, including reinforcement learning. Classical applications in engineering and modern applications in the ML pipeline will both be discussed, but the focus of the course will be foundational — on understanding design principles and the inner workings of algorithms for online decision-making.
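To give a concrete (purely illustrative, not course-provided) flavor of sequential decision-making with feedback, here is a minimal epsilon-greedy bandit sketch; the arm means, noise level, and parameter values are made up for illustration.

```python
# Illustrative sketch only: epsilon-greedy for a stochastic multi-armed bandit.
# The learner repeatedly picks an arm, observes a noisy reward, and updates
# a running estimate of each arm's mean reward.
import random

def epsilon_greedy(true_means, horizon=10_000, epsilon=0.1, seed=0):
    """Play `horizon` rounds against arms with the given (hidden) mean rewards."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k          # number of pulls per arm
    estimates = [0.0] * k     # running mean reward per arm
    total_reward = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                            # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])   # exploit
        reward = true_means[arm] + rng.gauss(0.0, 0.1)        # noisy feedback
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return total_reward / horizon, counts

# With hidden means [0.2, 0.5, 0.8], the learner should concentrate on arm 2.
avg, counts = epsilon_greedy([0.2, 0.5, 0.8])
```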
Upon successful completion of this course, students will be able to:
- Understand and explain the basic design principles of online algorithms under diverse assumptions on the environment and the reward feedback mechanism.
- Understand how these principles relate to classical concepts in information theory, signal processing, communications, and control theory.
- Assess the efficacy of an online algorithm for an engineering/machine learning application based on its performance guarantees, tractability of implementation, scalability, and the assumptions made about the environment.
- Appreciate how online algorithms relate to other aspects of the machine learning pipeline.
Grading/Format: The course will be graded as follows:
- Homeworks (best 5 of 6): 45%
- Midterm (take-home, Oct 21-22): 25%
- Course project: 30%
Piazza/Canvas: The primary mode of interactive communication in this course will be Piazza. Please sign up on the course page, and monitor Piazza for announcements regarding lectures, homeworks, the midterm, and the project. As is standard, we will also use Canvas to track assignments and share resources related to the class.
Resources and schedule
| Date | Topic | Resources |
| --- | --- | --- |
| 23 Aug | Logistics and introduction | Slides |
| 25 Aug | Discussion on probability/linear algebra | |
| 30 Aug | Basics of prediction of an adversarial sequence | Lecture note |
| 1 Sep | The multiplicative weights algorithm and “no regret” | Lecture note |
| 8 Sep | No-regret through perturbation | Lecture note (also for 13 Sep) |
| 13 Sep | No-regret through perturbation, continued | See above |
| 15 Sep | From prediction to decision-making: Online linear optimization | Lecture note |
| 20 Sep | Follow-the-Regularized-Leader, introduction to online convex optimization | Lecture note |
| 22 Sep | Online convex optimization and stochastic optimization | Lecture note |
| 27 Sep | Overview of adaptive methods in online learning | Lecture note |
| 29 Sep | Introduction to limited-information feedback | Lecture note |
| 4 Oct | Limited-information feedback and UCB, Part 1 | Lecture note |
| 6 Oct | Limited-information feedback and UCB, Part 2 | Lecture note |
| 13 Oct | UCB and informal discussion of lower bound | Lecture note |
| 18 Oct | Thompson sampling algorithm, Part 1 | Lecture note |
| 20 Oct | Thompson sampling algorithm, Part 2 | Lecture note |
| 25 Oct | Structured bandits: Linear and Gaussian processes | Lecture note |
| 27 Oct | Contextual bandits | |
| 1 Nov | Dynamic programming and optimal control | Lecture note |
| 3 Nov | Tabular RL with a generative model | Lecture note |
| 8 Nov | Model-based exploration in tabular RL | Lecture note |
| 10 Nov | Value iteration and Q-learning | Lecture note |
| 15 Nov (virtual) | Policy-based methods | Lecture note |
| 17 Nov (virtual) | Optional general-audience video listening | |
| 22 Nov | An overview of RL with function approximation | Lecture note |
| 29 Nov | Online learning and zero-sum game theory | Lecture note |
| 1 Dec | Online learning and non-zero-sum game theory | Lecture note |
| 6 Dec | LAST DAY OF CLASS: Poster presentations | N/A |
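As a taste of the early lectures on prediction with expert advice, here is a minimal, purely illustrative sketch of the multiplicative weights (Hedge) algorithm; the loss sequence and learning rate below are made-up examples, not course material.

```python
# Illustrative sketch only: multiplicative weights (Hedge) for prediction
# with expert advice. Each round the learner plays the weighted mixture of
# experts, observes all experts' losses, and exponentially down-weights
# experts that suffered loss.
import math

def hedge(losses, eta=0.5):
    """losses: a list of rounds, each a list of per-expert losses in [0, 1].
    Returns the learner's cumulative expected loss and the final weights."""
    n = len(losses[0])
    weights = [1.0] * n
    total = 0.0
    for round_losses in losses:
        z = sum(weights)
        probs = [w / z for w in weights]                  # play the mixture
        total += sum(p * l for p, l in zip(probs, round_losses))
        weights = [w * math.exp(-eta * l)                 # multiplicative update
                   for w, l in zip(weights, round_losses)]
    return total, weights
```

For example, against two experts where expert 0 always incurs loss 0 and expert 1 always incurs loss 1, the learner's cumulative loss stays bounded (on the order of (log n)/eta) rather than growing linearly — the “no regret” property discussed in lecture.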
Submissions and self-grade uploads are both due at 11:59 pm ET, and both are done via Canvas.
| Homework | Rough set of topics | Upload date | Due date | Self-grade due date |
| --- | --- | --- | --- | --- |
| Homework 1 | Fundamentals of adversarial prediction | 1 Sep | 14 Sep | 21 Sep |
| Homework 2 | From prediction to decision-making | 15 Sep | 28 Sep | 5 Oct |
| Homework 3 | Online optimization, introduction to bandits | 1 Oct | 18 Oct | 24 Oct |
| Homework 4 | Bayesian and structured bandits | 27 Oct | 10 Nov | 17 Nov |
| Homework 5 | Reinforcement learning and optimal control | 11 Nov | 24 Nov | 6 Dec |