ECE 8803: Online Decision Making in Machine Learning (Fall 2021)

Times: Monday and Wednesday, 9:30 – 10:45 am

Location: Scheller College of Business, room 223

Instructor: Vidya K Muthukumar

Office Hours: 12-1 pm, virtual

Prerequisites: undergraduate probability (ECE 3077 or equivalent), undergraduate linear algebra (MATH 2551 or equivalent). Mathematical maturity and familiarity with proof-based arguments will be assumed.

Brief description: In many applications of machine learning (ML), data is collected sequentially; moreover, decisions can impact performance both in the present and in the future. This class will deal with the design of ML algorithms for real-time decision-making, including reinforcement learning. Classical applications in engineering and modern applications in the ML pipeline will both be discussed, but the focus of the course will be foundational: understanding the design principles and inner workings of algorithms for online decision-making.

Upon successful completion of this course, students will be able to:

Understand and explain the basic design principles of online algorithms under diverse assumptions on the environment and the reward-feedback mechanism.
Understand how these principles relate to classical concepts in information theory, signal processing, communications, and control theory.
Assess the efficacy of an online algorithm for an engineering/machine learning application based on its performance guarantees, the tractability of its implementation, its scalability, and the assumptions it makes about the environment.
Appreciate how online algorithms relate to other aspects of the machine learning pipeline.

Grading/Format: The course will be graded as follows:

Homeworks (top 5/6): 45%
Midterm (take-home, Oct 21-22): 25%
Course project: 30%

Piazza/Canvas: The primary mode of interactive communication in this course will be Piazza. Please sign up at the course page, and monitor Piazza for announcements regarding lectures, homeworks, the midterm, and the project. As is standard, we will also use Canvas to keep track of assignments and share resources related to the class.

Resources and schedule

Lecture schedule:

Date	Topic	Resources
23 Aug	Logistics and introduction	Slides
25 Aug	Discussion on probability/linear algebra	Review note on probability Review note on linear algebra
30 Aug	Basics of prediction of an adversarial sequence	Lecture note
1 Sep	The multiplicative weights algorithm and “no regret”	Lecture note
8 Sep	No-regret through perturbation	Lecture note (also for 13 Sep)
13 Sep	No-regret through perturbation, continued	See above.
15 Sep	From prediction to decision-making: Online linear optimization	Lecture note
20 Sep	Follow-the-Regularized-Leader, Introduction to online convex optimization	Lecture note
22 Sep	Online convex optimization and stochastic optimization	Lecture note
27 Sep	Overview of adaptive methods in online learning	Lecture note
29 Sep	Introduction to limited-information feedback	Lecture note
4 Oct	Limited-information feedback and UCB, Part 1	Lecture note
6 Oct	Limited-information feedback and UCB, Part 2	Lecture note
13 Oct	UCB and informal discussion of lower bound	Lecture note
18 Oct	Thompson sampling algorithm, Part 1	Lecture note
20 Oct	Thompson sampling algorithm, Part 2	Lecture note
25 Oct	Structured bandits: Linear and Gaussian processes	Lecture note
27 Oct	Contextual bandits
1 Nov	Dynamic programming and optimal control	Lecture note
3 Nov	Tabular RL with a generative model	Lecture note
8 Nov	Model-based exploration in tabular RL	Lecture note
10 Nov	Value iteration and Q-learning	Lecture note
15 Nov (virtual)	Policy-based methods	Lecture note
17 Nov (virtual)	Optional general-audience video-listening
22 Nov	An overview of RL with function approximation	Lecture note
29 Nov	Online learning and zero-sum game theory	Lecture note
1 Dec	Online learning and non-zero-sum game theory	Lecture note
6 Dec	LAST DAY OF CLASS: Poster presentations	N/A

Homework schedule:

Both the homework submission and the self-grade upload are due at 11:59 pm ET on the dates listed below, and both are done via Canvas.

Homework	Rough set of topics	Upload date	Due date	Self-grade due date
Homework 1	Fundamentals of adversarial prediction	1 Sep	14 Sep	21 Sep
Homework 2	From prediction to decision-making	15 Sep	28 Sep	5 Oct
Homework 3	Online optimization, introduction to bandits	1 Oct	18 Oct	24 Oct
Homework 4	Bayesian and structured bandits	27 Oct	10 Nov	17 Nov
Homework 5	Reinforcement learning and optimal control	11 Nov	24 Nov	6 Dec